Degree Level Course

Introduction to Natural Language Processing (i-NLP)

Natural language (NL) refers to language spoken or written by humans; it is humans' primary mode of communication. With the growth of the World Wide Web, data in the form of text has grown exponentially. This calls for the development of algorithms and techniques for processing natural language to automate tasks and build intelligent machines: Natural Language Processing (NLP). On completing the course, the participant will learn the following:

1. Why processing language computationally is hard and why specialized techniques need to be developed to process text.
2. Knowledge and in-depth understanding of linguistic techniques and of classical (statistical, pre-deep-learning) approaches to NLP and their limitations.
3. Knowledge and in-depth understanding of deep learning approaches (RNNs and CNNs) to NLP.
4. Knowledge and in-depth understanding of the attention mechanism, Transformers, and Large Language Models (LLMs).
5. Ability to read and understand the latest NLP research papers.
6. Ability to identify the applicable NLP technique for a real-world text-processing problem.
7. Ability to implement NLP models and algorithms for text-processing problems.
8. Ability to develop applications based on generative text models (LLMs).

by Ashutosh Modi

Course ID: BSCS5002

Course Credits:

Course Type:

Pre-requisites: None

Course structure & Assessments

For details of the standard course structure and assessments, visit the Academics page.

Introduction to Natural Language (NL); Why is it hard to process natural language?; Levels of Language Processing; Linguistic Fundamentals for NLP
NLP Pipeline: Tokenization, Lemmatization, Normalization, POS Tagging, Parsing, etc.; Subword Tokenization; Text Prediction: Introduction, Framework, and its Components; Evaluation
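
To make the pipeline stages concrete, here is a minimal pure-Python sketch of tokenization and normalization; the suffix-stripping rules are a crude, illustrative stand-in for real lemmatization (in practice one would use a library such as NLTK or spaCy):

    import re

    def tokenize(text):
        # Words become tokens; punctuation marks become separate tokens.
        return re.findall(r"\w+|[^\w\s]", text)

    def normalize(tokens):
        # Lowercase, then strip a few inflectional suffixes. This is a
        # toy stand-in for lemmatization, not a real morphological analyzer.
        out = []
        for tok in tokens:
            tok = tok.lower()
            for suffix in ("ing", "ed", "s"):
                if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                    tok = tok[: -len(suffix)]
                    break
            out.append(tok)
        return out

    print(normalize(tokenize("The dogs were barking loudly.")))
    # ['the', 'dog', 'were', 'bark', 'loudly', '.']
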
Feed-Forward Neural Networks for NLP, Regularization, Dropout; Computational Graphs and Backpropagation; Word Representation: Distributed Representations; Language Models: n-gram and Neural; Word2Vec, GloVe
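
As an illustration of the n-gram language models listed above, the sketch below estimates add-one (Laplace) smoothed bigram probabilities from a made-up toy corpus:

    from collections import Counter, defaultdict

    # Toy corpus; a real model would be estimated from millions of tokens.
    corpus = [["<s>", "the", "cat", "sat", "</s>"],
              ["<s>", "the", "dog", "sat", "</s>"],
              ["<s>", "the", "cat", "ran", "</s>"]]

    bigrams = defaultdict(Counter)
    vocab = set()
    for sent in corpus:
        vocab.update(sent)
        for prev, cur in zip(sent, sent[1:]):
            bigrams[prev][cur] += 1

    def prob(prev, cur, alpha=1.0):
        # Add-alpha smoothed estimate of P(cur | prev).
        counts = bigrams[prev]
        return (counts[cur] + alpha) / (sum(counts.values()) + alpha * len(vocab))

    print(prob("the", "cat"))   # 0.3 (seen bigram)
    print(prob("the", "ran"))   # 0.1 (unseen bigram, rescued by smoothing)
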
CNNs for NLP; Neural Sequence Models; Contextualized Word Embeddings; Attention Mechanism; Assessment: Hands-on Assignment
Self-Attention Mechanism; Transformers; Pretrained Language Models (PLMs): BERT, GPT, etc.; Fine-Tuning and Transfer Learning
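
The core of self-attention fits in a few lines of NumPy. The sketch below shows a single unmasked head with toy dimensions, omitting multi-head splitting, masking, and dropout:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # one output per token

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))    # 4 tokens, embedding dimension 8
    Wq = rng.normal(size=(8, 8))
    Wk = rng.normal(size=(8, 8))
    Wv = rng.normal(size=(8, 8))
    print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
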
Large Language Models (LLMs); Parameter-Efficient Fine-Tuning: Prefix-Tuning, LoRA, etc.; Emergent Behavior: In-Context Learning, Instruction Tuning; RLHF
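
To give a feel for parameter-efficient fine-tuning, the NumPy sketch below shows the shape arithmetic behind a LoRA-style low-rank update (no training loop; all sizes are illustrative):

    import numpy as np

    # LoRA idea: freeze a pretrained weight W and learn a low-rank update
    # B @ A, so only r * (d + k) parameters are trained instead of d * k.
    d, k, r = 512, 512, 8
    rng = np.random.default_rng(0)
    W = rng.normal(size=(d, k))          # frozen pretrained weight
    A = rng.normal(size=(r, k)) * 0.01   # trainable
    B = np.zeros((d, r))                 # trainable; zero init keeps W unchanged at start

    def forward(x, alpha=16.0):
        # Effective weight is W + (alpha / r) * B @ A.
        return x @ (W + (alpha / r) * B @ A).T

    x = rng.normal(size=(1, k))
    print(forward(x).shape)                              # (1, 512)
    print(d * k, "frozen vs", r * (d + k), "trainable")  # 262144 vs 8192
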
Naïve Bayes Classifier; Expectation Maximization Algorithm; Logistic Regression; Maximum Entropy Models
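
A Naïve Bayes text classifier can be built from scratch in a few lines; the sketch below uses invented toy data and add-one smoothing on the word likelihoods:

    import math
    from collections import Counter, defaultdict

    # Toy labelled data, invented purely for illustration.
    train = [("great movie loved it", "pos"),
             ("terrible boring movie", "neg"),
             ("loved the acting", "pos"),
             ("boring and terrible plot", "neg")]

    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in train:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}

    def predict(text):
        # argmax over labels of log P(label) + sum_w log P(w | label).
        scores = {}
        for label in label_counts:
            lp = math.log(label_counts[label] / sum(label_counts.values()))
            total = sum(word_counts[label].values())
            for w in text.split():
                lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            scores[label] = lp
        return max(scores, key=scores.get)

    print(predict("a boring terrible movie"))   # neg
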
Classical Sequence Models: HMMs, MEMMs, CRFs, RNN-CRF
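
For example, Viterbi decoding for an HMM tagger takes only a handful of lines; the probabilities below are invented for illustration and would normally be estimated from a tagged corpus:

    states = ["NOUN", "VERB"]
    start = {"NOUN": 0.6, "VERB": 0.4}
    trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
             "VERB": {"NOUN": 0.8, "VERB": 0.2}}
    emit = {"NOUN": {"fish": 0.6, "swim": 0.4},
            "VERB": {"fish": 0.3, "swim": 0.7}}

    def viterbi(words):
        # v[t][s]: probability of the best state path ending in s at time t.
        v = [{s: start[s] * emit[s].get(words[0], 1e-8) for s in states}]
        back = []
        for w in words[1:]:
            col, ptr = {}, {}
            for s in states:
                best_prev = max(states, key=lambda p: v[-1][p] * trans[p][s])
                col[s] = v[-1][best_prev] * trans[best_prev][s] * emit[s].get(w, 1e-8)
                ptr[s] = best_prev
            v.append(col)
            back.append(ptr)
        path = [max(states, key=lambda s: v[-1][s])]   # best final state
        for ptr in reversed(back):                     # follow backpointers
            path.insert(0, ptr[path[0]])
        return path

    print(viterbi(["fish", "swim"]))   # ['NOUN', 'VERB']
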
Information Extraction; Automatic Speech Recognition; Machine Translation
Coreference Resolution; Discourse Parsing
Distributional Semantics; Logical Semantics: Representation and Semantic Parsing; Predicate Argument Semantics: Semantic Role Labeling and Frame Semantics
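
The distributional hypothesis behind this material (words occurring in similar contexts have similar meanings) can be illustrated with cosine similarity over toy co-occurrence vectors; all counts below are invented:

    import numpy as np

    # Rows: target words; columns: invented context counts (drink, eat, road).
    vectors = {
        "coffee": np.array([8.0, 1.0, 0.0]),
        "tea":    np.array([7.0, 2.0, 0.0]),
        "car":    np.array([0.0, 0.0, 9.0]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(vectors["coffee"], vectors["tea"]))   # ~0.99, similar contexts
    print(cosine(vectors["coffee"], vectors["car"]))   # 0.0, disjoint contexts
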

Prescribed Books

The following are the suggested books for the course:

Speech and Language Processing, Daniel Jurafsky, James H. Martin

Foundations of Statistical Natural Language Processing, Christopher D. Manning, Hinrich Schütze

Introduction to Natural Language Processing, Jacob Eisenstein

Natural Language Understanding, James Allen

Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

Neural Network Methods for Natural Language Processing, Yoav Goldberg

About the Instructors

Ashutosh Modi
Assistant Professor, CS Department, IIT Kanpur