Applications Open now for May 2024 Batch | Applications Close: May 26, 2024 | Exam: Jul 07, 2024

Degree Level Course

Introduction to Natural Language Processing (i-NLP)

Natural language (NL) refers to the language spoken or written by humans; it is the primary mode of human communication. With the growth of the World Wide Web, data in the form of text has grown exponentially, calling for algorithms and techniques that process natural language for automation and for building intelligent machines: Natural Language Processing (NLP). On completing the course, participants will gain:

1. An understanding of why processing language computationally is hard and why specialized techniques are needed to process text.
2. Knowledge and in-depth understanding of linguistic techniques and of classical (statistical, pre-deep-learning) approaches to NLP, and their limitations.
3. Knowledge and in-depth understanding of deep learning approaches to NLP (RNNs and CNNs).
4. Knowledge and in-depth understanding of the attention mechanism, Transformers, and Large Language Models (LLMs).
5. The ability to read and understand the latest NLP research papers.
6. The ability to identify the NLP technique applicable to a real-world text-processing problem.
7. The ability to implement NLP models and algorithms for text-processing problems.
8. The ability to develop applications based on generative text models (LLMs).

by Ashutosh Modi

Course ID: BSCS5002

Course Credits:

Course Type:

Pre-requisites: None

Course structure & Assessments

For details of the standard course structure and assessments, visit the Academics page.

WEEK 1 Introduction to Natural Language (NL): Why is it hard to process a natural language? Levels of Language Processing, Linguistic Fundamentals for NLP
WEEK 2 Text Processing and Preprocessing: Tokenization, Normalization, Stop Word Removal, Stemming, Lemmatization, Morphological Analysis & Finite State Transducers, Part-of-Speech Tagging and Named Entities (a preprocessing sketch appears after the schedule)
WEEK 3 Classical Sequence Models: HMM and CRF
WEEK 4 Syntax and Parsing: Constituency parsing, Dependency parsing, Parsing algorithms
WEEK 5 Meaning Representation: Distributional Semantics, Logical Semantics, Semantic Role Labelling
WEEK 6 Language Models: n-gram Models, Word2Vec, GloVe (a toy n-gram sketch appears after the schedule); Discourse Processing: Anaphora and Coreference Resolution, Discourse Connectives; Machine Translation
WEEK 7 Naïve Bayes Classifier, Expectation Maximization Algorithm, Logistic Regression, Maximum Entropy Models
WEEK 8 Recurrent Neural Networks, LSTMs/GRUs, Neural Sequence Models, Contextualized Word Embeddings: TagLM, ELMo, ULMFiT, etc., Attention Mechanism (Code demo: LSTM/GRU; an illustrative sketch appears after the schedule)
WEEK 9 Transformers, Self-Attention Mechanism, Subword Tokenization, Positional Encoding, Pre-trained Language Models (PLMs): BERT, GPT, etc., Fine-tuning and Transfer Learning (Code demo: Transformers; an illustrative sketch appears after the schedule)
WEEK 10 Information Extraction, Automatic Speech Recognition, Machine Translation
WEEK 11 Coreference Resolution, Discourse Parsing
WEEK 12 Distributional Semantics; Logical Semantics: Representation and Semantic Parsing; Predicate Argument Semantics: Semantic Role Labeling and Frame Semantics
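
For concreteness, here is a minimal Week 2 preprocessing sketch in Python using NLTK (one possible toolkit; the course does not prescribe one) that walks the pipeline of tokenization, normalization, stop word removal, stemming, and lemmatization:

    # Minimal preprocessing sketch; NLTK is an assumed choice, not course-mandated.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download('punkt')      # tokenizer models (newer NLTK may also need 'punkt_tab')
    nltk.download('stopwords')  # stop word lists
    nltk.download('wordnet')    # lemmatizer dictionary

    text = "The striped bats are hanging on their feet."

    tokens = nltk.word_tokenize(text.lower())       # tokenization + case normalization
    tokens = [t for t in tokens if t.isalpha()]     # drop punctuation tokens
    stops = set(stopwords.words('english'))
    tokens = [t for t in tokens if t not in stops]  # stop word removal

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()
    print([stemmer.stem(t) for t in tokens])         # ['stripe', 'bat', 'hang', 'feet']
    print([lemmatizer.lemmatize(t) for t in tokens]) # ['striped', 'bat', 'hanging', 'foot']

Note how stemming and lemmatization disagree ("feet" vs. "foot"): the stemmer clips suffixes mechanically, while the lemmatizer consults a dictionary.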
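
The n-gram language models of Week 6 can be illustrated with a toy bigram model. The sketch below (not course material) computes maximum-likelihood estimates P(w | prev) = count(prev, w) / count(prev) over a hypothetical three-sentence corpus:

    # Toy bigram language model with maximum-likelihood estimates.
    from collections import Counter

    corpus = [
        "<s> i like nlp </s>",
        "<s> i like deep learning </s>",
        "<s> i enjoy nlp </s>",
    ]

    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        unigram_counts.update(words)
        bigram_counts.update(zip(words, words[1:]))

    def bigram_prob(word, prev):
        # MLE estimate; a real model would add smoothing for unseen bigrams
        return bigram_counts[(prev, word)] / unigram_counts[prev]

    print(bigram_prob("like", "i"))    # 2/3: "i like" occurs twice, "i" three times
    print(bigram_prob("nlp", "like"))  # 1/2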
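
The Week 8 code demo itself is not reproduced here; as a stand-in, here is a minimal LSTM sequence-tagger sketch, assuming PyTorch (the course may use a different framework):

    # Illustrative LSTM tagger; PyTorch is an assumed choice, not course-mandated.
    import torch
    import torch.nn as nn

    class LSTMTagger(nn.Module):
        def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, num_tags)

        def forward(self, token_ids):      # (batch, seq_len)
            x = self.embed(token_ids)      # (batch, seq_len, embed_dim)
            h, _ = self.lstm(x)            # (batch, seq_len, hidden_dim)
            return self.out(h)             # per-token tag scores

    model = LSTMTagger(vocab_size=1000, embed_dim=32, hidden_dim=64, num_tags=5)
    batch = torch.randint(0, 1000, (2, 7))  # 2 toy sentences, 7 token IDs each
    print(model(batch).shape)               # torch.Size([2, 7, 5])

Swapping nn.LSTM for nn.GRU changes only the recurrent cell; the rest of the model is unchanged.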
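
Likewise, a stand-in for the Week 9 Transformers demo: the sketch below loads a pre-trained BERT via the Hugging Face transformers library (an assumed choice) and extracts contextualized subword embeddings:

    # Illustrative use of a pre-trained language model; the library choice is assumed.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("NLP with transformers", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One 768-dimensional contextual vector per subword token (plus [CLS]/[SEP])
    print(outputs.last_hidden_state.shape)  # torch.Size([1, num_subword_tokens, 768])

Fine-tuning for a downstream task typically adds a small classification head on top of these embeddings and continues training end to end.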

Prescribed Books

The following are the suggested books for the course:

Speech and Language Processing, Daniel Jurafsky, James H. Martin

Foundations of Statistical Natural Language Processing, Christopher D. Manning, Hinrich Schütze

Introduction to Natural Language Processing, Jacob Eisenstein

Natural Language Understanding, James Allen

Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

Neural Network Methods for Natural Language Processing, Yoav Goldberg

About the Instructors

Ashutosh Modi
Assistant Professor, CS Department, IIT Kanpur