Application for the next batch of the Degree Program is open now.

Degree Level Course

Speech Technology

by Prof. S. Umesh, Prof. Hema A Murthy

Course ID: BSCCS3051

Course Credits: TBD

Course Type: Elective

Prerequisites: TBD

What you’ll learn

To understand the concepts of speech and speech technologies, and to apply them to real-world scenarios
To gain hands-on experience with the relevant toolkits used for speech processing

Course structure & Assessments

12 weeks of coursework, weekly online assignments, 3 in-person invigilated quizzes and 1 in-person invigilated end-term exam. For details of the standard course structure and assessments, visit the Academics page.

WEEK 1 Review of Signals and Systems, Continuous-time signals and transforms
WEEK 2 Discrete-time signals, Discrete Fourier transform (DFT), Autocorrelation and Cross-correlation
WEEK 3 Acoustic Feature Analysis of Speech Signals I
WEEK 4 Acoustic Feature Analysis of Speech Signals II
WEEK 5 Gaussian mixture models (GMM), universal background model (UBM-GMM), singular value decomposition (SVD)
WEEK 6 Hidden Markov model (HMM), Examples of HMM-based approaches for automatic speech recognition (ASR), text-to-speech (TTS) and speaker diarization, Information bottleneck (IB)-based clustering for diarization
WEEK 7 Introduction and History of ASR and TTS, Components of ASR: Acoustic Modelling, Pronunciation Model (Lexicon) and Language Modelling (N-gram language models), HMMs for Acoustic Modelling: Monophone and Triphone models
WEEK 8 Speech Synthesis: unit selection, statistical parametric synthesis (HTS)
WEEK 9 Neural networks for building speech technologies, NNs for Acoustic Modelling: Hybrid modelling with DNNs, CNNs and TDNNs, Simple examples: Speaker recognition, Multilayer perceptron for phone recognition
WEEK 10 End-to-End Approaches I: CTC, Encoder-decoder E2E architecture with RNNs; Applications to ASR and TTS
WEEK 11 End-to-End Approaches II: Encoder-decoder E2E architecture with transformers for ASR and TTS
WEEK 12 Interesting Problems: Speaker recognition/verification with i-vectors and x-vectors; Speaker diarization using x-vectors; Speaker adaptation (revisiting i-vectors and x-vectors) and an introduction to s-vectors; Code-switched speech recognition; Speech translation; Singing voice synthesis; Voice conversion; Generic voice synthesis

Prescribed Books

The following are the suggested books for the course:

L R Rabiner and R W Schafer, "Theory and Applications of Digital Speech Processing", Prentice Hall / Pearson, 2011.

L R Rabiner, B-H Juang and B Yegnanarayana, "Fundamentals of Speech Recognition", Pearson, 2009 (Indian subcontinent adaptation).

Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development", Prentice Hall PTR, 2001.

Thomas F Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice", Prentice Hall, 2001.

L R Rabiner and R W Schafer, "Digital Processing of Speech Signals", Pearson Education, 1993.

Recent research papers

About the Instructors

Prof. S. Umesh
Professor, Department of Electrical Engineering, Indian Institute of Technology Madras

S. Umesh is a Professor of Electrical Engineering at IIT Madras. He received his PhD from the University of Rhode Island, USA, and completed a postdoctoral fellowship at the City University of New York. He has also been a visiting researcher at AT&T Research Laboratories, USA; at the Machine Intelligence Laboratory, Cambridge University Engineering Department, UK; and at the Department of Computer Science, RWTH Aachen, Germany.

He received the AICTE Career Award for Young Teachers in 1997 and the Alexander von Humboldt Research Fellowship in 2004. During his stint at Cambridge University in 2004, he was part of the U.S. DARPA's Effective, Affordable, Reusable Speech-to-Text (EARS) programme. In 2005, he was part of RWTH Aachen's TC-STAR project for transcribing speech from the European Parliament's plenary sessions. From 2010 to 2016, he led a multi-institution consortium, funded by MeitY, to develop ASR systems in Indian languages for the agriculture domain. He currently leads the ASR efforts for the Natural Language Translation Mission, managed by the Office of the Principal Scientific Adviser to the Government of India.


Prof. Hema A Murthy
Professor, Department of Computer Science and Engineering, IIT Madras

Hema A Murthy is a faculty member in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras.