⭐⭐⭐ For more up-to-date materials (2025-2026), go here: https://github.com/severilov/DL-Audio-AIMasters-Course ⭐⭐⭐
Topics covered in the course:
- Digital Signal Processing
- Automatic Speech Recognition (ASR)
- Keyword spotting (KWS)
- Text-to-Speech (TTS)
- Voice Conversion
- Unsupervised learning in Audio
- Music Generation with NNs
| # | Date | Description | Slides | Video |
|---|---|---|---|---|
| 1 | September 12 | Lecture 1: Introduction and Digital Signal Processing | slides | video |
| 2 | September 19 | Seminar 1: Introduction, Spectrograms and Griffin-Lim | notebook | video |
| 3 | September 30 | Lecture 2: Automatic Speech Recognition 1: WER, CTC, LAS, Beam Search | slides | video |
| 4 | October 3 | Seminar 2: CTC, Beam Search | notebook | video |
| 5 | October 10 | Lecture 3: Automatic Speech Recognition 2: RNN-T, Conformer, Whisper, Language models in ASR, BPE | slides | video |
| 6 | October 17 | Lecture 4: Keyword spotting (KWS) | slides | video |
| 7 | October 24 | Seminar 3: CTC, Beam Search | notebook | video |
| 8 | October 31 | Lecture 5: Text-to-speech: Tacotron, FastSpeech, Guided Attention | slides | video |
| 9 | November 7 | Seminar 4: Keyword spotting | notebook | video |
| 10 | November 14 | Seminar 5: Text-to-speech: Tacotron2 | notebook | video |
| 11 | November 21 | Lecture 6: Text-to-speech: Neural Vocoders (WaveNet, PWGAN, DiffWave) | slides | video |
| 12 | November 28 | Lecture 7: Self-supervised learning in Audio | slides | video |
- 4 homework assignments, 2 points each = 8 points
- final test = 2 points
- maximum score: 8 + 2 = 10 points
Author + Lectures: Pavel Severilov
- telegram: @severilov
- e-mail: pseverilov@gmail.com
Seminars: Viacheslav Shokorov
- telegram: @vshokorov
- e-mail: shokorov.va@phystech.edu
Helped build course materials and held seminars: Daniel Knyazev
- telegram: @Oorgien
- e-mail: xmaximuskn@gmail.com
