severilov/DL-Audio-AIMasters-Course

Deep Learning for Audio Course, 2026

Description

Topics covered in the course:

  • Digital Signal Processing
  • Automatic Speech Recognition (ASR)
  • Keyword spotting (KWS)
  • Text-to-Speech (TTS)
  • Voice Conversion
  • Self-supervised learning in audio
  • Codec models
  • LLM-based Audio Generation
  • Music & Audio Generation
  • Speaker verification

Course materials

Materials

| # | Date | Description | Materials |
|---|------|-------------|-----------|
| 1 | | Lecture: Introduction and Digital Signal Processing | slides |
| | | Seminar: Introduction and Spectrograms, Griffin-Lim Algorithm | notebook |
| 2 | | Lecture: Automatic Speech Recognition 1: introduction, WER, Datasets, CTC, LAS | slides |
| | | Seminar: WER, Levenshtein distance, CTC | notebook |
| 3 | | Lecture: Automatic Speech Recognition 2: RNN-T, Language models in ASR, BPE, Whisper | slides |
| | | Seminar: Automatic Speech Recognition 2: RNN-T, Whisper | notebook |
| 4 | | Lecture: Keyword spotting (KWS) | slides |
| | | Seminar: Keyword spotting | notebook |
| 5 | | Lecture: Text-to-speech 1: WaveNet, Tacotron, FastSpeech, Guided Attention | slides |
| | | Seminar: Text-to-speech: Tacotron2 | notebook |
| 6 | | Lecture: Text-to-speech 2: Neural Vocoders (PWGAN, DiffWave, Glow-TTS, HiFi-GAN, VITS) | slides |
| | | Seminar: WaveNet | notebook |
| 7 | | Lecture: Voice Conversion: CycleGAN-VC, StarGAN-VC, AutoVC, Singing Voice Conversion | slides |
| | | Seminar: VAE WaveNet Vocoder, Normalizing Flow | notebook |
| 8 | | Lecture: Self-supervised learning in Audio (wav2vec2, GigaAM, HuBERT, BEST-RQ) | slides |
| | | Seminar: HiFi-GAN | notebook |
| 9 | | Lecture: Speaker verification and identification | slides |
| | | Seminar: Speaker verification, Angular Softmax, Margin Softmax | notebook |
| 10 | | Lecture: Codec Models (RVQ, SoundStream, EnCodec, Mimi), VQ-VAE, VALL-E | slides |
| | | Seminar: EnCodec, SoundStream, Residual Vector Quantization | notebook |
| 11 | | Lecture: LLM-based audio models: SEED-ASR, Llama3, Phi4, SpeechGPT, Mini-Omni, Llama-Omni, Moshi | slides |
| | | Seminar: VITS, Normalizing flows | notebook |
| 12 | | Lecture: Audio & Music Generation: Jukebox, Diffusion models (Diffsound, Riffusion, Noise2Music), AudioLM & MusicLM, AudioGen & MusicGen, MeLoDy, YuE, Music Agents | slides |
| | | Seminar: | |
| TBD | | Lecture: FishSpeech, XTTS, SPEAR-TTS, MQTTS | |

Homeworks

| Homework | Date | Deadline | Description | Link |
|----------|------|----------|-------------|------|
| 1 (2 points) | February 4 | February 18 (23:59) | 1. Audio classification; 2. Audio preprocessing | Open in GitHub |
| 2 (2 points) | February 17 | March 4 (23:59) | ASR-1: CTC | Open in GitHub |
| 3 (2 points) | March 6 | March 22 (23:59) | ASR-2: RNN-T | Open in GitHub |
| 4 (2 points) | | | Speaker Verification | Open in GitHub |
| 5 (2 points) | | | Text-to-speech: FastPitch | Open in GitHub |

Game rules

  • 5 homeworks, 2 points each = 10 points
  • final test = 1 point
  • bonus points can be earned in the homeworks
  • maximum: 10 (homeworks) + 1 (test) + bonus points from homeworks = 11 points + bonus points

Authors

Pavel Severilov

  • telegram: @severilov
  • e-mail: pseverilov@gmail.com
  • BIO:
    • Education: MIPT
    • Experience: AI-assistants (NLP, ASR, OCR), signals (Samokat+Kuper, Domclick, Dbrain, Gazpromneft, MIL-team)
    • Lecturer: AI Masters, MIPT, ex-Deep Learning School

Daniel Knyazev

Roman Vlasov

About

Deep Learning Audio Course – AI Masters
