Sang-Hoon Lee sh-lee-prml

Sang-Hoon Lee

I have been an Assistant Professor in the Department of Software and Computer Engineering at Ajou University since March 2024, where I lead the Speech AI Lab (SAIL). Prior to this, I worked as a postdoctoral researcher at the AI Research Center, Korea University, Seoul, South Korea. I received my Ph.D. from the Department of Brain and Cognitive Engineering, Korea University, in 2023. In March 2016, I started my integrated M.S.&Ph.D. in the Pattern Recognition & Machine Learning (PRML) Lab at Korea University in Seoul, Korea, under the supervision of Seong-Whan Lee.

E-mail: sanghoonlee@ajou.ac.kr, sh_lee@korea.ac.kr
Google Scholar: Link
Speech AI Lab. (SAIL): Link
PRML Speech Team (Supervisor: Seong-Whan Lee): Link

👀 Research Interests

Speech Synthesis (2019-, HierSpeech++, DDDM-VC, Diff-HierVC)
Neural Vocoder (2021-, PeriodWave, PeriodWave-Turbo, Fre-GAN, Fre-GAN2)
Neural Audio Codec (2024-)
Singing Voice Synthesis (2022-, MIDI-Voice, HiddenSinger)
Speech-to-Speech Translation (2023-, TranSentence)
Brain-Computer Interface (2019-2020, Brain-to-Speech System)
Reinforcement Learning (2017-2018, AI Curling Robot Curly)

🎉 Publications

Arxiv

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, 2024. (Under Review) [Code]

2025

StreamFlow: Streaming Audio Generation from Discrete Tokens via Streaming Flow Matching, H.-Y. Choi, S.-H. Lee, NeurIPS, 2025.
CoreaSpeech: Korean Speech Corpus via JAMO-based Coreset Selection for Efficient and Robust Korean Speech Generation, K.-J. Kwon, J.-H. So, and S.-H. Lee, NeurIPS Datasets and Benchmarks, 2025.
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis, S.-H. Lee, H.-Y. Choi, S.-B. Kim, and S.-W. Lee, IEEE Trans. on Neural Networks and Learning Systems, 2025. [Demo] [Code] [Gradio]
Parameter-Efficient Fine-Tuning for Low-Resource Text-to-Speech via Cross-Lingual Continual Learning, K.-J. Kwon, J.-H. So, and S.-H. Lee, Interspeech, 2025. (Oral) [Demo]
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, ICLR, 2025. [Code]
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment, H.-S. Oh, S.-H. Lee, D.-H. Cho, and S.-W. Lee, IEEE Trans. on Affective Computing, 2025. [Demo] [Code]
Personalized and Controllable Voice Style Transfer with Speech Diffusion Transformer, H.-Y. Choi, S.-H. Lee, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2025.
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, Neural Networks, 2025. [Demo]

### 2024

- [DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training](https://arxiv.org/abs/2307.16549), H.-S. Oh, **S.-H. Lee**, and S.-W. Lee, **IEEE Trans. on Audio, Speech and Language Processing**, 2024 [[Demo]](https://prml-lab-speech-team.github.io/demo/DiffProsody/) [[Code]](https://github.com/hsoh0306/DiffProsody)
- [Cross-lingual Text-to-Speech via Hierarchical Style Transfer](https://sites.google.com/view/limmits24/home?authuser=0), **S.-H. Lee**, H.-Y. Choi, and S.-W. Lee, **ICASSPW**, 2024.
- [Audio Super-resolution with Robust Speech Representation Learning of Masked Autoencoder](https://ieeexplore.ieee.org/document/10381805), S.-B. Kim, **S.-H. Lee**, H.-Y. Choi, S.-W. Lee, **IEEE Trans. on Audio, Speech and Language Processing**, 2024.
- [TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data](https://ieeexplore.ieee.org/abstract/document/10447331), S.-B. Kim, **S.-H. Lee**, and S.-W. Lee, **ICASSP**, 2024.
- [MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors](https://ieeexplore.ieee.org/abstract/document/10447981/), D.-M. Byun, **S.-H. Lee**, J.-S. Hwang, and S.-W. Lee, **ICASSP**, 2024.
- [DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion](https://arxiv.org/abs/2305.15816), H.-Y. Choi*, **S.-H. Lee***, and S.-W. Lee, **AAAI**, 2024. [[Demo]](https://hayeong0.github.io/DDDM-VC-demo/) [[Code]](https://github.com/hayeong0/DDDM-VC) [[Poster]](https://github.com/sh-lee-prml/sh-lee-prml/blob/main/DDDM-VC_poster.pdf)

2023

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer, S.-H. Lee*, H.-Y. Choi*, H.-S. Oh, and S.-W. Lee, Interspeech, 2023. (Oral) [Arxiv] [Demo]
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation, H.-Y. Choi, S.-H. Lee, and S.-W. Lee, Interspeech, 2023. (Oral) [Demo] [Code]
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, ACPR, 2023. [Demo]

2022

HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis, S.-H. Lee, S.-B. Kim, J.-H. Lee, E. Song, M.-J. Hwang, and S.-W. Lee, NeurIPS, 2022. [OpenReview] [Demo] [Poster]
Duration Controllable Voice Conversion via Phoneme-based Information Bottleneck, S.-H. Lee, H.-R. Noh, W. Nam, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2022. (2022-JCR-IF: 5.4, JIF PERCENTILE TOP 8.10%)
StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization, I. Hwang, S.-H. Lee, and S.-W. Lee, ICPR, 2022. [Demo] [Code]
Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis, S.-H. Lee, J.-H. Kim, G.-E. Lee, and S.-W. Lee, ICASSP, 2022. [Demo] [Code]
PVAE-TTS: Progressively Style Adaptive Text-to-Speech via Progressive Variational Autoencoder, J.-H. Lee, S.-H. Lee, J.-H. Kim, and S.-W. Lee, ICASSP, 2022. [Demo]
EmoQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech, C.-B. Im, S.-H. Lee, and S.-W. Lee, ICASSP, 2022. [Demo]

2021

VoiceMixer: Adversarial Voice Style Mixup, S.-H. Lee, J.-H. Kim, H. Chung, and S.-W. Lee, NeurIPS, 2021. [Demo]
Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis, S.-H. Lee, H.-W. Yoon, H.-R. Noh, J.-H. Kim, and S.-W. Lee, AAAI, 2021. [Demo]
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints, J.-H. Kim, S.-H. Lee, J.-H. Lee, H.-G. Jung, and S.-W. Lee, SMC, 2021.
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis, J.-H. Kim, S.-H. Lee, J.-H. Lee, and S.-W. Lee, Interspeech, 2021.
Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech, H. Chung, S.-H. Lee, and S.-W. Lee, Interspeech, 2021.

-2020

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder, H.-W. Yoon, S.-H. Lee, H.-R. Noh, and S.-W. Lee, Interspeech, 2020.
Learning Machines Can Curl - Adaptive deep reinforcement learning enables the robot Curly to win against human players in an icy world, D.-O. Won, S.-H. Lee, K.-R. Muller, and S.-W. Lee, NeurIPS 2019 Demonstration Track, 2019. [Video] [Poster]

✨ Education

2016.03-2023.02: Integrated M.S.&Ph.D., Dept. of Brain and Cognitive Engineering, Korea University

2012.03-2016.02: B.S., Dept. of Life Science, Dongguk University

🎁 Awards and Services

AC: NeurIPS

Reviewer: NeurIPS, ICLR, ICML, AAAI, ICASSP, Interspeech, ACL ARR, IEEE/ACM Transactions on Audio, Speech, and Language Processing

2022.02.25: Paper Award (Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis), Korea University

🎙 Invited Talks

2024.06.25: Fake Audio Detection, Ajou University.

2024.06.07: Speech Synthesis, 2nd AI Convergence Workshop (제2회 AI융합워크숍), Ajou University.

2024.05.24: Speech Language Model for Generative AI, KSCS2024

2023.08.18: Towards Unified Speech Synthesis for Text-to-Speech and Voice Conversion, Deepbrain AI

2023.08.11: Towards Unified Speech Synthesis for Text-to-Speech and Voice Conversion, Workshop on Brain and Artificial Intelligence 2023

2023.06.20: HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis, Top Conference Session in KCC2023

2022.08.19: VoiceMixer: Adversarial Voice Style Mixup, AIGS Symposium 2022

2022.07.01: VoiceMixer: Adversarial Voice Style Mixup, Top Conference Session in KCC2022

2021.12.02: Voice Conversion, Netmarble

2021.07.29: Speech Synthesis and Voice Conversion, Neosapience
