EmotionThinker is the first RL-enhanced SpeechLLM framework for interpretable speech emotion reasoning.
Unlike conventional speech emotion recognition (SER) systems that treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem, enabling models to jointly produce accurate emotion labels and structured, human-aligned explanations.
EmotionThinker offers the following advantages:
- Higher emotion recognition accuracy compared to existing SpeechLLMs;
- Deep reasoning ability to integrate emotion-related cues for justification;
- Fine-grained audio captioning covering speaker traits, prosodic cues, and semantic information.
- [Feb. 12, 2026] We open-source the EmotionThinker model on Hugging Face.
- [Feb. 12, 2026] We release the EmotionCoT dataset on Hugging Face.
- [Feb. 05, 2026] 🎉 EmotionThinker is selected for Oral Presentation at ICLR 2026.
- [Jan. 26, 2026] 🎉 EmotionThinker is accepted to ICLR 2026. See you in Brazil! 🇧🇷
Step 0: Prepare Environment
conda create -n emotionthinker python=3.10
conda activate emotionthinker
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
Step 1: Download EmotionThinker Model
Download the pretrained EmotionThinker model from Hugging Face. Set the local model path accordingly.
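For convenience, below is a minimal download sketch using the huggingface_hub Python API; the repo id is a placeholder, so substitute the actual EmotionThinker repo id from Hugging Face.

from huggingface_hub import snapshot_download

# Download the checkpoint into a local directory (repo id below is a placeholder).
local_path = snapshot_download(
    repo_id="<org>/EmotionThinker",  # replace with the actual Hugging Face repo id
    local_dir="checkpoints/EmotionThinker",
)
print(local_path)  # point the inference script at this directory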
Step 2: Run Inference Code
python scripts/emotionthinker_infer.py
The EmotionCoT section provides structured prosody labeling and Chain-of-Thought (CoT) emotion reasoning annotations for speech emotion understanding, together with the associated automatic labeling pipeline.
Prerequisite: EmotionCoT does not redistribute audio files. Please download the original datasets from the official sources:
EmotionCoT Annotations: We provide prosody labeling and CoT-style emotion reasoning annotations for IEMOCAP, MELD, Expresso, EARS, and MSP-Podcast (partial). Please download the EmotionCoT dataset from Hugging Face.
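A minimal fetch sketch with huggingface_hub is shown below; the dataset repo id is a placeholder.

from huggingface_hub import snapshot_download

# Fetch the EmotionCoT annotation files only; the audio itself must be obtained
# separately from the official IEMOCAP / MELD / Expresso / EARS / MSP-Podcast sources.
snapshot_download(
    repo_id="<org>/EmotionCoT",  # replace with the actual dataset repo id
    repo_type="dataset",
    local_dir="data/EmotionCoT",
)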
To facilitate large-scale labeling and data augmentation, we provide an automated prosody labeling pipeline for EmotionCoT.
Step 0: Prepare Environment
Note: If you have already prepared the environment during the EmotionThinker inference stage, you may skip this step.
conda create -n emotionthinker python=3.10
conda activate emotionthinker
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
Step 1: Download Required Models
Before running the pipeline, download the required models (e.g., pitch-energy extractor, gender classifier) and configure their paths in the script.
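The exact variable names are defined inside the pipeline script and may differ; as a purely hypothetical illustration, the configuration amounts to mapping each component to its local checkpoint path.

# Hypothetical path configuration; match the names actually used in prosody_labeling.py.
MODEL_PATHS = {
    "pitch_energy_extractor": "checkpoints/pitch_energy",
    "gender_classifier": "checkpoints/gender_cls",
    "age_classifier": "checkpoints/age_cls",
}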
Step 2: Prepare Input JSONL
Your input file must follow this format:
{
  "audio_path": "path/to/audio.wav",
  "transcription": "text transcription",
  "emotion": "emotion_label"
}
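Note that the file is JSONL: one such JSON object per line, not a JSON array. A small sketch for writing a conforming input file (the records below are illustrative):

import json

# Illustrative records; substitute your own audio paths, transcripts, and labels.
records = [
    {"audio_path": "path/to/audio.wav", "transcription": "text transcription", "emotion": "emotion_label"},
]

with open("input.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")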
Step 3: Extract Prosody Labeling
python EmotionCoT/pipeline/prosody_labeling.py \
  --input_path /path/to/input.jsonl \
  --output_path /path/to/prosody_labeling.jsonl
The script will automatically extract and label:
- pitch_level: low / normal / high
- energy_level: low / normal / high
- speed_level: slow / normal / fast
- stressed_words: stressed words from the transcription
- intonation: rising / falling / rising-falling / falling-rising / flat / expressive
- gender: Male / Female
- age_level: Child / Teenager / Young Adult / Middle-aged / Elderly
The output will be saved as a JSONL file with enriched prosody annotations.
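For illustration, an enriched record might look like the following; the prosody values are hypothetical, and the exact shape of stressed_words may differ in the released annotations.

{
  "audio_path": "path/to/audio.wav",
  "transcription": "text transcription",
  "emotion": "emotion_label",
  "pitch_level": "high",
  "energy_level": "normal",
  "speed_level": "fast",
  "stressed_words": ["transcription"],
  "intonation": "rising",
  "gender": "Female",
  "age_level": "Young Adult"
}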
If you are interested in augmenting your dataset with an EmotionCoT-style reasoning format, you can use the provided api_call.py script.
Configure your OpenAI API token in the script, then run:
python EmotionCoT/pipeline/api_call.py \
--input_path /path/to/prosody_labeling.jsonl \
  --output_path /path/to/emotioncot_augmented.jsonl
This will generate structured emotion reasoning chains aligned with the EmotionCoT format.
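As a rough sketch of what such a call looks like (a simplified stand-in, not the exact logic of api_call.py; the model name and prompt wording are assumptions):

import json

from openai import OpenAI

client = OpenAI()  # uses the configured OpenAI API token

def reasoning_chain(record: dict) -> str:
    # Build a prompt from the transcript and prosody labels, then ask for a
    # step-by-step (CoT-style) explanation of the speaker's emotion.
    prompt = (
        "Given this speech sample's transcript and prosody labels, reason "
        "step by step about the speaker's emotion.\n"
        + json.dumps(record, ensure_ascii=False)
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model; use whichever your token supports
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content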
If you find this work useful in your research, please kindly cite:
@inproceedings{wang2026emotionthinker,
  title={EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning},
  author={Wang, Dingdong and Liu, Shujie and Zhang, Tianhua and Chen, Youjun and Li, Jinyu and Meng, Helen},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}