
EmotionThinker

Official repository of the ICLR 2026 (Oral) paper "EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning". For details, please refer to our Paper.

(Figure: EmotionThinker overview)

Introduction

EmotionThinker is the first RL-enhanced SpeechLLM framework for interpretable speech emotion reasoning.

Unlike conventional speech emotion recognition (SER) systems that treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem, enabling models to jointly produce accurate emotion labels and structured, human-aligned explanations.

EmotionThinker offers the following advantages:

  • Higher emotion recognition accuracy than existing SpeechLLMs;
  • Deep reasoning that integrates emotion-related cues to justify its predictions;
  • Fine-grained audio captioning covering speaker traits, prosodic cues, and semantic information.

News

  • [Feb. 12, 2026] We open-source the EmotionThinker model on Hugging Face.

  • [Feb. 12, 2026] We release the EmotionCoT dataset on Hugging Face.

  • [Feb. 05, 2026] 🎉 EmotionThinker is selected for Oral Presentation at ICLR 2026.

  • [Jan. 26, 2026] 🎉 EmotionThinker is accepted to ICLR 2026. See you in Brazil! 🇧🇷

Inference with EmotionThinker

Step 0: Prepare Environment

conda create -n emotionthinker python=3.10
conda activate emotionthinker
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

Step 1: Download EmotionThinker Model

Download the pretrained EmotionThinker model from Hugging Face. Set the local model path accordingly.
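If you prefer a scripted download, here is a minimal sketch using huggingface_hub; the repo id below is an assumption based on the project name, so check the Hugging Face model card for the exact identifier:

# Minimal download sketch via huggingface_hub.
# NOTE: repo_id is a guess from the project name -- verify on the model card.
from huggingface_hub import snapshot_download

local_model_path = snapshot_download(
    repo_id="dingdongwang/EmotionThinker",    # hypothetical repo id
    local_dir="./checkpoints/EmotionThinker",
)
print(f"Model files saved to: {local_model_path}")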

Step 2: Run Inference Code

python scripts/emotionthinker_infer.py

EmotionCoT

The EmotionCoT section provides structured prosody labeling and Chain-of-Thought (CoT) emotion reasoning annotations for speech emotion understanding, together with the related automatic labeling pipeline.

EmotionCoT Datasets

Prerequisite: EmotionCoT does not redistribute audio files. Please download the original datasets from their official sources.

EmotionCoT Annotations: We provide prosody labeling and CoT-style emotion reasoning annotations for IEMOCAP, MELD, Expresso, EARS, and MSP-Podcast (partial). Please download the EmotionCoT dataset from Hugging Face.

Automatic Labeling Pipeline (Coming Soon)

To facilitate large-scale labeling and data augmentation, we provide an automated prosody labeling pipeline for EmotionCoT.

Step 0: Prepare Environment

Note: If you have already prepared the environment during the EmotionThinker inference stage, you may skip this step.

conda create -n emotionthinker python=3.10
conda activate emotionthinker
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

Step 1: Download Required Models

Before running the pipeline, download the required models (e.g., pitch-energy extractor, gender classifier, etc.) and configure paths in the script.
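For example, the paths might be collected at the top of the script like this (the variable and model names below are illustrative assumptions, not the pipeline's actual configuration):

# Illustrative path configuration -- the real names live in prosody_labeling.py.
MODEL_PATHS = {
    "pitch_energy_extractor": "/path/to/pitch_energy_model",  # hypothetical
    "gender_classifier": "/path/to/gender_classifier",        # hypothetical
    "age_classifier": "/path/to/age_classifier",              # hypothetical
}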

Step 2: Prepare Input JSONL

Your input file must follow this format:

{
  "audio_path": "path/to/audio.wav",
  "transcription": "text transcription",
  "emotion": "emotion_label"
}
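Because the file is JSONL, each record occupies exactly one line. A minimal sketch for writing a valid input file (the record values are placeholders):

import json

# One JSON object per line, each with the three required fields above.
records = [
    {
        "audio_path": "path/to/audio.wav",
        "transcription": "text transcription",
        "emotion": "emotion_label",
    },
]

with open("input.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")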

Step 3: Extract Prosody Labeling

python EmotionCoT/pipeline/prosody_labeling.py \
    --input_path /path/to/input.jsonl \
    --output_path /path/to/prosody_labeling.jsonl

The script will automatically extract and label:

  • pitch_level: low / normal / high
  • energy_level: low / normal / high
  • speed_level: slow / normal / fast
  • stressed_words: stressed words from transcription
  • intonation: rising / falling / rising-falling / falling-rising / flat / expressive
  • gender: Male / Female
  • age_level: Child / Teenager / Young Adult / Middle-aged / Elderly

The output will be saved as a JSONL file with enriched prosody annotations.
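As a quick sanity check on the enriched output, the sketch below tallies the value distribution of each prosody field (field names follow the list above; the file path is whatever you passed to --output_path):

import json
from collections import Counter

fields = ["pitch_level", "energy_level", "speed_level",
          "intonation", "gender", "age_level"]
counts = {field: Counter() for field in fields}

# Tally label values across all records in the output JSONL.
with open("prosody_labeling.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        for field in fields:
            if field in record:
                counts[field][record[field]] += 1

for field, counter in counts.items():
    print(field, dict(counter))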

Step 4 (Optional): EmotionCoT-style Reasoning Augmentation

If you want to augment your dataset with EmotionCoT-style reasoning, you can use the provided api_call.py script.

Configure your OpenAI API token in the script, then run:

python EmotionCoT/pipeline/api_call.py \
    --input_path /path/to/prosody_labeling.jsonl \
    --output_path /path/to/emotioncot_augmented.jsonl

This will generate structured emotion reasoning chains aligned with the EmotionCoT format.
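For reference, an augmentation step of this kind typically wraps each prosody record into a prompt and queries the API. The sketch below uses the official openai Python client with a placeholder prompt and model name; it illustrates the pattern only and is not the repository's actual api_call.py logic:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder prompt -- the actual template is defined in api_call.py.
PROMPT = ("Given the transcription, emotion label, and prosody labels below, "
          "write a step-by-step reasoning chain explaining the emotion.\n{record}")

with open("prosody_labeling.jsonl", "r", encoding="utf-8") as fin, \
     open("emotioncot_augmented.jsonl", "w", encoding="utf-8") as fout:
    for line in fin:
        record = json.loads(line)
        response = client.chat.completions.create(
            model="gpt-4o",  # hypothetical model choice
            messages=[{"role": "user",
                       "content": PROMPT.format(record=json.dumps(record))}],
        )
        record["emotion_cot"] = response.choices[0].message.content
        fout.write(json.dumps(record, ensure_ascii=False) + "\n")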

Citation

If you find this work useful in your research, please cite:

@inproceedings{wang2026emotionthinker,
  title={EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning},
  author={Wang, Dingdong and Liu, Shujie and Zhang, Tianhua and Chen, Youjun and Li, Jinyu and Meng, Helen},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
