Demo Videos: Watch demo videos on Google Drive
A modular, extensible ROS 2 framework for controlling humanoid robots with LLM-powered conversation, persona-aware behavior, and synchronized gesture-speech actions. Demonstrated using the Webots simulator with two humanoid robots: NAO and Robotis OP2.
- LLM-Powered Conversations β Natural dialogue driven by large language models with persona-specific behavior
- Multi-Persona Support β Easily switch between personalities (Angry Cab Driver, Polite Teacher, Polite Receptionist, etc.)
- Semantic Action Matching β Context-appropriate gestures selected via embedding-based intent matching
- Gesture-Speech Synchronization β Dynamic voice rate adjustment to sync actions with speech duration
- Anticipatory Action Module β Learns from failure to adapt action selection for short dialogues
- Highly Modular Design β Identical project structure across robots; only joint names and tuning differ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ROBOT BRAIN β
β β
β βββββββββββ βββββββββββ βββββββββββ ββββββββββββββββ β
β β VAD βββββΆβ STT βββββΆβ LLM βββββΆβ ActionSelect β β
β β (Silero)β β (Riva) β β (Llama) β β (Embeddings) β β
β βββββββββββ βββββββββββ βββββββββββ ββββββββ¬ββββββββ β
β β² β β β
β β βΌ βΌ β
β [Microphone] βββββββββββ ββββββββββββββ--β β
β β TTS β β Publish to β β
β β (Riva) β β/perform_actionβ β
β ββββββ¬βββββ ββββββββ¬ββββββ--- β
β β β β
β βΌ β β
β [Speaker] β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββ
β
ROS 2 Topic: /perform_actionβ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ROBOT DRIVER β
β β
β ββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ β
β β Action CallbackβββββΆβ Animation EngineβββββΆβ Webots Motors β β
β β (ROS Subscriber)β β (Sine Wave Math)β β (Joint Control) β β
β ββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
We demonstrate the framework with two humanoid robots. The modular design means the project structure is identical β only robot names, joint configurations, and action tuning differ.
| Robot | Description | Documentation |
|---|---|---|
| SoftBank NAO | Popular humanoid research robot | π NAO Controller README |
| Robotis OP2 | Open-source humanoid platform | π OP2 Controller README |
Comprehensive evaluation conducted over 10 runs, each with 20 conversation messages, testing 3 personas (Angry Cab Driver, Polite Teacher, Polite Receptionist).
| Metric | Score | Description |
|---|---|---|
| Persona Fidelity | 90% | Robot maintains persona-consistent language, tone, and behavior across interactions |
| Action Grounding & Synchronization | 83% Β± 2% | Correct action retrieval + well-timed gestureβspeech synchronization. Voice rate dynamically adjusted based on words-to-speak vs action duration |
| Emotional TTS Quality | 0.87 Β± 0.03 | High emotional expressiveness (Whisper Large + Magpie) |
| Interaction Latency | 3.2 Β± 0.3 sec | End-to-end STT β LLM β [TTS + Action] pipeline latency (15-20 words). Includes cloud API overhead |
| HRI User Study | 4.1 Β± 0.5 / 5 | MOS-equivalent rating from 20 participants for likeability, clarity, and perceived intelligence |
| Robustness & Reliability | 81% Β± 5% | Recovery from uncertainty/noise; avoidance of unsafe motions. Includes anticipatory module that learns from failure to adapt action selection for short dialogues |
The framework is designed for rapid adaptation and extension:
| Task | Effort | Details |
|---|---|---|
| Add New Persona | ~20 Β± 5 minutes | JSON-only configuration |
| Add New Action | ~30 Β± 3 LOC | Action-specific driver control implementation |
| Add New Robot | 150β200 LOC (~30-40 min) | Previously required 8-10 hours of extensive effort per robot |
- Wide Community Support β Extensive documentation, tutorials, and active development
- Robot Agnostic β Framework adapts to any humanoid robot with ROS 2 support
- Modular by Design β Nodes communicate via topics, enabling flexible system composition
- Simulation Ready β Seamless integration with Webots and other simulators
.
βββ ReadMe.md # This file
βββ my_nao_controller/ # NAO Robot Controller Package
β βββ config/
β β βββ config.py # API keys, paths, parameters
β βββ docs/
β β βββ README.md # Detailed NAO documentation
β βββ launch/
β β βββ robot_launch.py # ROS 2 launch file
β βββ my_nao_controller/
β β βββ nao_brain.py # AI node (STT β LLM β TTS β Action)
β β βββ nao_driver.py # Animation engine (Webots controller)
β β βββ nao_action_vocab.py # Action vocabulary definitions
β β βββ personas.py # Persona definitions
β β βββ generate_action_embeddings.py
β β βββ action_embeddings.pkl
β βββ resource/
β β βββ nao.urdf
β βββ worlds/
β β βββ nao_world.wbt
β βββ package.xml
β βββ setup.py
β βββ run.py
β
βββ op2_controller/ # Robotis OP2 Controller Package
βββ config/
β βββ config.py
βββ docs/
β βββ README.md # Detailed OP2 documentation
βββ launch/
β βββ robot_launch.py
βββ op2_controller/
β βββ op2_brain.py
β βββ op2_driver.py
β βββ op2_action_vocab.py
β βββ personas.py
β βββ generate_action_embeddings.py
β βββ action_embeddings.pkl
βββ resource/
β βββ op2.urdf
βββ worlds/
β βββ op2_world.wbt
βββ package.xml
βββ setup.py
βββ run.py
| Component | Version |
|---|---|
| OS | Ubuntu 22.04 LTS (Jammy Jellyfish) |
| ROS 2 | Humble Hawksbill |
| Simulator | Webots R2023b or newer |
| Python | 3.10+ |
-
Clone the repository into your ROS 2 workspace:
cd ~/ros2_ws/src git clone https://github.com/endeavorXx/ROS-Nao-Simulation-in-Webots.git
-
Follow robot-specific instructions:
-
Build the workspace:
cd ~/ros2_ws colcon build source install/setup.bash
For detailed setup, configuration, and usage instructions, refer to the robot-specific documentation:
This project is licensed under the MIT License.
Built with ROS 2 Humble, Webots, and various open-source AI/ML libraries including Silero VAD, NVIDIA Riva, and Sentence Transformers.