A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Updated Oct 6, 2024 (Python)
Audio Large Language Models
[NeurIPS 2023] Code and model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Code for the ICLR 2024 paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Tamil learning and writing app for children with audio support, focused on simple input and clear pronunciation.
Configure audio settings on Windows using this graphical interface for the Equalizer APO system-wide parametric equalizer.
Provide Whisper-based audio transcription and translation with lightweight C++ libraries for easy integration into LLM projects.