## Summary

I have adapted FireRedLID for vLLM inference and submitted a PR to the vLLM project: vllm-project/vllm#39290

The converted model weights are available on Hugging Face: https://huggingface.co/PatchyTisa/FireRedLID-vllm
## Architecture

FireRedLID in vLLM follows the Whisper-style encoder-decoder pattern:

- Encoder: ConformerEncoder (shared architecture with FireRedASR2)
- Decoder: TransformerDecoder (6-layer cross-attention)
- Vocabulary: 120 LID tokens (`dict.txt`)
- Output: up to 2 tokens per utterance (e.g. `"en"`, `"zh mandarin"`)
## Usage

Server:

```shell
vllm serve PatchyTisa/FireRedLID-vllm -tp=1 --dtype=float32
```

Client:

```shell
python examples/online_serving/openai_lid_client.py \
    --audio_paths audio_en.wav audio_zh.wav audio_fr.wav
```
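If you want to query the server directly rather than through the bundled client script, a request can be sketched as below. The endpoint path, payload shape, and response field are assumptions based on vLLM's OpenAI-compatible API; `examples/online_serving/openai_lid_client.py` is the authoritative reference:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default vllm serve address (assumption)

def lid_request(audio_path):
    """Build the request URL and payload for one utterance (hypothetical shape)."""
    payload = {
        "model": "PatchyTisa/FireRedLID-vllm",
        "file": audio_path,  # the real client uploads audio bytes, not a path
    }
    return f"{BASE_URL}/v1/audio/transcriptions", payload

def detect_language(audio_path):
    """POST one utterance and return the decoded LID label, e.g. 'zh mandarin'."""
    url, payload = lid_request(audio_path)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```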