A good starting point for this might be the [hibiki](https://github.com/kyutai-labs/hibiki) model.