
Initial Implementation of CC generation via HF models #18

Open
CosmoWorker wants to merge 2 commits into PlanetRead:main from CosmoWorker:tmpcc-new

CosmoWorker commented May 9, 2026

The mermaid diagram below outlines the approach I added. Audio is extracted from the video with ffmpeg and passed to a Hugging Face audio classification model, such as an AST (Audio Spectrogram Transformer) fine-tuned model. On the visual side, the current behaviour uses OpenCV-based visual motion detection, which can be improved.
[Image: mermaid diagram of the CC generation pipeline]
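A minimal sketch of the extraction step, assuming the audio is resampled to 16 kHz mono (the sampling rate the Hugging Face AST feature extractor is configured for) and then classified in short overlapping windows. The function names and the 1 s window / 0.5 s hop are hypothetical choices, not taken from this PR. Each window's samples could then be passed to a `transformers` `pipeline("audio-classification", ...)` loaded with an AST checkpoint; the exact checkpoint used here is not stated.

```python
import subprocess
from typing import List, Tuple

# Assumption: AST checkpoints on Hugging Face expect 16 kHz mono input.
TARGET_SR = 16_000


def ffmpeg_extract_cmd(video_path: str, wav_path: str, sr: int = TARGET_SR) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", video_path,   # input video
        "-vn",              # no video stream in the output
        "-ac", "1",         # downmix to a single channel
        "-ar", str(sr),     # resample to the target rate
        "-acodec", "pcm_s16le",
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the extraction; requires the ffmpeg binary on PATH."""
    subprocess.run(ffmpeg_extract_cmd(video_path, wav_path), check=True)


def window_bounds(duration_s: float, window_s: float = 1.0, hop_s: float = 0.5) -> List[Tuple[float, float]]:
    """Hypothetical windowing: split [0, duration_s) into overlapping windows to classify."""
    bounds: List[Tuple[float, float]] = []
    n = 0
    while n * hop_s < duration_s:
        start = n * hop_s  # index-based to avoid accumulating float error
        bounds.append((start, min(start + window_s, duration_s)))
        n += 1
    return bounds
```

Overlapping windows mean a short sound event straddling a window edge is still seen whole by at least one window, at the cost of classifying roughly twice as many clips.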
Here is a sample demo of the implementation using the models mentioned above:

mergedcc.webm
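One possible way to turn per-window classifier output into caption cues like those in the demo, sketched under several assumptions: each window carries the list of `{"label", "score"}` dicts an audio-classification pipeline returns, the top label is kept only above a hypothetical 0.3 threshold, `Speech` is skipped on the assumption that dialogue is captioned separately, and overlapping windows with the same label are merged before being written as SRT. `Cue`, `to_cues`, `srt_time` and `to_srt` are illustrative names, not part of the PR.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Mapping, Sequence, Tuple

# Hypothetical input: (start_s, end_s, predictions) per window, where predictions
# is a list of {"label": str, "score": float} dicts from an audio classifier.
WindowPreds = Tuple[float, float, Sequence[Mapping[str, Any]]]


@dataclass
class Cue:
    start: float
    end: float
    label: str


def to_cues(
    windows: List[WindowPreds],
    threshold: float = 0.3,                       # hypothetical cut-off
    skip: FrozenSet[str] = frozenset({"Speech"}),  # assumed to be handled by ASR
) -> List[Cue]:
    """Keep each window's top label and merge overlapping windows that share it."""
    cues: List[Cue] = []
    for start, end, preds in windows:
        if not preds:
            continue
        top = max(preds, key=lambda p: p["score"])
        label = top["label"]
        if top["score"] < threshold or label in skip:
            continue
        if cues and cues[-1].label == label and start <= cues[-1].end:
            cues[-1].end = max(cues[-1].end, end)  # extend the running cue
        else:
            cues.append(Cue(start, end, label))
    return cues


def srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n[{c.label}]\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Merging adjacent windows keeps a sustained sound such as background music from producing a new cue on every hop.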
  • The next step is adding a DeepFace model for visual validation, which should improve caption quality and can also describe the emotion of the person on screen.
  • Speaker diarization could be explored to separate segments by speaker, so each segment can later receive a visual score when confidence scores are generated. Further details will be covered in the proposal.
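A rough sketch of how the visual-motion and diarization ideas might feed a confidence score. It assumes grayscale frames given as nested lists of 0-255 values (a pure-Python stand-in for `cv2.absdiff(prev, curr).mean() / 255`), diarization turns as `(start, end, speaker)` tuples with hypothetical speaker labels, and a hypothetical 0.7/0.3 audio/visual weighting. None of these values or helper names come from the PR.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0-255 pixel values
Turn = Tuple[float, float, str]  # (start_s, end_s, speaker label) from diarization


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two frames, scaled to [0, 1]."""
    total = 0
    count = 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / (255 * count) if count else 0.0


def overlap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def speaker_for(segment: Tuple[float, float], turns: List[Turn]) -> Optional[str]:
    """Assign the speaker whose turn overlaps the segment the most, if any."""
    best: Optional[str] = None
    best_ov = 0.0
    for start, end, speaker in turns:
        ov = overlap(segment, (start, end))
        if ov > best_ov:
            best, best_ov = speaker, ov
    return best


def fused_confidence(audio_score: float, visual_score: float, w_audio: float = 0.7) -> float:
    """Hypothetical weighted average of the audio and visual scores."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score
```

A weighted average is only one option; it keeps the fused value in [0, 1] whenever both inputs are, which makes a single caption threshold easy to apply.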
