Initial Implementation of CC generation via HF models by CosmoWorker · Pull Request #18 · PlanetRead/Intelligent-cc-generation

CosmoWorker · 2026-05-09T18:22:01Z

The mermaid diagram below shows the approach I added. Audio is extracted with ffmpeg and passed to an audio classification model from Hugging Face, such as an AST fine-tuned model. The visual side currently uses OpenCV motion detection, which can be improved.
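As a rough illustration of the extraction step, here is a minimal sketch of how the ffmpeg call could be assembled. The function name, the file names, and the 16 kHz mono output are assumptions for illustration (16 kHz is a common input rate for AST checkpoints), not necessarily what this PR does.

```python
def build_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that writes a mono WAV track from a video.

    Hypothetical helper: the name and the default sample rate are assumptions.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", video_path,          # input video
        "-vn",                     # drop the video stream, keep audio only
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the requested rate
        wav_path,
    ]


cmd = build_extract_cmd("input.mp4", "audio.wav")
# To actually run the extraction:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.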

Here is a sample demo of the implementation using the models mentioned above.

mergedcc.webm

The next step is to add a DeepFace model for visual validation, which should improve quality and also describe the emotion of the person on screen.
Speaker diarization could be explored to separate segments, so that a visual score can be computed per segment later when generating confidence scores. More such details will go into the proposal.
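One way the audio and visual scores might be combined into a single confidence value is a weighted average. This is only a sketch: the function name, the 0.7 default weight, and the assumption that both scores lie in [0, 1] are illustrative choices, not something this PR defines.

```python
def fuse_confidence(audio_score: float, visual_score: float, audio_weight: float = 0.7) -> float:
    """Blend an audio score and a visual score into one confidence in [0, 1].

    Hypothetical helper: the weighting scheme and default weight are assumptions.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    # Clamp both inputs so out-of-range scores cannot push the result outside [0, 1].
    audio = min(max(audio_score, 0.0), 1.0)
    visual = min(max(visual_score, 0.0), 1.0)
    return audio_weight * audio + (1.0 - audio_weight) * visual


# Example: a strong audio detection with weak visual support.
confidence = fuse_confidence(0.9, 0.2)
```

A caption could then be emitted only when the fused value exceeds a threshold chosen on sample videos.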

CosmoWorker added 2 commits May 9, 2026 23:26

added single pipeline flow for detection of events via hf models & sr…

87e571c

…t muxing

add dependency necessary for srt generation

12da902
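Since the second commit adds a dependency for SRT generation, here is a dependency-free sketch of what writing detected events as SRT cues can look like. The function names and the `(start, end, label)` event tuple are assumptions for illustration and may not match the code in these commits.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(events: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, label) events as SRT cue blocks.

    Hypothetical helper: the event tuple layout is an assumption.
    """
    blocks = []
    for index, (start, end, label) in enumerate(events, start=1):
        # Each cue: running index, time range, then the bracketed sound label.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n[{label}]\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


srt_text = to_srt([(0.0, 1.5, "door slam"), (3661.5, 3663.0, "applause")])
```

The resulting text can be saved as a `.srt` file and muxed into the video with ffmpeg.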

CosmoWorker wants to merge 2 commits into PlanetRead:main from CosmoWorker:tmpcc-new
