feat: add audio/video file type support for structured extraction #3

Merged
saksham-nexla merged 1 commit into main from devin/1773874530-nextract-audio-video-support
Mar 23, 2026
@devin-ai-integration
Contributor

Summary

Adds audio and video file handling to nextract's extraction pipeline. Audio (.mp3, .wav, .m4a, .ogg, .flac, .aac, .wma) and video (.mp4, .webm, .mov, .avi, .mkv, .wmv) files are now recognized and attached as BinaryContent with their native MIME types, following the same pattern as image file handling.

Changes:

  • mimetypes_map.py: New _AUDIO_EXTS/_VIDEO_EXTS sets and is_audio()/is_video() helpers (mirrors existing _IMAGE_EXTS/is_image())
  • files.py: Audio/video branches in _prepare_single_file(), placed between PDF and office-binary handling
  • README.md: Replaces "Not supported: Audio/Video" with documented audio/video support in both the scope and file-type-handling sections
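
The helpers described for mimetypes_map.py can be sketched roughly as follows. The extension sets are taken from the summary above; the function signatures (a `Path` argument, a case-insensitive suffix check) are assumptions, since the PR text does not show the actual code or the shape of the existing is_image() helper.

```python
from pathlib import Path

# Extension sets as listed in the PR summary.
_AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}
_VIDEO_EXTS = {".mp4", ".webm", ".mov", ".avi", ".mkv", ".wmv"}


def is_audio(path: Path) -> bool:
    """Return True if the file extension is a recognized audio type.

    Assumption: matching is case-insensitive on the final suffix.
    """
    return path.suffix.lower() in _AUDIO_EXTS


def is_video(path: Path) -> bool:
    """Return True if the file extension is a recognized video type."""
    return path.suffix.lower() in _VIDEO_EXTS
```

Checking only the final suffix keeps the helpers cheap, but it also means a mislabeled file is classified by its name rather than its content.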

This is Phase 2 of the audio/video structured extraction plan. Companion PR in veda-ai adds these extensions to SUPPORTED_FILE_TYPES (Phase 1).

Review & Testing Checklist for Human

  • No unit tests added — the new is_audio()/is_video() functions and the audio/video branches in _prepare_single_file() are untested. Consider whether tests should be required before merge.
  • MIME type guessing for .m4a/.wma — these rely on mimetypes.guess_type() from stdlib (no custom entries in _CUSTOM). Verify guess_mime(Path("test.m4a")) returns audio/mp4 or similar on your target environment, not application/octet-stream.
  • Memory usage for large video files: path.read_bytes() loads the entire file into memory. Acceptable for the existing image pattern, but video files can be 100MB+. Verify this aligns with expected file size limits upstream.
  • Test plan: Create a small .mp3 and .mp4 file, call _prepare_single_file(Path("test.mp3")) and verify the returned PreparedPart has binary set with the correct media_type. Also verify an end-to-end extract() call with a model that supports audio/video input (e.g. Gemini) actually produces structured output.
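
Two of the checklist items (MIME guessing and memory usage) can be checked with a short script like the one below. This is a sketch under stated assumptions: `guess_mime_or_fallback` is a hypothetical stand-in for nextract's guess_mime(), built only on the stdlib `mimetypes.guess_type()`, and `read_bytes_capped` with its 100 MB default is an illustrative guard, not something this PR adds.

```python
import mimetypes
from pathlib import Path

# Extensions added by this PR (from the summary).
AUDIO_VIDEO_EXTS = [
    ".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma",
    ".mp4", ".webm", ".mov", ".avi", ".mkv", ".wmv",
]

# Hypothetical cap; the real upstream limit is not stated in the PR.
MAX_BYTES = 100 * 1024 * 1024


def guess_mime_or_fallback(path: Path) -> str:
    """Stand-in for guess_mime(): stdlib guess, else a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(path.name)
    return guessed or "application/octet-stream"


def unresolved_extensions(exts=AUDIO_VIDEO_EXTS) -> list:
    """List extensions that fall back to application/octet-stream here."""
    return [
        ext for ext in exts
        if guess_mime_or_fallback(Path("test" + ext)) == "application/octet-stream"
    ]


def read_bytes_capped(path: Path, max_bytes: int = MAX_BYTES) -> bytes:
    """Read a file fully, but refuse files larger than max_bytes."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes, over the {max_bytes} byte cap")
    return path.read_bytes()


if __name__ == "__main__":
    # Results depend on the platform's MIME database, so run this on the
    # target environment rather than trusting a developer machine.
    print("Extensions without a specific MIME type:", unresolved_extensions())
```

Any extension reported by `unresolved_extensions()` on the deployment image is a candidate for an explicit entry in _CUSTOM.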

Notes

  • Audio/video are pure binary passthrough — no transcription or frame extraction. This relies on the downstream LLM supporting native audio/video input tokens.
  • The extension sets match what was added to veda-ai's SUPPORTED_FILE_TYPES in the Phase 1 PR.
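
The passthrough behavior described in these notes might look roughly like the sketch below. `BinaryPart` and `prepare_media_file` are hypothetical stand-ins for BinaryContent/PreparedPart and the audio/video branch of _prepare_single_file(); the real types and their fields are not shown in this PR text, and the octet-stream fallback is an assumption.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

_AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}
_VIDEO_EXTS = {".mp4", ".webm", ".mov", ".avi", ".mkv", ".wmv"}


@dataclass
class BinaryPart:
    """Hypothetical stand-in for the library's binary attachment type."""
    data: bytes
    media_type: str


def prepare_media_file(path: Path) -> Optional[BinaryPart]:
    """Attach an audio/video file unchanged, tagged with its MIME type.

    Returns None for other extensions so a caller can fall through to
    its remaining handlers.
    """
    ext = path.suffix.lower()
    if ext not in _AUDIO_EXTS and ext not in _VIDEO_EXTS:
        return None
    guessed, _encoding = mimetypes.guess_type(path.name)
    # Assumed fallback when the platform has no mapping for the extension.
    media_type = guessed or "application/octet-stream"
    # No transcription or frame extraction: the raw bytes are passed along.
    return BinaryPart(data=path.read_bytes(), media_type=media_type)
```

Because the bytes are forwarded untouched, whether extraction succeeds depends entirely on the model accepting that media type as input.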

Link to Devin session: https://app.devin.ai/sessions/66c886f2998044ce8279af8a4c5d8a51
Requested by: @mihir-nexla

Add audio extensions (mp3, wav, m4a, ogg, flac, aac, wma) and video
extensions (mp4, webm, mov, avi, mkv, wmv) support:

- Add _AUDIO_EXTS and _VIDEO_EXTS sets to mimetypes_map.py
- Add is_audio() and is_video() helper functions
- Add audio/video handling in _prepare_single_file() as BinaryContent
  with native audio/* and video/* MIME types
- Update README.md to document audio/video support and remove from
  'not supported' section

Co-Authored-By: mihir.pamnani <mihir.pamnani@nexla.com>

@saksham-nexla merged commit 970488d into main on Mar 23, 2026
3 checks passed