
Speed up inference by overlapping I/O with GPU work#226

Open
kyroX-D wants to merge 3 commits into nikopueringer:main from kyroX-D:main

Conversation


@kyroX-D kyroX-D commented Apr 14, 2026

What this does

Right now the inference loop processes frames strictly one at a time: read from disk, run the GPU, write to disk, repeat. The GPU sits idle during every file read and write, which adds up on longer sequences, especially with 4K EXR files.

This PR fixes that by:

  • Reading frames ahead in a background thread (3-frame buffer) so the next frame is ready as soon as the GPU is
  • Writing output files on a separate thread so the GPU doesn't wait on disk writes
  • Removing a torch.cuda.empty_cache() call that ran on every single frame, forcing the CUDA memory allocator to constantly free and re-allocate memory for no benefit (the actual cleanup already happens when switching between models)
  • Tightening overly broad exception catches that could silently hide real errors
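The read-ahead/write-behind pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in CorridorKeyModule/inference_engine.py; the helper names (read_frame, process_on_gpu, write_frame) and the prefetch depth parameter are placeholders:

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_SENTINEL = object()

def run_pipeline(frame_paths, read_frame, process_on_gpu, write_frame,
                 prefetch_depth=3):
    """Overlap disk I/O with GPU work: a reader thread keeps up to
    prefetch_depth frames buffered ahead, and writes run on a small
    background pool so the GPU loop never blocks on disk."""
    frames = queue.Queue(maxsize=prefetch_depth)  # bounds memory use

    def reader():
        for path in frame_paths:
            # put() blocks when the buffer is full, throttling reads
            frames.put((path, read_frame(path)))
        frames.put(_SENTINEL)

    threading.Thread(target=reader, daemon=True).start()

    with ThreadPoolExecutor(max_workers=2) as writers:
        pending = []
        while True:
            item = frames.get()
            if item is _SENTINEL:
                break
            path, frame = item
            result = process_on_gpu(frame)  # GPU-bound step
            pending.append(writers.submit(write_frame, path, result))
        for fut in pending:
            fut.result()  # surface any write errors before returning
```

The bounded queue is the key design choice: it caps how far the reader can run ahead, so a fast disk can't fill memory with decoded 4K frames while the GPU is still working.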

Files changed

  • CorridorKeyModule/inference_engine.py
  • backend/service.py

Testing

  • 319 unit tests pass
  • All e2e workflow tests pass
  • The 9 failing tests are a pre-existing OpenCV EXR issue, not related to this change
  • Still need to benchmark with a real clip once the model checkpoint is publicly available

kyroX-D added 3 commits April 14, 2026 05:45
Introduce a prefetch thread that reads frames ahead of GPU processing and a background write pool for disk output. This prevents the GPU from idling during disk I/O, significantly improving throughput on 4K EXR sequences.
The CUDA allocator was forced to release and re-acquire memory blocks on every single frame, causing unnecessary overhead and fragmentation. Cache clearing is already handled at model-switch boundaries in service.py._ensure_model().
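The idea of clearing the cache once per model switch rather than once per frame can be sketched generically. ModelCache, its callbacks, and all names here are hypothetical stand-ins, not the actual service.py API; in practice clear_cache_fn would be something like torch.cuda.empty_cache:

```python
class ModelCache:
    """Illustrative sketch: allocator cleanup runs once per model
    switch, never inside the per-frame loop."""

    def __init__(self, load_fn, clear_cache_fn):
        self._load = load_fn          # hypothetical model loader
        self._clear = clear_cache_fn  # e.g. torch.cuda.empty_cache
        self._name = None
        self.model = None

    def ensure(self, name):
        if name != self._name:
            self.model = None  # drop references before flushing
            self._clear()      # one allocator flush per switch
            self.model = self._load(name)
            self._name = name
        return self.model
```

Repeated calls with the same model name are free; the expensive flush only happens when the name actually changes.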
