Speed up inference by overlapping I/O with GPU work by kyroX-D · Pull Request #226 · nikopueringer/CorridorKey

kyroX-D · 2026-04-14T04:07:18Z

What this does

Right now the inference loop processes frames one at a time: read from disk, run the GPU, write to disk, repeat. The GPU sits idle whenever we're doing file I/O, which really adds up on longer sequences, especially with 4K EXR files.

This PR fixes that by:

Reading frames ahead in a background thread (3-frame buffer) so the next frame is ready when the GPU needs it
Writing output files in a separate thread so the GPU doesn't have to wait for disk writes
Removing a torch.cuda.empty_cache() call that was running on every single frame, forcing the CUDA memory allocator to constantly free and re-allocate memory for no reason (the actual cleanup already happens when switching between models)
Tightening up some overly broad exception catches that could accidentally hide real errors
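For readers who want a picture of the overlap, here is a minimal sketch of the read-ahead plus background-write pattern. It is not the code in this PR: `run_pipeline`, `read_frame`, `process`, and `write_frame` are hypothetical names, and the stand-in callables take the place of real EXR I/O and the GPU step. The only detail taken from the description is the 3-frame read-ahead buffer.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_SENTINEL = object()  # marks the end of the frame stream


def run_pipeline(paths, read_frame, process, write_frame, prefetch=3, writers=2):
    """Overlap reads and writes with processing; returns the number of frames handled."""
    frames = queue.Queue(maxsize=prefetch)  # bounded, so the reader stays at most `prefetch` ahead
    stop = threading.Event()

    def _put(item):
        # Retry with a timeout so the reader can exit if the consumer has stopped.
        while not stop.is_set():
            try:
                frames.put(item, timeout=0.1)
                return True
            except queue.Full:
                continue
        return False

    def reader():
        try:
            for path in paths:
                if not _put((path, read_frame(path))):
                    return
            _put(_SENTINEL)
        except Exception as exc:
            _put(exc)  # hand read errors to the main thread instead of losing them

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    pending = []
    try:
        with ThreadPoolExecutor(max_workers=writers) as pool:
            while True:
                item = frames.get()
                if item is _SENTINEL:
                    break
                if isinstance(item, Exception):
                    raise item
                path, frame = item
                result = process(frame)  # the GPU step in the real loop
                pending.append(pool.submit(write_frame, path, result))
        # Leaving the `with` block waits for all queued writes to finish.
        for future in pending:
            future.result()  # re-raise any write error
    finally:
        stop.set()
        thread.join()
    return len(pending)
```

Note that this sketch bounds only the read side. Queued writes are unbounded, so a real implementation would probably also cap in-flight writes to keep memory in check on long 4K sequences.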

Files changed

CorridorKeyModule/inference_engine.py
backend/service.py

Testing

All 319 unit tests pass
All e2e workflow tests pass
The 9 failing tests are a pre-existing OpenCV EXR issue, not related to this change
Still need to benchmark with a real clip once the model checkpoint is publicly available

Introduce a prefetch thread that reads frames ahead of GPU processing and a background write pool for disk output. This prevents the GPU from idling during disk I/O, significantly improving throughput on 4K EXR sequences.

The CUDA allocator was forced to release and re-acquire memory blocks on every single frame, causing unnecessary overhead and fragmentation. Cache clearing is already handled at model-switch boundaries in service.py._ensure_model().
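To illustrate the idea of clearing the cache only at model-switch boundaries, here is a rough sketch. It is not the actual `_ensure_model()` from service.py: `ModelSlot`, `load_model`, and `clear_cache` are hypothetical names, and the cache-clearing function is passed in so the example runs without a GPU. In real use `clear_cache` would presumably be `torch.cuda.empty_cache`.

```python
class ModelSlot:
    """Holds one loaded model and clears the cache only when the model changes."""

    def __init__(self, load_model, clear_cache):
        self._load_model = load_model    # hypothetical loader: name -> model
        self._clear_cache = clear_cache  # e.g. torch.cuda.empty_cache in real use
        self._name = None
        self._model = None

    def ensure_model(self, name):
        if name != self._name:
            if self._model is not None:
                # Drop the old reference first so its memory can be reclaimed,
                # then release cached blocks once, at the switch boundary.
                self._model = None
                self._clear_cache()
            self._model = self._load_model(name)
            self._name = name
        return self._model
```

With this shape, calling `ensure_model` once per frame with the same name never touches the allocator; the cache is cleared only when a different model is requested.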

kyroX-D added 3 commits April 14, 2026 05:45

Add I/O-GPU pipeline overlap to run_inference

5ec4e6a

Restrict exception handling to specific errors

196f833

kyroX-D wants to merge 3 commits into nikopueringer:main from kyroX-D:main
