VideoDataset

A GPU-accelerated library that enables random frame access and efficient video decoding for data loading.

Warning

VideoDataset is in the Alpha phase. Frequent changes and instability should be anticipated. Any feedback, comments, suggestions and contributions are welcome!

Overview

VideoDataset is a high-performance video decoding library with multi-framework support. It aims to provide framework-integrated solutions for video decoding tasks.

Key Features:

  • GPU-accelerated video decoding using the NvCodec library
  • Support for common video formats (H.264, H.265, etc.)
  • Easy integration with multiple frameworks and formats

Installation

Prerequisites

  • NVIDIA GPU with CUDA support and CUDA Toolkit 12.0+ installed
  • Python 3.10 or later
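
The prerequisites above can be checked with a short script before installing. This is a minimal sketch and not part of VideoDataset: `check_prerequisites` is a hypothetical helper, and looking for `nvidia-smi` and `nvcc` on `PATH` is only a heuristic for the NVIDIA driver and CUDA Toolkit being present. It does not verify that the toolkit is version 12.0 or later.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the prerequisites list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means no problem was found.

    `version_info` and `which` are parameters only so the check can be exercised
    without a GPU; by default the running interpreter and the real PATH are used.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python {found} found, 3.10 or later is required")
    # Heuristic: nvidia-smi ships with the NVIDIA driver.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (NVIDIA driver may be missing)")
    # Heuristic: nvcc ships with the CUDA Toolkit, but may be installed off PATH.
    if which("nvcc") is None:
        problems.append("nvcc not found on PATH (CUDA Toolkit may be missing)")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("No problems found by the basic checks.")
```

A missing `nvcc` is reported as a warning rather than an error because the toolkit can be installed in a location that is not on `PATH`.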

Install from PyPI

pip install agibot-videodataset

Building from Source

pip install git+https://github.com/AgiBot-World/VideoDataset.git
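
After either install method, the standard library can confirm that the distribution is visible to the current interpreter. This is a small sketch: `installed_version` is a hypothetical helper, and the distribution name `agibot-videodataset` is taken from the PyPI command above. That a source install registers the same name is an assumption.

```python
from importlib import metadata


def installed_version(distribution="agibot-videodataset"):
    """Return the installed version string, or None if the distribution is not found."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("agibot-videodataset is not installed in this environment")
    else:
        print(f"agibot-videodataset {version} is installed")
```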

Quick Start

The complete example can be found in the quickstart documentation.

Documentation

Please refer to the full documentation.

Sphinx-based documentation can also be generated locally by running the following command:

make dev-doc doc

It will generate the documentation in the docs/_build/html directory and serve it on http://localhost:8000.

Performance

VideoDataset is optimized for high-throughput video processing. Benchmark results show:

  • GPU Decoding: A decoding throughput of 20,000 FPS is achieved in a multiprocessing scenario.
  • Random Access: Minimal overhead for non-sequential frame access.
  • GPU Decoder Utilization: Over 90% GPU decoder utilization is achieved in a multiprocessing scenario.

See the benchmark documentation for detailed performance analysis.
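
To get a rough feel for how throughput figures like these are measured, the sketch below times random (non-sequential) frame access for any decoder. It is not the project's benchmark and does not use the VideoDataset API: `decode_frame` is a hypothetical callable supplied by the caller that decodes the frame at a given index, and `measure_fps` and `random_indices` are illustrative names. It runs in a single process, so it will not reproduce the multiprocessing numbers above.

```python
import random
import time
from typing import Callable, List, Sequence


def random_indices(num_frames: int, count: int, seed: int = 0) -> List[int]:
    """Draw `count` frame indices uniformly from [0, num_frames), reproducibly."""
    rng = random.Random(seed)
    return [rng.randrange(num_frames) for _ in range(count)]


def measure_fps(decode_frame: Callable[[int], object], indices: Sequence[int]) -> float:
    """Decode the given frames one by one and return frames per second."""
    start = time.perf_counter()
    for index in indices:
        decode_frame(index)  # hypothetical decoder call provided by the caller
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very fast stand-in decoder.
    return len(indices) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in decoder so the script runs anywhere; replace it with a real one.
    def fake_decode(index: int) -> bytes:
        return bytes(16)

    indices = random_indices(num_frames=10_000, count=1_000)
    print(f"{measure_fps(fake_decode, indices):.0f} frames per second")
```

Comparing the result for shuffled indices against `range(count)` on the same decoder gives a simple estimate of the overhead of non-sequential access.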

Comparison with CPU decoding solutions

We also benchmarked VideoDataset against mainstream CPU software decoding solutions: OpenCV, Torchvision (PyAV), Torchvision (VideoReader), and TorchCodec (CPU). The results show that VideoDataset achieves 3 to 4 times higher decoding throughput.

[Figure: CPU Throughput]

VideoDataset also shows markedly lower CPU utilization than these software decoders.

[Figure: CPU Utilization]

Development Status

  • GPU acceleration via NvCodec
  • Random frame access
  • PyTorch integration
  • Compatibility with LeRobot
  • Asynchronous pipeline optimization

License

VideoDataset is released under the MIT License. See the LICENSE file for details.
