A minimal, JSON-recipe-first data processing pipeline inspired by NVIDIA NeMo SDP.
MiniDP provides a lightweight, modality-agnostic spine for building data transformation pipelines. Its pipelines are designed to be easy for both humans and LLMs to author and edit.
- Deterministic execution engine
- JSON recipe format (tool-calling friendly, human editable)
- Streaming JSONL manifest processing
- Composable processor API with drop/modify/expand semantics
- Optional multiprocessing support
- Zero external dependencies (Python standard library only)
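To make the drop/modify/expand semantics and streaming JSONL processing concrete, here is a minimal sketch. It assumes that a processor is a function mapping one manifest entry to a list of entries: an empty list drops the entry, a single entry keeps or modifies it, and several entries expand it. The names `run`, `drop_empty_text`, `lowercase_text`, and `split_sentences` are hypothetical illustrations, not MiniDP's actual API.

```python
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical processor contract (not MiniDP's actual API): a processor maps
# one manifest entry to a list of entries.
#   []            -> drop the entry
#   [entry]       -> keep or modify it
#   [e1, e2, ...] -> expand it into several entries
Entry = Dict[str, object]
Processor = Callable[[Entry], List[Entry]]


def drop_empty_text(entry: Entry) -> List[Entry]:
    # Drop: discard entries whose "text" field is missing or blank.
    return [entry] if str(entry.get("text", "")).strip() else []


def lowercase_text(entry: Entry) -> List[Entry]:
    # Modify: return a new dict so the input entry is not mutated.
    return [{**entry, "text": str(entry["text"]).lower()}]


def split_sentences(entry: Entry) -> List[Entry]:
    # Expand: emit one entry per period-separated sentence.
    parts = [p.strip() for p in str(entry["text"]).split(".") if p.strip()]
    return [{**entry, "text": p, "part": i} for i, p in enumerate(parts)]


def run(lines: Iterable[str], processors: List[Processor]) -> Iterator[str]:
    # Stream JSONL: read one input line at a time and preserve input order.
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the manifest
        entries = [json.loads(line)]
        for proc in processors:
            # Flat-map each processor over the current entries.
            entries = [out for e in entries for out in proc(e)]
        for e in entries:
            yield json.dumps(e, sort_keys=True)


if __name__ == "__main__":
    manifest = io.StringIO(
        '{"id": 1, "text": "Hello World. Second Part."}\n'
        '{"id": 2, "text": "   "}\n'
    )
    for out in run(manifest, [drop_empty_text, lowercase_text, split_sentences]):
        print(out)
```

Returning a list from every processor keeps all three cases under one signature. Chaining processors as a flat map over a streamed file keeps output order tied to input order, which is one simple way to get deterministic runs.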
```shell
pip install -e .
```

```shell
# Run a pipeline
minidp run examples/demo_recipe.json

# Preview output
minidp preview examples/demo_recipe.json -n 5

# List available processors
minidp list-processors
```

- Configuration Guide - Recipe format and options
- Processors Guide - Built-in processors and creating custom ones
- CLI Reference - Command-line interface
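As a rough illustration of how a JSON recipe can refer to named, user-defined processors, here is a minimal sketch of a name-based registry. The recipe keys (`processors`, `name`, `params`) and the names `REGISTRY`, `register`, `build_pipeline`, `drop_if_short`, and `add_field` are hypothetical assumptions, not MiniDP's actual recipe schema or API. The Configuration Guide and Processors Guide describe the real format.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry (not MiniDP's actual API): maps a processor name used in
# a recipe to a factory that builds the processor from the recipe's params.
Entry = Dict[str, object]
Processor = Callable[[Entry], List[Entry]]
REGISTRY: Dict[str, Callable[..., Processor]] = {}


def register(name: str):
    # Decorator that adds a processor factory to REGISTRY under `name`.
    def wrap(factory: Callable[..., Processor]) -> Callable[..., Processor]:
        if name in REGISTRY:
            raise ValueError(f"duplicate processor name: {name}")
        REGISTRY[name] = factory
        return factory
    return wrap


@register("drop_if_short")
def drop_if_short(field: str, min_len: int) -> Processor:
    # Drop entries whose `field` has fewer than `min_len` characters.
    return lambda e: [e] if len(str(e.get(field, ""))) >= min_len else []


@register("add_field")
def add_field(key: str, value: object) -> Processor:
    # Modify entries by setting `key` to a constant `value`.
    return lambda e: [{**e, key: value}]


def build_pipeline(recipe_json: str) -> List[Processor]:
    # Parse the recipe and instantiate each step in the order listed.
    recipe = json.loads(recipe_json)
    steps = []
    for step in recipe["processors"]:
        factory = REGISTRY.get(step["name"])
        if factory is None:
            raise KeyError(f"unknown processor: {step['name']}")
        steps.append(factory(**step.get("params", {})))
    return steps


if __name__ == "__main__":
    recipe = """
    {"processors": [
      {"name": "drop_if_short", "params": {"field": "text", "min_len": 3}},
      {"name": "add_field", "params": {"key": "lang", "value": "en"}}
    ]}
    """
    steps = build_pipeline(recipe)
    entries = [{"text": "hi"}, {"text": "hello"}]
    for proc in steps:
        entries = [out for e in entries for out in proc(e)]
    print(entries)  # [{'text': 'hello', 'lang': 'en'}]
    # The registered names, similar in spirit to a processor listing command.
    print(sorted(REGISTRY))  # ['add_field', 'drop_if_short']
```

Keeping recipes as plain name/params pairs is what makes them easy to edit by hand or emit from a tool call. Unknown names fail early, when the pipeline is built, rather than partway through a run.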
License: MIT