Lightweight profiling decorator for ML & data pipelines.

Track execution time, real memory usage, and inspect data shapes, with zero boilerplate.

- Execution time tracking
- Python memory peak (via tracemalloc)
- Real RSS peak (process-level, includes NumPy / Torch)
- Smart argument logging
  - numpy → shape + dtype
  - pandas → shape + columns
  - polars → shape + estimated size
  - LazyFrame → execution plan
- Async support
- Production-ready (modular, low overhead)
- Safe logging (truncated, no crashes on repr)
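
As an illustration of the async support listed above, one common way for a single decorator to handle both sync and async functions is to branch on `inspect.iscoroutinefunction`. The sketch below is not ml-sentry's implementation: `timed` is a hypothetical name, and it only logs wall time.

```python
import asyncio
import functools
import inspect
import logging
import time

logger = logging.getLogger(__name__)


def timed(func):
    """Hypothetical decorator: logs wall time for sync and async callables."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                # Await inside the wrapper so the timing covers the coroutine body.
                return await func(*args, **kwargs)
            finally:
                logger.info("%s :: %.2fs", func.__name__, time.perf_counter() - start)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logger.info("%s :: %.2fs", func.__name__, time.perf_counter() - start)
    return sync_wrapper


@timed
def double(n):
    return n * 2


@timed
async def double_async(n):
    await asyncio.sleep(0)
    return n * 2
```

Returning an `async def` wrapper for coroutine functions keeps the decorated function awaitable, so callers do not need to change.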

```
pip install ml-sentry
```

```python
import logging
from ml_sentry import track

logging.basicConfig(level=logging.INFO)

@track(log_args=True)
def foo(n: int):
    return [i * 2 for i in range(n)]

foo(1_000_000)
```

```
foo | n=1000000 ::
0.08s | RSS peak 120.50 MB | RSS Δ 80.20 MB
```

```python
@track(log_args=True)
def process(df):
    return df.select("user_id")
```

```
process | df=PolarsDF(shape=(1_000_000, 5), size=120.5MB) ::
0.42s | RSS peak 300.00 MB | RSS Δ 180.00 MB
```

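```python
import polars as pl

@track(log_args=True)
def pipeline(df):
    # df is a polars LazyFrame; nothing executes until .collect()
    return (
        df
        .filter(pl.col("user_id") > 10)
        .group_by("user_id")  # `groupby` was renamed to `group_by` in recent polars
        .agg(pl.len())        # `pl.len()` replaces the deprecated `pl.count()`
        .collect()
    )
```

```
pipeline | df=PolarsLazyFrame(plan=FILTER → GROUPBY → AGGREGATE ...) ::
0.12s | RSS peak 220.00 MB | RSS Δ 150.00 MB
```
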
```python
@track(
    use_tracemalloc=False,  # whether to track the Python heap via tracemalloc
    rss_interval=0.01,      # RSS sampling interval
    log_args=True,          # log arguments
    max_arg_length=200,     # truncate long repr
)
def my_step(data):  # placeholder function, shown only so the decorator has a target
    ...
```

| Metric | Description |
|---|---|
| time | Execution duration |
| Py peak | Python heap peak |
| RSS peak | Real process memory peak |
| RSS Δ | Memory growth |
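
To make the `time` and `Py peak` metrics concrete, here is a minimal sketch of how they can be measured with the standard library. It is an assumption about the general technique, not ml-sentry's code; `measure` is a hypothetical helper, and the RSS metrics are not covered because they need a process-level reading.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Return (result, seconds, py_peak_bytes) for one call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes since start().
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so later code does not pay the overhead.
        tracemalloc.stop()
    return result, elapsed, peak


result, seconds, py_peak = measure(lambda n: [i * 2 for i in range(n)], 100_000)
print(f"{seconds:.2f}s | Py peak {py_peak / 1e6:.2f} MB")
```

Note that `tracemalloc.stop()` ends tracing for the whole process, so this sketch should not be nested inside other code that relies on tracemalloc.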

Benchmark setup:
- Python 3.12
- 1M rows dataset
- pandas + numpy pipeline

| Mode | Overhead |
|---|---|
| No profiling | baseline |
| RSS only | +1–2% |
| RSS + tracemalloc | +5–10% |
| RSS (0.001 interval) | +10–15% |
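
Overhead figures like these depend on the workload, so it can be worth re-measuring on your own pipeline. The sketch below shows one way to estimate relative overhead with `timeit`; it uses plain `tracemalloc` as a stand-in for a profiled function, and `workload`, `workload_with_tracemalloc` and `relative_overhead` are hypothetical names, not part of ml-sentry.

```python
import timeit
import tracemalloc


def workload():
    return sum(i * i for i in range(50_000))


def workload_with_tracemalloc():
    # Same work, but with Python heap tracing enabled for the duration.
    tracemalloc.start()
    try:
        return workload()
    finally:
        tracemalloc.stop()


def relative_overhead(plain, profiled, repeat=5, number=10):
    """Percent slowdown of `profiled` vs `plain`, using best-of-N timings."""
    base = min(timeit.repeat(plain, repeat=repeat, number=number))
    prof = min(timeit.repeat(profiled, repeat=repeat, number=number))
    return (prof - base) / base * 100.0


print(f"tracemalloc overhead: {relative_overhead(workload, workload_with_tracemalloc):+.1f}%")
```

Taking the minimum over several repeats reduces noise from other processes; the printed percentage will vary between machines and runs.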

tracemalloc:
- Tracks only Python objects
- Adds overhead (~5–10%)
- Use with caution for heavy NumPy / Torch workloads

RSS sampling:
- Uses a background thread
- Accuracy depends on the sampling interval
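
The background-thread approach described above can be sketched as follows. This is a generic illustration under stated assumptions, not ml-sentry's internals: `PeakSampler` is a hypothetical class, and the reader is a fake counter so the example runs without extra dependencies.

```python
import threading
import time


class PeakSampler:
    """Polls read_value() every `interval` seconds and keeps the maximum."""

    def __init__(self, read_value, interval=0.01):
        self._read = read_value
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.start_value = None
        self.peak = None

    def _run(self):
        # Event.wait doubles as an interruptible sleep between samples.
        while not self._stop.wait(self._interval):
            self.peak = max(self.peak, self._read())

    def __enter__(self):
        self.start_value = self.peak = self._read()
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # One last sample so the final state is never missed.
        self.peak = max(self.peak, self._read())
        return False


# Stand-in for a real RSS reader: a value the "workload" updates.
fake_rss = {"bytes": 100}

with PeakSampler(lambda: fake_rss["bytes"], interval=0.001) as sampler:
    fake_rss["bytes"] = 500  # simulated spike
    time.sleep(0.1)          # long enough for many samples
    fake_rss["bytes"] = 200  # simulated release

print(sampler.peak, sampler.peak - sampler.start_value)
```

If `psutil` is installed, `lambda: psutil.Process().memory_info().rss` could serve as the reader. A spike shorter than the interval can fall between two samples, which is why accuracy depends on the interval.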

| Interval | Accuracy | Overhead |
|---|---|---|
| 0.001 | high | high |
| 0.01 (recommended) | balanced | moderate |
| 0.1 | low | minimal |

Argument logging output by type:

| Type | Output Example |
|---|---|
| numpy array | shape=(1000, 128) |
| pandas df | shape=(1000, 10) |
| polars df | size=120MB |
| LazyFrame | plan=FILTER → GROUPBY |
| list | len=1000 |
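
A formatter along these lines can be sketched with duck typing plus a guarded `repr`. The code below is a hypothetical illustration (`summarize` and its output format are assumptions and will not match ml-sentry's exact output); it shows the two safety properties mentioned earlier: truncation and no crash on a broken `__repr__`.

```python
def summarize(value, max_len=200):
    """Hypothetical formatter: compact, never-raising description of a value."""
    try:
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Duck-typed: numpy arrays and pandas/polars DataFrames expose .shape
            text = f"{type(value).__name__}(shape={tuple(shape)})"
        elif isinstance(value, (list, tuple, set, dict)):
            text = f"{type(value).__name__}(len={len(value)})"
        else:
            text = repr(value)
    except Exception:
        # A broken __repr__ must not crash the profiled function.
        text = f"<unrepresentable {type(value).__name__}>"
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return text


class BadRepr:
    def __repr__(self):
        raise RuntimeError("boom")


class FakeArray:
    # Minimal stand-in for an array-like object with a .shape attribute.
    shape = (1000, 128)


print(summarize(list(range(1000))))
print(summarize(FakeArray()))
print(summarize(BadRepr()))
print(summarize("x" * 500, max_len=20))
```

Checking for `.shape` rather than importing numpy or pandas keeps the formatter free of hard dependencies on those libraries.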

Limitations:
- GPU memory not tracked
- tracemalloc ignores NumPy internals
- RSS peak is sampled (not exact kernel-level peak)

Roadmap:
- Prometheus integration
- JSON structured logging
- Airflow plugin
- Memory leak detection
- Torch / GPU support

PRs welcome. Focus areas:
- new formatters (torch, spark, etc.)
- performance improvements
- integrations

License: MIT