Production-grade feature and forward target engineering for quantitative research.
Rust core. Python API. Streaming and batch, same object.
Most feature engineering libraries take a full DataFrame and return a DataFrame. That works in research. In live trading, it forces you to keep a growing history in memory and recompute every feature on every new bar. This doesn't scale and isn't how production systems work.
A second, quieter problem: research code and live code diverge. Any inconsistency between them is a bug waiting to surface in production.
Oryon solves both. Every feature is a stateful object with a fixed memory footprint. You feed it one bar at a time in live trading, or pass the full dataset in research. Same object, same Rust code, same output.
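To make the fixed-memory idea concrete, here is a toy streaming EMA in plain Python. This is an illustration of the concept only, not Oryon's implementation (which lives in Rust behind the `update()` API):

```python
import math

class StreamingEma:
    """Toy illustration of a fixed-memory streaming feature (not Oryon's code)."""

    def __init__(self, window: int):
        self.alpha = 2.0 / (window + 1)
        self.window = window
        self.value = None
        self.count = 0

    def update(self, x: float) -> float:
        # O(1) memory and time per bar: only the running value is kept,
        # no growing history to recompute over
        self.value = x if self.value is None else self.alpha * x + (1 - self.alpha) * self.value
        self.count += 1
        # emit nan until the warm-up period has elapsed
        return self.value if self.count >= self.window else math.nan

    def reset(self) -> None:
        self.value, self.count = None, 0

ema = StreamingEma(window=3)
print([ema.update(x) for x in [1.0, 2.0, 3.0]])  # [nan, nan, 2.25]
```

The same object serves both modes: call `update()` once per live bar, or loop it over a historical series to reproduce the research output bar for bar.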
```
pip install oryon
```

No Rust toolchain required. Pre-built wheels for Linux, macOS, and Windows.
Live trading, one bar at a time:

```python
from oryon.features import Ema, ParkinsonVolatility
from oryon import FeaturePipeline

fp = FeaturePipeline(
    features=[
        Ema(["close"], window=20, outputs=["ema_20"]),
        ParkinsonVolatility(["high", "low"], window=20, outputs=["pvol_20"]),
    ],
    input_columns=["close", "high", "low"],
)

# on each new bar from your data feed
result = fp.update([bar.close, bar.high, bar.low])
# [nan, nan] during warm-up
# [102.4, 0.018] once ready
```

Research, full dataset at once:
The same feature pipeline (fp) defined above builds your training dataset. See the full quickstart for details.
```python
import pandas as pd

from oryon import TargetPipeline
from oryon.adapters import run_features_pipeline_pandas, run_targets_pipeline_pandas
from oryon.targets import FutureReturn

# fp is the pipeline from the live trading example above
X = run_features_pipeline_pandas(fp, df)
y = run_targets_pipeline_pandas(
    TargetPipeline(
        targets=[FutureReturn(inputs=["close"], horizon=5, outputs=["ret_5"])],
        input_columns=["close"],
    ),
    df,
)

dataset = pd.concat([X, y], axis=1)
#             ema_20  pvol_20   ret_5
# 2024-01-01     NaN      NaN     NaN   <- warm-up
# ...
# 2024-01-21   102.4  0.01823  0.0312
# 2024-01-22   102.7  0.01754  0.0187
# ...
# 2024-12-27   118.2  0.02341     NaN   <- forward period

dataset = dataset.dropna()  # drop warm-up and forward-period rows
```

Rust core, measured on Apple M-series. Python adds a constant ~150 ns per call on top.
Features: per update() call
| Feature | w=20 | w=200 |
|---|---|---|
| Ema, SimpleReturn, LogReturn | < 10 ns | < 10 ns |
| Sma, ParkinsonVolatility, RogersSatchellVolatility | < 30 ns | < 175 ns |
| Skewness, Kurtosis, LinearSlope | < 40 ns | < 510 ns |
| Kama | 164 ns | 870 ns |
The goal is every feature under 1 µs at w=200.
Targets: per run_research() call over 1 000 bars
| Target | h=20 | h=200 |
|---|---|---|
| FutureReturn | 1.9 µs | 1.7 µs |
| FutureCTCVolatility | 28 µs | 280 µs |
| FutureLinearSlope | 31 µs | 287 µs |
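As a sketch of what a forward target computes: a target like FutureReturn with horizon `h` looks `h` bars ahead, so the last `h` rows of the dataset are undefined (the forward period in the research example above). A minimal stand-in follows; the simple-return convention here is an assumption for illustration, not necessarily Oryon's exact formula:

```python
import math

def future_return(closes: list[float], horizon: int) -> list[float]:
    """Forward simple return: close[t + horizon] / close[t] - 1."""
    out = []
    for t in range(len(closes)):
        if t + horizon < len(closes):
            out.append(closes[t + horizon] / closes[t] - 1)
        else:
            # forward period: the future price does not exist yet
            out.append(math.nan)
    return out

print(future_return([100.0, 101.0, 102.0, 103.0, 104.0], horizon=2))
```

Because the last `horizon` values are nan, dropping them before training prevents the model from ever seeing a label built on unavailable data.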
Custom pipelines accumulate silent bugs:
- Look-ahead bias. A feature that accidentally reads future data produces results impossible to replicate in live trading. It will never raise an error.
- State leakage between folds. In cross-validation, state from one fold contaminates the next if not explicitly reset. The numbers look plausible. The model is wrong.
- Research / live divergence. Batch and streaming implementations drift over time. A subtle difference in edge case handling is enough to break a live strategy.
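These failure modes can be checked mechanically. A sketch of the idea with a stand-in rolling mean; the names and check structure here are illustrative, not Oryon's actual test API:

```python
import math
from collections import deque

class RollingMean:
    """Stand-in stateful feature for the contract checks below."""

    def __init__(self, window: int):
        self.window = window
        self.buf = deque(maxlen=window)

    def update(self, x: float) -> float:
        self.buf.append(x)
        if len(self.buf) < self.window:
            return math.nan  # warm-up
        return sum(self.buf) / self.window

    def reset(self) -> None:
        self.buf.clear()

def check_contracts(feature, data: list[float], window: int) -> None:
    # one streaming pass over the data, bar by bar
    streamed = [feature.update(x) for x in data]
    # reset correctness: after reset, replaying the data must reproduce it
    feature.reset()
    replayed = [feature.update(x) for x in data]
    assert all(
        (math.isnan(a) and math.isnan(b)) or a == b
        for a, b in zip(streamed, replayed)
    )
    # warm-up: exactly the first window - 1 outputs are nan
    assert all(math.isnan(v) for v in streamed[: window - 1])
    assert all(not math.isnan(v) for v in streamed[window - 1 :])

check_contracts(RollingMean(3), [1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

A look-ahead check follows the same pattern: the output at bar `t` must be identical whether or not bars after `t` exist yet.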
Every feature and target in Oryon ships with contract tests that enforce warm_up_period, forward_period, None propagation, reset correctness, and instance independence. The test infrastructure is part of the public API. Contributions must pass the same contracts.
Full API reference, guides, and benchmarks at oryonlib.dev.
Contributions of features and targets are welcome. See the contributing guide for the full workflow and checklist.
MIT. See LICENSE.
Developed by Lucas Inglese
