Feat/vd 4347 outlier detection cleaning#128
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the backtesting engine by integrating robust data quality validation, including a new outlier detection mechanism. It streamlines the configuration of validation policies and improves the reliability and traceability of evaluation results through refactored data handling and persistent metadata. These changes aim to provide more explicit control over data quality and optimization feasibility, ensuring more dependable and reproducible backtest outcomes. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive validation and optimization policy for the backtesting system, refactoring the BacktestRunner to implement a multi-stage gating system for job and strategy execution. New data structures and SQLite-based caching (EvaluationCache, ResultStore) are added to manage evaluation results and run metadata, alongside updated dependency management. Documentation is expanded to cover the new validation rules and CLI options. A review comment suggests improving resource management in src/backtest/evaluation/store.py by using with statements for sqlite3 connections to ensure automatic closing and transaction handling, even during errors.
I am having trouble creating individual review comments. Click here to see my feedback.
src/backtest/evaluation/store.py (150-193)
For improved resource management and to make the code more idiomatic, consider using a with statement for handling sqlite3 connections. This ensures the connection is automatically closed and transactions are committed or rolled back, even if errors occur. This pattern can be applied to all methods in this file that interact with the database (_ensure, get, set in EvaluationCache, and all methods in ResultStore).
with sqlite3.connect(self.db_path) as con:
con.execute(
"""
INSERT OR REPLACE INTO evaluation_cache
(
collection,
symbol,
timeframe,
strategy,
params_json,
metric_name,
metric_value,
stats_json,
data_fingerprint,
fees,
slippage,
evaluation_mode,
mode_config_hash,
validation_config_hash,
engine_version
)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
collection,
symbol,
timeframe,
strategy,
params_json,
metric_name,
float(metric_value),
json.dumps(stats, sort_keys=True),
data_fingerprint,
fees,
slippage,
evaluation_mode,
mode_config_hash,
validation_config_hash,
EVALUATION_SCHEMA_VERSION,
),
)…cy checks explicit
… outlier settings
…rDetectionCleaning' into feat/VD-4347-OutlierDetectionCleaning
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Stale global policy reference used in per-collection merge
- After normalizing global validation policies, the function now refreshes the global policy references from validation_cfg before per-collection merges.
…iabilty-checks' into feat/VD-4347-OutlierDetectionCleaning
… price variance validation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix resolved 1 of the 3 issues found in the latest run.
- ✅ Fixed: Stats dict self-references during trade meta construction
evaluatenow snapshots raw simulation stats and passes that immutable snapshot into_build_trade_metaso evaluator-injected keys are never read in trade meta construction.
…aning' into feat/VD-4347-OutlierDetectionCleaning
|
| metric_val = float(cached["metric_value"]) | ||
| plan.evaluations += 1 | ||
| if not np.isfinite(metric_val): | ||
| return float("-inf") |
There was a problem hiding this comment.
Cached non-finite evaluations silently lost from metrics tracking
Low Severity
In _apply_cached_evaluation, when the cached metric_value is non-finite (e.g., -inf from a previously invalid evaluation), the method returns early at line 2060 before incrementing result_cache_hits. Since result_cache_misses is only incremented when evaluation_cache.get() returns None, these cached-but-invalid evaluations are counted by neither counter. Previously, all cache hits were always counted. This creates a metrics gap where result_cache_hits + result_cache_misses < total_evaluations, which could mislead monitoring or observability consumers.
Triggered by team rule: Master Directive





Summary
This branch introduces outlier-detection as a first-class data-quality gate.
It also Refactors the validation/evaluation foundation to make validation behavior explicit, cache-safe, and dashboard-ready.
The branch simplifies validation config shape, adds run-level validation metadata persistence, and prevents evaluation-cache contamination by keying cache entries with an effective per-job validation profile hash.
Changes
Validation config simplification:
validation.data_quality.on_failis required whendata_qualityis configured.ResultStore.Storage:
ResultStore: newrun_metadatatable with:validation_profile_jsonactive_gates_jsoninactive_gates_jsonEvaluationCache: cache key now includesvalidation_config_hash.Documentation/config examples:
config/example.yamlto current validation schema.Tests:
Outlier detection introduced:
Adds
validation.data_quality.outlier_detectiongate with:max_outlier_pctmethod(zscoreormodified_zscore)zscore_thresholdIntegrates outlier checks into data-validation reliability reasons and gate decisions.
Handles indeterminate modified-zscore cases explicitly (e.g.
mad_zero) via structured rejection reason.Breaking changes:
validation.data_quality.min_data_points.min->validation.data_quality.min_data_pointsvalidation.data_quality.kurtosis.max->validation.data_quality.kurtosisvalidation.data_quality.on_failis now required whendata_qualityexists.validation_config_hash(old cache rows are not reused under new keying).How to Test
make testsmake runEVALUATION_RESULTS_SOURCE=result_storeRelevant config/env:
validation.data_qualityandvalidation.optimization.evaluation_modestill defaults tobacktest.Checklist (KISS)
pre-commit run --all-files).envvalues are excludedNotes:
make testspasses on this branch.Related Issues/Links
Note
Medium Risk
Touches core backtest gating, evaluation caching keys, and SQLite schemas; mistakes could invalidate cached results or incorrectly skip/reject jobs/results. Changes are covered by expanded tests, but require attention to migration/backward-compat behavior and policy resolution correctness.
Overview
RCA: Validation behavior was implicitly merged at runtime and evaluation caching was keyed only by mode/data fingerprints, allowing policy changes (or per-collection overrides) to contaminate cache correctness and making gate activation opaque.
The Fix: Restructures validation config into explicit modules (
data_quality.continuity,data_quality.outlier_detection,result_consistency.*) and resolves global-vs-collection overrides duringload_configviaresolve_validation_overrides. The runner now computes a per-jobvalidation_config_hashto keyEvaluationCacheentries, adds new data-quality outlier checks plus result-consistency gates (trade PnL concentration and execution fill price variance), enriches evaluation stats withtrade_meta, and persists run-level validation profiles + active/inactive gate IDs in a newResultStore.run_metadatatable (also surfaced in CLI/dashboard summary JSON).The Proof: Updates/adds unit tests across config parsing, runner gating, evaluator trade metadata, and store/cache behavior (including validation-hash cache isolation and run-metadata round-trip);
make testspasses and coverage remains strictly >80%.Telemetry Added: Run summaries and dashboard payloads now include resolved validation profiles plus
active_gates/inactive_gates, and the same metadata is persisted per-run inresult_store(run_metadata) for post-run inspection.Written by Cursor Bugbot for commit f4f32c8. This will update automatically on new commits. Configure here.