
Feat/vd 4347 outlier detection cleaning #128

Draft
AlexanderPietsch wants to merge 59 commits into dev from feat/VD-4347-OutlierDetectionCleaning

Conversation

@AlexanderPietsch (Contributor) commented Mar 16, 2026

Summary

This branch introduces outlier detection as a first-class data-quality gate and refactors the validation/evaluation foundation to make validation behavior explicit, cache-safe, and dashboard-ready.
It simplifies the validation config shape, persists run-level validation metadata, and prevents evaluation-cache contamination by keying cache entries with an effective per-job validation profile hash.

Changes

  • Validation config simplification & run metadata:

    • validation.data_quality.on_fail is required when data_quality is configured.
    • Computes run-level validation metadata (resolved profile + active/inactive gates).
    • Persists run metadata once per run to the normalized ResultStore.
    • Adds active validation gates to the CLI run summary and summary payload.
  • Storage:

    • ResultStore: new run_metadata table with:
      • validation_profile_json
      • active_gates_json
      • inactive_gates_json
    • EvaluationCache: cache key now includes validation_config_hash.
    • Runner computes effective per-job validation profile hash (global + collection override resolution) and passes it on cache get/set.
  • Documentation/config examples:

    • Updated README validation section to reflect actual optionality/required behavior and continuity diagnostics nuance.
    • Updated config/example.yaml to current validation schema.
  • Tests:

    • Updated config and runner tests for scalar validation fields.
    • Added evaluation-store run metadata round-trip test.
    • Added evaluation-cache validation-hash isolation test.
  • Outlier detection introduced:

    • Adds a validation.data_quality.outlier_detection gate with:
      • max_outlier_pct
      • method (zscore or modified_zscore)
      • zscore_threshold
    • Integrates outlier checks into data-validation reliability reasons and gate decisions.
    • Handles indeterminate modified-zscore cases explicitly (e.g. mad_zero) via a structured rejection reason.
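The gate behavior above can be sketched roughly as follows. This is an illustrative standalone version, not the branch's actual implementation; the function name and return shape are assumptions, while the `mad_zero` indeterminate case and the `zscore`/`modified_zscore` method names come from the PR summary:

```python
import numpy as np

def detect_outliers(values, method="modified_zscore", zscore_threshold=3.5):
    """Return (outlier_pct, reason); reason is non-None for indeterminate cases."""
    x = np.asarray(values, dtype=float)
    if method == "zscore":
        std = x.std()
        if std == 0.0:
            # constant series: z-scores are undefined ("zero_std" is illustrative)
            return 0.0, "zero_std"
        scores = np.abs((x - x.mean()) / std)
    else:  # modified_zscore: robust to extreme values via the median/MAD
        median = np.median(x)
        mad = np.median(np.abs(x - median))
        if mad == 0.0:
            # more than half the points equal the median; scores are undefined
            return 0.0, "mad_zero"
        scores = 0.6745 * np.abs(x - median) / mad
    outlier_pct = 100.0 * float((scores > zscore_threshold).mean())
    return outlier_pct, None
```

A gate would then compare `outlier_pct` against `max_outlier_pct` and treat a non-None reason as a structured rejection.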

Breaking changes:

  • Validation config shape changed:
    • validation.data_quality.min_data_points.min -> validation.data_quality.min_data_points
    • validation.data_quality.kurtosis.max -> validation.data_quality.kurtosis
  • validation.data_quality.on_fail is now required when data_quality exists.
  • Evaluation cache schema/key includes validation_config_hash (old cache rows are not reused under new keying).
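The new cache keying hinges on a stable hash of the effective per-job validation profile. A minimal sketch, assuming JSON-serializable profile dicts and a shallow per-collection override (the branch's real resolution logic is more involved):

```python
import hashlib
import json

def validation_config_hash(global_profile, collection_override=None):
    """Hash the effective validation profile (global merged with an optional
    per-collection override) into a stable cache-key component."""
    effective = {**global_profile, **(collection_override or {})}
    # sort_keys makes the hash independent of dict insertion order
    canonical = json.dumps(effective, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

Because the hash is part of the cache key, any policy change (including a per-collection override) naturally misses old cache rows instead of reusing them.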

How to Test

  • Full suite:
    • make tests
  • Optional end-to-end run:
    • make run
  • Optional dashboard source toggle:
    • EVALUATION_RESULTS_SOURCE=result_store

Relevant config/env:

  • Validation config is under validation.data_quality and validation.optimization.
  • evaluation_mode still defaults to backtest.
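An illustrative config fragment matching the shape described above. Key names follow the PR summary; the values and the comment annotations are made up for illustration:

```yaml
validation:
  data_quality:
    on_fail: skip            # required whenever data_quality is configured
    min_data_points: 250     # scalar (was min_data_points.min)
    kurtosis: 8.0            # scalar (was kurtosis.max)
    outlier_detection:
      max_outlier_pct: 1.0
      method: modified_zscore   # or: zscore
      zscore_threshold: 3.5
  optimization:
    # optimization-policy settings go here
```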

Checklist (KISS)

  • Pre-commit passes locally (pre-commit run --all-files)
  • Tests added/updated where it makes sense (80% cov gate)
  • Docs/README updated if needed
  • No secrets committed; .env values are excluded
  • Backward compatibility considered (configs, CLI flags)

Notes:

  • Backward compatibility was considered, but this branch intentionally introduces config/cache key breakage for correctness and long-term maintainability.
  • make tests passes on this branch.

Related Issues/Links

  • Closes #
  • References #

Note

Medium Risk
Touches core backtest gating, evaluation caching keys, and SQLite schemas; mistakes could invalidate cached results or incorrectly skip or reject jobs/results. The changes are covered by expanded tests but still require attention to migration/backward-compatibility behavior and policy-resolution correctness.

Overview
RCA: Validation behavior was implicitly merged at runtime and evaluation caching was keyed only by mode/data fingerprints, allowing policy changes (or per-collection overrides) to contaminate cache correctness and making gate activation opaque.

The Fix: Restructures validation config into explicit modules (data_quality.continuity, data_quality.outlier_detection, result_consistency.*) and resolves global-vs-collection overrides during load_config via resolve_validation_overrides. The runner now computes a per-job validation_config_hash to key EvaluationCache entries, adds new data-quality outlier checks plus result-consistency gates (trade PnL concentration and execution fill price variance), enriches evaluation stats with trade_meta, and persists run-level validation profiles + active/inactive gate IDs in a new ResultStore.run_metadata table (also surfaced in CLI/dashboard summary JSON).
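The global-vs-collection resolution mentioned above can be sketched as a recursive dict merge. resolve_validation_overrides is the branch's function name; this standalone version only illustrates the idea and is an assumption about its semantics:

```python
def resolve_validation_overrides(global_policy: dict, overrides: dict) -> dict:
    """Merge a per-collection override onto the global policy: override
    leaves win, while untouched global keys are preserved."""
    resolved = dict(global_policy)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            # recurse so partial overrides don't clobber whole sub-sections
            resolved[key] = resolve_validation_overrides(resolved[key], value)
        else:
            resolved[key] = value
    return resolved
```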

The Proof: Updates/adds unit tests across config parsing, runner gating, evaluator trade metadata, and store/cache behavior (including validation-hash cache isolation and run-metadata round-trip); make tests passes and coverage remains strictly >80%.

Telemetry Added: Run summaries and dashboard payloads now include resolved validation profiles plus active_gates/inactive_gates, and the same metadata is persisted per-run in result_store (run_metadata) for post-run inspection.

Written by Cursor Bugbot for commit f4f32c8. This will update automatically on new commits.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the backtesting engine by integrating robust data quality validation, including a new outlier detection mechanism. It streamlines the configuration of validation policies and improves the reliability and traceability of evaluation results through refactored data handling and persistent metadata. These changes aim to provide more explicit control over data quality and optimization feasibility, ensuring more dependable and reproducible backtest outcomes.

Highlights

  • Outlier Detection: Introduced a new outlier_detection data quality gate with configurable max_outlier_pct, method (zscore/modified_zscore), and zscore_threshold to identify and handle anomalous data points.
  • Refactored Validation & Evaluation: The core validation and evaluation foundation has been refactored to make validation behavior explicit, ensure cache safety, and prepare for dashboard integration. This includes new data structures for job context, gate decisions, and evaluation outcomes.
  • Validation Configuration Simplification: Simplified the validation configuration shape, making validation.data_quality.on_fail a required field when data quality is configured. Configuration now supports global and per-collection overrides for validation policies.
  • Run-Level Validation Metadata Persistence: Implemented persistence of run-level validation metadata (resolved profile, active/inactive gates) to a new ResultStore table, providing a comprehensive record of validation policies applied during a run.
  • Evaluation Cache Contamination Prevention: Enhanced the evaluation cache key to include a validation_config_hash, preventing contamination by ensuring cache entries are unique to the effective per-job validation profile.
  • Breaking Changes: Introduced breaking changes to the validation config shape, specifically min_data_points and kurtosis fields, and made validation.data_quality.on_fail mandatory. The evaluation cache schema has also changed, meaning old cache rows will not be reused under the new keying.
Changelog
  • AGENTS.md
    • Updated the 'How to run' section to recommend make targets for common operations.
    • Added a new 'Design rules' section outlining cognitive complexity guidelines for functions.
  • DEVELOPMENT.md
    • Added a new document detailing the backtest runner's high-level flow, gate model, evaluation model, and continuity score calendar behavior.
  • Makefile
    • Added new refresh-image, refresh-image-nc, tests, coverage, and precommit-coverage targets.
    • Removed poetry install and poetry run from most run commands for cleaner execution.
    • Updated dashboard port mapping and removed git add poetry.lock from lock commands.
  • README.md
    • Added documentation for the EVALUATION_RESULTS_SOURCE environment variable.
    • Included the --evaluation-mode CLI option and updated run summary metrics to reflect fresh simulation/metric evaluations.
    • Added a detailed 'Validation & Optimization Policy' section explaining the new configuration options for data quality and optimization gates.
  • config/example.yaml
    • Added a new validation section with data_quality and optimization configurations.
    • Introduced outlier_detection parameters within data_quality.
    • Included commented-out examples for per-collection validation overrides.
  • poetry.lock
    • Updated click and typer package versions.
    • Added new dependencies: exchange-calendars, korean-lunar-calendar, pyluach, and toolz.
  • pyproject.toml
    • Updated typer and click dependency versions.
    • Added exchange-calendars as a new dependency.
  • src/backtest/evaluation/__init__.py
    • Added a new initialization file for the evaluation module, exposing its core components.
  • src/backtest/evaluation/adapters.py
    • Added a new module with a utility function normalized_rows_to_legacy_rows for data transformation.
  • src/backtest/evaluation/contracts.py
    • Added a new module defining data contracts for evaluation requests, outcomes, and result records.
  • src/backtest/evaluation/evaluator.py
    • Added a new module introducing Evaluator protocol and BacktestEvaluator for handling simulation and metric evaluation.
  • src/backtest/evaluation/store.py
    • Added a new module implementing EvaluationCache and ResultStore for persistent storage of evaluation results and run metadata using SQLite.
  • src/backtest/results_cache.py
    • Introduced ResultsCacheRecord dataclass for structured cache entries.
    • Modified ResultsCache to include evaluation_mode and mode_config_hash in its primary key, along with migration logic for existing caches.
    • Updated the set method to accept a ResultsCacheRecord object.
  • src/backtest/runner.py
    • Refactored the BacktestRunner to incorporate new data structures for job context, gate decisions, and validation states.
    • Implemented a multi-stage validation pipeline including collection, data fetching, data validation, execution context preparation, strategy plan validation, and strategy result validation.
    • Integrated EvaluationCache and ResultStore for enhanced caching and persistence of evaluation data and run metadata.
    • Extended _bars_per_year to support monthly timeframes and added new methods for continuity score calculation and outlier detection.
    • Removed direct configuration of param_dof_multiplier and param_min_bars, now managed through validation policies.
    • Added methods for serializing and hashing validation profiles to ensure cache isolation.
  • src/config.py
    • Introduced new dataclasses for detailed validation configurations: ValidationCalendarConfig, ValidationDataQualityConfig, ValidationContinuityConfig, ValidationOutlierDetectionConfig, ValidationConfig, and OptimizationPolicyConfig.
    • Removed param_dof_multiplier and param_min_bars from the main Config class.
    • Added evaluation_mode and validation fields to the Config class.
    • Implemented helper functions for merging and parsing validation configurations, including normalize_validation_defaults to apply default values and overrides.
  • src/main.py
    • Defined SUMMARY_JSON_FILENAME constant for consistent file naming.
    • Added an evaluation_mode CLI option to override the configuration's evaluation mode.
    • Updated BacktestRunner instantiation to pass the configured evaluation_mode.
    • Modified dashboard payload generation to optionally use the new ResultStore via an environment variable.
    • Included validation metadata in the run summary and updated console output to display new metrics like fresh_simulation_runs and active validation gates.
  • tests/test_backtest_runner.py
    • Added _StubEvaluationCache for testing purposes.
    • Expanded tests for _bars_per_year and introduced tests for _timeframe_to_timedelta.
    • Added comprehensive tests for compute_continuity_score covering various data scenarios and calendar types.
    • Included tests for evaluation_cache persistence and isolation.
    • Added tests to verify strategy skipping behavior based on validation gate failures, including min data points, continuity, kurtosis, and outlier detection.
    • Introduced tests for collection-level validation overrides and the blocking of jobs within a collection due to validation failures.
    • Added a test to confirm rejection of unimplemented walk-forward evaluation mode.
  • tests/test_config.py
    • Added tests for loading and validating the new evaluation_mode configuration.
    • Included extensive tests for parsing and validating the new validation configurations, covering data_quality (min data points, continuity, on_fail, calendar settings, outlier detection) and optimization policies.
  • tests/test_evaluation_store.py
    • Added a new test file to verify the functionality of EvaluationCache and ResultStore, including tests for mode hash, validation hash, and run metadata round-trip persistence.
  • tests/test_main_cli.py
    • Updated DummyRunner constructor to accommodate new parameters.
    • Added tests to confirm the evaluation_mode CLI option correctly overrides the configuration and handles invalid inputs.
  • tests/test_results_cache.py
    • Added a test to ensure the ResultsCache correctly distinguishes and stores entries based on different evaluation modes.
Activity
  • AlexanderPietsch initiated the pull request to introduce outlier detection and refactor the validation system.
  • Significant code changes were made across multiple modules to implement the new validation framework, including new data contracts, evaluation stores, and runner logic.
  • Configuration files and documentation were updated to reflect the new validation policies and breaking changes.
  • New unit tests were added and existing ones modified to ensure the correctness and robustness of the new features, especially around data quality gates and caching mechanisms.

@AlexanderPietsch AlexanderPietsch changed the base branch from dev to feat/VD-4344-data-collection-reliabilty-checks March 16, 2026 03:16
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive validation and optimization policy for the backtesting system, refactoring the BacktestRunner to implement a multi-stage gating system for job and strategy execution. New data structures and SQLite-based caching (EvaluationCache, ResultStore) are added to manage evaluation results and run metadata, alongside updated dependency management. Documentation is expanded to cover the new validation rules and CLI options. A review comment suggests improving resource management in src/backtest/evaluation/store.py by using with statements for sqlite3 connections to ensure automatic closing and transaction handling, even during errors.

I am having trouble creating individual review comments, so my feedback is included below.

src/backtest/evaluation/store.py (150-193)

medium

For improved resource management and to make the code more idiomatic, consider using a with statement for handling sqlite3 connections. This ensures the connection is automatically closed and transactions are committed or rolled back, even if errors occur. This pattern can be applied to all methods in this file that interact with the database (_ensure, get, set in EvaluationCache, and all methods in ResultStore).

        with sqlite3.connect(self.db_path) as con:
            con.execute(
                """
                INSERT OR REPLACE INTO evaluation_cache
                (
                    collection,
                    symbol,
                    timeframe,
                    strategy,
                    params_json,
                    metric_name,
                    metric_value,
                    stats_json,
                    data_fingerprint,
                    fees,
                    slippage,
                    evaluation_mode,
                    mode_config_hash,
                    validation_config_hash,
                    engine_version
                )
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    collection,
                    symbol,
                    timeframe,
                    strategy,
                    params_json,
                    metric_name,
                    float(metric_value),
                    json.dumps(stats, sort_keys=True),
                    data_fingerprint,
                    fees,
                    slippage,
                    evaluation_mode,
                    mode_config_hash,
                    validation_config_hash,
                    EVALUATION_SCHEMA_VERSION,
                ),
            )

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Stale global policy reference used in per-collection merge
    • After normalizing global validation policies, the function now refreshes the global policy references from validation_cfg before per-collection merges.

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix resolved 1 of the 3 issues found in the latest run.

  • ✅ Fixed: Stats dict self-references during trade meta construction
    • evaluate now snapshots raw simulation stats and passes that immutable snapshot into _build_trade_meta so evaluator-injected keys are never read in trade meta construction.


@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


    metric_val = float(cached["metric_value"])
    plan.evaluations += 1
    if not np.isfinite(metric_val):
        return float("-inf")

Cached non-finite evaluations silently lost from metrics tracking

Low Severity

In _apply_cached_evaluation, when the cached metric_value is non-finite (e.g., -inf from a previously invalid evaluation), the method returns early at line 2060 before incrementing result_cache_hits. Since result_cache_misses is only incremented when evaluation_cache.get() returns None, these cached-but-invalid evaluations are counted by neither counter. Previously, all cache hits were always counted. This creates a metrics gap where result_cache_hits + result_cache_misses < total_evaluations, which could mislead monitoring or observability consumers.
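One way to close the counting gap described above is to record the hit before the early return. The names below are taken from the review comment; the surrounding runner code is not shown, so this is only a sketch of the fix:

```python
import math

def apply_cached_evaluation(cached: dict, counters: dict) -> float:
    """Count every cache hit, including cached non-finite metric values,
    so that hits + misses always equals total cache lookups."""
    metric_val = float(cached["metric_value"])
    counters["result_cache_hits"] += 1  # record the hit before any early return
    if not math.isfinite(metric_val):
        return float("-inf")
    return metric_val
```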


Base automatically changed from feat/VD-4344-data-collection-reliabilty-checks to dev March 23, 2026 11:51