Feat/vd 4344 data collection reliabilty checks #123

AlexanderPietsch wants to merge 103 commits into `dev` from `feat/VD-4344-data-collection-reliabilty-checks`
Conversation
…est runner
- introduce typed gate/job/plan state models for run_all orchestration
- make optimization skip routing plan-owned with explicit policy constraint application
- switch reliability config/docs/tests to skip_job | skip_optimization and update stage expectations
add explicit image refresh targets and dashboard port mapping
… and backtest-ready gate templates

Architectural refactor to prepare for walk-forward testing
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural refactor to the backtesting system, transforming it into a more flexible and extensible evaluation platform. The primary goal is to enhance the reliability and robustness of data processing and strategy evaluation by implementing a structured gate mechanism for validations and a new dual-write persistence layer. These changes are crucial for supporting advanced evaluation modes like walk-forward analysis in the future, while maintaining stability for existing reporting workflows. The refactor also improves the clarity of runtime metrics and streamlines the development experience by simplifying command execution.

Highlights
Changelog
Code Review
This is a significant and well-executed refactoring of the backtest core. You've successfully introduced a mode-aware evaluation platform, a composable gate mechanism for validations, and a new persistence layer with dual-write capabilities. These changes establish a strong foundation for future extensions like walk-forward testing and enhance the system's reliability and maintainability. The code is well-structured, and the new test coverage is comprehensive. My review includes one suggestion to make the data retrieval logic in the new ResultStore more concise and robust.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: AlexanderPietsch <alexander.pietsch@vollcom-digital.de>
…iabilty-checks' into feat/VD-4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest pipeline into a mode-aware evaluation foundation with composable gating/validation, plus new persistence layers (evaluation cache + normalized result store) while maintaining legacy ResultsCache compatibility.
Changes:
- Introduces `src/backtest/evaluation/` contracts, evaluator, cache, and result store; adds adapters to preserve legacy reporting rows.
- Refactors `BacktestRunner` into staged, composable gates (collection/data/plan/result) with data-quality + optimization feasibility policies and dual-write persistence.
- Adds config/CLI plumbing for `evaluation_mode` and new metrics naming (`fresh_simulation_runs`, `fresh_metric_evals`), and updates tests/docs/workflows accordingly.
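The `evaluation_mode` plumbing described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: it assumes only the two modes this PR names (`backtest` active, `walk_forward` reserved) and that a CLI override wins over the config value.

```python
# Hypothetical sketch of resolving an --evaluation-mode CLI override against
# the evaluation_mode config field. Function name and defaults are illustrative.
ALLOWED_MODES = {"backtest", "walk_forward"}

def resolve_evaluation_mode(cli_value, config_value="backtest"):
    """CLI flag wins over the config field; unknown modes are rejected early."""
    mode = (cli_value or config_value).strip().lower()
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"evaluation_mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}"
        )
    return mode
```

Validating at parse time like this keeps a typo from silently falling back to the default mode.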
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode/mode-hash isolation in legacy cache. |
| tests/test_main_cli.py | Updates CLI runner stubs for new runner kwargs and tests evaluation-mode flag. |
| tests/test_evaluation_store.py | Adds tests for evaluation cache hashing and result-store idempotent inserts. |
| tests/test_config.py | Adds parsing/validation tests for evaluation mode and validation policies. |
| tests/test_backtest_runner.py | Expands runner tests for continuity, calendars, gated skipping behaviors, and new metrics. |
| src/main.py | Adds --evaluation-mode, optional dashboard row source switch, and updated summary/metrics output. |
| src/config.py | Adds validation policy schema + parsing and evaluation_mode validation. |
| src/backtest/runner.py | Major refactor: gate pipeline, evaluator dispatch, data-quality continuity scoring, and dual-write persistence. |
| src/backtest/results_cache.py | Adds typed record input, mode-aware PK columns, and migration to new PK. |
| src/backtest/evaluation/store.py | Implements SQLite-based evaluation cache + normalized result store. |
| src/backtest/evaluation/evaluator.py | Adds evaluator abstraction with explicit outcome contract. |
| src/backtest/evaluation/contracts.py | Adds explicit request/outcome/mode contracts and result record type. |
| src/backtest/evaluation/adapters.py | Adds normalized-to-legacy row adapter for existing reporting. |
| src/backtest/evaluation/__init__.py | Exposes evaluation API surface. |
| pyproject.toml | Updates Typer/Click versions and adds exchange-calendars. |
| config/example.yaml | Documents new validation policy configuration. |
| README.md | Documents CLI/mode changes and new validation policies. |
| Makefile | Adds test/coverage targets and adjusts docker-compose commands. |
| DEVELOPMENT.md | Adds internal docs for new gate pipeline and continuity logic. |
| AGENTS.md | Updates contributor guidance and adds complexity rule. |
| .github/workflows/release.yml | Normalizes GHCR owner for tags. |
| .github/workflows/daily-backtest.yml | Adjusts strategy repo checkout conditions using env. |
Pull request overview
Refactors the backtest runner into a mode-aware evaluation foundation (backtest active; walk-forward fail-fast), adding composable validation gates, a normalized evaluation cache + result store, and updated CLI/config plumbing for evaluation mode + reliability/optimization policies.
Changes:
- Introduces `src/backtest/evaluation/*` (contracts, evaluator, SQLite cache/store, legacy row adapter) and dual-write persistence (legacy `ResultsCache` + new `ResultStore`).
- Refactors `BacktestRunner` into staged gate decisions (collection/data/prep/strategy plan/result), adds continuity/reliability checks, and updates metrics counters.
- Adds config + CLI support for `evaluation_mode` and validation policies, plus new/updated tests across runner/cache/store/config/CLI.
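The mode-aware cache keying this PR introduces can be sketched as a hash over a canonical form of the mode configuration, so entries written under different mode settings never collide. The fields the real `EvaluationCache` hashes are not visible in this PR page; the function name and digest truncation below are illustrative assumptions.

```python
# Hedged sketch of a mode-config hash: canonical JSON (sorted keys) makes the
# digest stable regardless of dict insertion order. Truncation to 16 hex chars
# is an arbitrary choice for readability, not the repository's convention.
import hashlib
import json

def mode_config_hash(mode, mode_config):
    canonical = json.dumps({"mode": mode, "config": mode_config}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Keying cache rows on `(job identity, mode, mode_config_hash)` is what lets `backtest` and future `walk_forward` runs share one table without cross-contamination.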
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds regression tests for mode-aware ResultsCache keys + normalization. |
| tests/test_main_cli.py | Updates CLI tests for new runner ctor signature and --evaluation-mode validation. |
| tests/test_evaluation_store.py | Adds tests for new EvaluationCache hashing and ResultStore identity/idempotency. |
| tests/test_config.py | Adds coverage for evaluation_mode and new validation policy parsing/validation. |
| tests/test_backtest_runner.py | Extends runner tests for continuity scoring, gate actions, metrics, and dual-write behavior. |
| src/main.py | Adds --evaluation-mode, walk-forward fail-fast handling, summary output tweaks, and optional dashboard source switch. |
| src/config.py | Adds evaluation_mode + validation schema and parsing helpers for reliability/optimization policies. |
| src/backtest/runner.py | Major runner refactor: staged gates, continuity scoring, evaluator/cache/store plumbing, and updated metrics. |
| src/backtest/results_cache.py | Extends legacy cache schema with mode + mode-hash and adds typed ResultsCacheRecord. |
| src/backtest/evaluation/store.py | Implements SQLite-backed EvaluationCache and normalized ResultStore with identity index repair. |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator to centralize simulation + metric evaluation semantics. |
| src/backtest/evaluation/contracts.py | Defines evaluation contracts (EvaluationRequest/Outcome, mode config, normalized ResultRecord). |
| src/backtest/evaluation/adapters.py | Converts normalized store rows into legacy dashboard/report row shape. |
| src/backtest/evaluation/__init__.py | Exposes evaluation platform public API symbols. |
| pyproject.toml | Updates CLI deps (typer/click) and adds exchange-calendars. |
| config/example.yaml | Documents new validation policy knobs and per-collection overrides. |
| README.md | Documents new --evaluation-mode, metrics counters, and validation/optimization policy semantics. |
| Makefile | Updates run/test/coverage targets and adds image refresh helpers. |
| DEVELOPMENT.md | Adds internal design notes for gate flow and continuity score rules. |
| AGENTS.md | Updates contributor guidance and adds complexity guidelines. |
| .github/workflows/release.yml | Normalizes GHCR owner casing via env var for image tags. |
| .github/workflows/daily-backtest.yml | Fixes checkout condition by routing secrets through env vars. |
…iabilty-checks' into feat/VD-4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest runner into a staged, mode-aware evaluation pipeline and adds new persistence layers to improve reliability checks and prepare for future walk_forward support.
Changes:
- Introduces evaluation contracts, evaluator abstraction, an evaluation cache, and a normalized result store (SQLite), with dual-write alongside the legacy `ResultsCache`.
- Redesigns runner orchestration into composable gate stages (collection/data validation/prep/strategy plan/result validation) with new data-quality/optimization policies (continuity, min bars, etc.).
- Extends CLI/config plumbing for `evaluation_mode`, adds a dashboard row source switch, and updates metrics/summary outputs and tests.
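The composable gate stages described above can be sketched as a small decision pipeline: each gate inspects a shared context and returns a decision, and a `skip_job` decision short-circuits the remaining stages. The action names (`skip_job`, `skip_optimization`) follow the PR's wording; the dataclass and function shapes are hypothetical.

```python
# Illustrative gate-composition sketch, not the repository's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class GateDecision:
    stage: str
    action: str  # "proceed" | "skip_job" | "skip_optimization"
    reason: str = ""

def compose_gate_decisions(context, gates):
    decisions = []
    for gate in gates:
        decision = gate(context)
        decisions.append(decision)
        if decision.action == "skip_job":
            break  # once the job is skipped, later stages never run
    return decisions
```

A `skip_optimization` decision, by contrast, would be recorded but not break the loop, letting later stages run without the optimizer.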
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode-aware ResultsCache behavior and evaluation_mode normalization |
| tests/test_main_cli.py | Adds CLI coverage for dashboard row source switching + evaluation mode overrides |
| tests/test_evaluation_store.py | Adds tests for EvaluationCache hashing, ResultStore round-trip/idempotency/index repair |
| tests/test_config.py | Adds parsing/validation tests for evaluation_mode and validation policies |
| tests/test_backtest_runner.py | Expands runner tests for new gates, continuity scoring, policy behaviors, and metrics |
| src/main.py | Adds --evaluation-mode, result_store dashboard adapter, richer summary/metrics outputs |
| src/config.py | Adds validation policy dataclasses + YAML parsing/validation + evaluation_mode parsing |
| src/backtest/runner.py | Major refactor: gate pipeline, continuity scoring, optimization policy, evaluator + dual-write persistence |
| src/backtest/results_cache.py | Adds typed record wrapper + mode-aware PK/migrations + mode filtering |
| src/backtest/evaluation/store.py | Implements SQLite EvaluationCache and ResultStore with unique identity index + repair |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator contract for simulation + metric computation outcomes |
| src/backtest/evaluation/contracts.py | Adds evaluation contracts (requests/outcomes/mode config/result record) |
| src/backtest/evaluation/adapters.py | Adds adapter to map normalized result-store rows to legacy dashboard rows |
| src/backtest/evaluation/__init__.py | Exposes evaluation module public API |
| pyproject.toml | Updates CLI deps and adds exchange-calendars |
| config/example.yaml | Documents validation policy configuration in example config |
| README.md | Documents new CLI options, validation policy semantics, and result source switch |
| Makefile | Adds test/coverage targets and simplifies run commands |
| DEVELOPMENT.md | Adds detailed runner/gate/continuity implementation notes |
| AGENTS.md | Updates contributor guidance and adds complexity guidelines |
| .github/workflows/release.yml | Normalizes GHCR owner casing via env var |
| .github/workflows/daily-backtest.yml | Moves secret presence checks to env vars to satisfy workflow condition constraints |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Continuity check always runs causing new skip conditions
- Data validation now only converts continuity ValueErrors into skip_job when a data-quality policy is configured, preserving prior behavior when data_quality is unset.
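The fixed behavior can be sketched as follows: a continuity `ValueError` becomes a `skip_job` decision only when a data-quality policy is configured; with no policy, the error propagates exactly as before. Function and argument names here are illustrative, not the repository's actual API.

```python
# Sketch of policy-gated error conversion: without a data_quality policy,
# prior (raising) behavior is preserved.
def apply_data_validation(check_continuity, data_quality_policy):
    try:
        check_continuity()
    except ValueError:
        if data_quality_policy is not None:
            return "skip_job"
        raise  # no policy configured: keep the old failure mode
    return "proceed"
```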
…4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest runner into a mode-aware evaluation foundation, adding composable gate-based validations, a normalized SQLite-backed result store/evaluation cache, and config/CLI plumbing for evaluation_mode and validation policies.
Changes:
- Added `src/backtest/evaluation/` modules (contracts/evaluator/store/adapters) and dual-write persistence (legacy `ResultsCache` + new `ResultStore`).
- Refactored `BacktestRunner` into staged gates (`collection_validation`, `data_fetch`, `data_validation`, `data_preparation`, `strategy_optimization`, `strategy_validation`) with continuity/data-quality + optimization feasibility policies.
- Extended CLI/config/tests to support `evaluation_mode` and validation policies, and updated metrics (`fresh_simulation_runs`, `fresh_metric_evals`).
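The dual-write persistence mentioned above can be sketched minimally: each result record is written to both the legacy cache and the new normalized store in one call, so existing reporting keeps working while the new store fills up. The class below and its list/dict stand-ins are illustrative, not the PR's API.

```python
# Minimal dual-write sketch. The legacy path appends rows (as a reporting log
# would); the new store upserts by identity, which is what makes re-runs
# idempotent in the normalized layer.
class DualWriter:
    def __init__(self, legacy_cache, result_store):
        self.legacy_cache = legacy_cache  # stand-in for ResultsCache
        self.result_store = result_store  # stand-in for ResultStore, keyed by identity

    def persist(self, record):
        self.legacy_cache.append(record)
        # idempotent upsert: re-persisting the same identity overwrites, not duplicates
        self.result_store[record["identity"]] = record
```

Dual-writing like this is a common migration pattern: once the new store is trusted, the legacy write becomes removable without a read-path change.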
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode-aware cache isolation and evaluation_mode normalization. |
| tests/test_main_cli.py | Updates runner stubs for new runner kwargs; tests dashboard row source switch + CLI mode override validation. |
| tests/test_evaluation_store.py | Adds tests for new EvaluationCache and ResultStore behaviors (hashing, idempotency, repair paths). |
| tests/test_config.py | Adds tests for evaluation_mode and new validation parsing (data quality + optimization policy). |
| tests/test_backtest_runner.py | Expands runner tests for new gating pipeline, continuity scoring, and policy-driven behaviors. |
| src/main.py | Adds --evaluation-mode, fail-fast walk_forward handling, optional dashboard source switch, and updated summary/metrics output. |
| src/config.py | Adds validation config dataclasses + parsing/validation, and new evaluation_mode config field. |
| src/backtest/runner.py | Major refactor: mode-aware staged gates, continuity scoring, evaluator abstraction, dual-write persistence, and new metrics semantics. |
| src/backtest/results_cache.py | Adds mode-aware primary key fields, record model, migration improvements, and mode filtering. |
| src/backtest/evaluation/store.py | Introduces SQLite EvaluationCache and normalized ResultStore with identity repair/deduplication. |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator wrapper to standardize simulation + metric evaluation outcomes. |
| src/backtest/evaluation/contracts.py | Adds evaluation contracts (EvaluationRequest, EvaluationOutcome, etc.). |
| src/backtest/evaluation/adapters.py | Adds adapter to convert normalized rows to legacy dashboard/reporting row format. |
| src/backtest/evaluation/__init__.py | Exposes evaluation module public API. |
| pyproject.toml | Updates Typer/Click versions and adds exchange-calendars. |
| config/example.yaml | Documents new validation policies and per-collection overrides. |
| README.md | Documents new CLI flags, new metrics, and validation/optimization policy semantics. |
| Makefile | Adds test/coverage/precommit targets and adjusts docker-compose invocations. |
| DEVELOPMENT.md | Adds internal documentation for the new gate model and continuity scoring implementation details. |
| AGENTS.md | Updates contribution guidance (prefer make targets) and adds complexity guideline. |
| .github/workflows/release.yml | Normalizes GHCR owner casing for tagging. |
| .github/workflows/daily-backtest.yml | Adjusts conditional checkout of strategies repo using env indirection. |




Summary
Refactors the backtest core into a mode-aware evaluation platform foundation (backtest-only active), including a full gate-mechanism redesign for composable validations, explicit evaluation contracts, evaluator/store abstractions, and dual-write persistence. The change prepares the system for future `walk_forward` support and richer validation families while keeping current reporting flows stable.

Changes
- New `src/backtest/evaluation/` package:
  - `contracts.py`: `EvaluationRequest`, `EvaluationOutcome`, `EvaluationModeConfig`, `ResultRecord`
  - `evaluator.py`: `BacktestEvaluator`
  - `store.py`: `EvaluationCache`, `ResultStore`
  - `adapters.py`: normalized-to-legacy row adapters for reporting compatibility
- `BacktestRunner`: mode-aware stages (`common`, `backtest`, `walk_forward` hook), `ValidationContext`, `_compose_gate_decisions`, `_get_evaluator()`, dual-write to legacy `ResultsCache` + new `ResultStore`
- `ResultsCache` compatibility fields (`evaluation_mode`, `mode_config_hash`) and mode-aware get/set filtering
- `evaluation_mode` config field with validation (`backtest` | `walk_forward`) and `--evaluation-mode` CLI override; `walk_forward` currently fails fast at runtime (not implemented yet)
- Optional dashboard row source switch via `EVALUATION_RESULTS_SOURCE=result_store`
- New metrics `fresh_simulation_runs` and `fresh_metric_evals`; `param_evals` kept as a compatibility alias in runner internals
- New tests in `tests/test_evaluation_store.py`

Breaking changes:
- `metrics.prom` now surfaces the `fresh_simulation_runs` / `fresh_metric_evals` names.

How to Test
- `make tests`
- `docker-compose run --rm app bash -lc "poetry run python -m src.main run --config config/collections/crypto.yaml --evaluation-mode backtest"`
- `docker-compose run --rm app bash -lc "poetry run python -m src.main run --config config/collections/crypto.yaml --evaluation-mode walk_forward"`
- Set `EVALUATION_RESULTS_SOURCE=result_store` to read dashboard rows from the new store

Checklist (KISS)
- Pre-commit hooks pass (`pre-commit run --all-files`)
- `.env` values are excluded

Related Issues/Links
https://vollcom-digital.atlassian.net/browse/VD-4349
https://vollcom-digital.atlassian.net/browse/VD-4344
Note
High Risk
Large refactor of `BacktestRunner` execution flow plus new SQLite cache/store layers and config parsing can change backtest outcomes and persistence semantics. Also introduces new validation policies (data quality/optimization) that may skip jobs or disable optimization based on continuity/threshold checks.

Overview
RCA: Runner orchestration, caching, and validation were tightly coupled and hard to extend, making reliability checks and future evaluation modes risky to add without destabilizing execution.
The Fix: Refactors backtesting into a staged gate-based pipeline with explicit evaluation contracts, adds optional `validation.data_quality` and `validation.optimization` policies (including continuity scoring with optional `exchange-calendars`), and introduces a mode-aware evaluation cache + normalized `ResultStore` while continuing to dual-write to the legacy `ResultsCache` for reporting compatibility.

The Proof: Updates config/docs and adds new execution metrics (`fresh_simulation_runs`, `fresh_metric_evals`), plus `Makefile` targets for tests/coverage with a `--cov-fail-under=80` gate to keep coverage strictly above 80%.

Telemetry Added: Emits structured gate logs (e.g., `data_validation_gate`, `strategy_optimization_gate`) and surfaces new per-run counters in `summary.json` and `metrics.prom`; the dashboard can optionally read normalized rows via `EVALUATION_RESULTS_SOURCE=result_store`.

Written by Cursor Bugbot for commit 84ded97. This will update automatically on new commits. Configure here.
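The continuity scoring mentioned in this PR can be read as "fraction of expected bars actually present", with a policy threshold turning a low score into a skip decision. This is an assumed sketch: the PR mentions continuity scoring with optional `exchange-calendars`, but here the expected bar count is passed in directly so the example stays self-contained, and the 0.95 default is illustrative.

```python
# Assumed continuity-score sketch; names and threshold are hypothetical.
def continuity_score(bars_present, bars_expected):
    """Fraction of expected bars present, capped at 1.0."""
    if bars_expected <= 0:
        raise ValueError("bars_expected must be positive")
    return min(bars_present / bars_expected, 1.0)

def continuity_action(score, min_continuity=0.95):
    """Policy decision: below-threshold continuity skips the job."""
    return "proceed" if score >= min_continuity else "skip_job"
```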