Feat/vd 4344 data collection reliabilty checks #123

AlexanderPietsch wants to merge 103 commits into `dev` from `feat/VD-4344-data-collection-reliabilty-checks`
Conversation
…est runner
- introduce typed gate/job/plan state models for run_all orchestration
- make optimization skip routing plan-owned with explicit policy constraint application
- switch reliability config/docs/tests to skip_job | skip_optimization and update stage expectations
add explicit image refresh targets and dashboard port mapping
… and backtest-ready gate templates

Architectural refactor to prepare for walk-forward testing
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural refactor to the backtesting system, transforming it into a more flexible and extensible evaluation platform. The primary goal is to enhance the reliability and robustness of data processing and strategy evaluation by implementing a structured gate mechanism for validations and a new dual-write persistence layer. These changes are crucial for supporting advanced evaluation modes like walk-forward analysis in the future, while maintaining stability for existing reporting workflows. The refactor also improves the clarity of runtime metrics and streamlines the development experience by simplifying command execution.

Highlights
Changelog
Code Review
This is a significant and well-executed refactoring of the backtest core. You've successfully introduced a mode-aware evaluation platform, a composable gate mechanism for validations, and a new persistence layer with dual-write capabilities. These changes establish a strong foundation for future extensions like walk-forward testing and enhance the system's reliability and maintainability. The code is well-structured, and the new test coverage is comprehensive. My review includes one suggestion to make the data retrieval logic in the new ResultStore more concise and robust.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: AlexanderPietsch <alexander.pietsch@vollcom-digital.de>
…iabilty-checks' into feat/VD-4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest pipeline into a mode-aware evaluation foundation with composable gating/validation, plus new persistence layers (evaluation cache + normalized result store) while maintaining legacy ResultsCache compatibility.
Changes:
- Introduces `src/backtest/evaluation/` contracts, evaluator, cache, and result store; adds adapters to preserve legacy reporting rows.
- Refactors `BacktestRunner` into staged, composable gates (collection/data/plan/result) with data-quality + optimization feasibility policies and dual-write persistence.
- Adds config/CLI plumbing for `evaluation_mode` and new metrics naming (`fresh_simulation_runs`, `fresh_metric_evals`), and updates tests/docs/workflows accordingly.
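The `evaluation_mode` plumbing described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: it assumes only the two modes this PR names (`backtest` active, `walk_forward` reserved) and that a CLI override wins over the config value.

```python
# Hypothetical sketch of resolving an --evaluation-mode CLI override against
# the evaluation_mode config field. Function name and defaults are illustrative.
ALLOWED_MODES = {"backtest", "walk_forward"}

def resolve_evaluation_mode(cli_value, config_value="backtest"):
    """CLI flag wins over the config field; unknown modes are rejected early."""
    mode = (cli_value or config_value).strip().lower()
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"evaluation_mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}"
        )
    return mode
```

Validating at parse time like this keeps a typo from silently falling back to the default mode.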
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode/mode-hash isolation in legacy cache. |
| tests/test_main_cli.py | Updates CLI runner stubs for new runner kwargs and tests evaluation-mode flag. |
| tests/test_evaluation_store.py | Adds tests for evaluation cache hashing and result-store idempotent inserts. |
| tests/test_config.py | Adds parsing/validation tests for evaluation mode and validation policies. |
| tests/test_backtest_runner.py | Expands runner tests for continuity, calendars, gated skipping behaviors, and new metrics. |
| src/main.py | Adds --evaluation-mode, optional dashboard row source switch, and updated summary/metrics output. |
| src/config.py | Adds validation policy schema + parsing and evaluation_mode validation. |
| src/backtest/runner.py | Major refactor: gate pipeline, evaluator dispatch, data-quality continuity scoring, and dual-write persistence. |
| src/backtest/results_cache.py | Adds typed record input, mode-aware PK columns, and migration to new PK. |
| src/backtest/evaluation/store.py | Implements SQLite-based evaluation cache + normalized result store. |
| src/backtest/evaluation/evaluator.py | Adds evaluator abstraction with explicit outcome contract. |
| src/backtest/evaluation/contracts.py | Adds explicit request/outcome/mode contracts and result record type. |
| src/backtest/evaluation/adapters.py | Adds normalized-to-legacy row adapter for existing reporting. |
| src/backtest/evaluation/__init__.py | Exposes evaluation API surface. |
| pyproject.toml | Updates Typer/Click versions and adds exchange-calendars. |
| config/example.yaml | Documents new validation policy configuration. |
| README.md | Documents CLI/mode changes and new validation policies. |
| Makefile | Adds test/coverage targets and adjusts docker-compose commands. |
| DEVELOPMENT.md | Adds internal docs for new gate pipeline and continuity logic. |
| AGENTS.md | Updates contributor guidance and adds complexity rule. |
| .github/workflows/release.yml | Normalizes GHCR owner for tags. |
| .github/workflows/daily-backtest.yml | Adjusts strategy repo checkout conditions using env. |
Pull request overview
Refactors the backtest runner into a mode-aware evaluation foundation (backtest active; walk-forward fail-fast), adding composable validation gates, a normalized evaluation cache + result store, and updated CLI/config plumbing for evaluation mode + reliability/optimization policies.
Changes:
- Introduces `src/backtest/evaluation/*` (contracts, evaluator, SQLite cache/store, legacy row adapter) and dual-write persistence (legacy `ResultsCache` + new `ResultStore`).
- Refactors `BacktestRunner` into staged gate decisions (collection/data/prep/strategy plan/result), adds continuity/reliability checks, and updates metrics counters.
- Adds config + CLI support for `evaluation_mode` and validation policies, plus new/updated tests across runner/cache/store/config/CLI.
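The mode-aware cache keying this PR introduces can be sketched as a hash over a canonical form of the mode configuration, so entries written under different mode settings never collide. The fields the real `EvaluationCache` hashes are not visible in this PR page; the function name and digest truncation below are illustrative assumptions.

```python
# Hedged sketch of a mode-config hash: canonical JSON (sorted keys) makes the
# digest stable regardless of dict insertion order. Truncation to 16 hex chars
# is an arbitrary choice for readability, not the repository's convention.
import hashlib
import json

def mode_config_hash(mode, mode_config):
    canonical = json.dumps({"mode": mode, "config": mode_config}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Keying cache rows on `(job identity, mode, mode_config_hash)` is what lets `backtest` and future `walk_forward` runs share one table without cross-contamination.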
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds regression tests for mode-aware ResultsCache keys + normalization. |
| tests/test_main_cli.py | Updates CLI tests for new runner ctor signature and --evaluation-mode validation. |
| tests/test_evaluation_store.py | Adds tests for new EvaluationCache hashing and ResultStore identity/idempotency. |
| tests/test_config.py | Adds coverage for evaluation_mode and new validation policy parsing/validation. |
| tests/test_backtest_runner.py | Extends runner tests for continuity scoring, gate actions, metrics, and dual-write behavior. |
| src/main.py | Adds --evaluation-mode, walk-forward fail-fast handling, summary output tweaks, and optional dashboard source switch. |
| src/config.py | Adds evaluation_mode + validation schema and parsing helpers for reliability/optimization policies. |
| src/backtest/runner.py | Major runner refactor: staged gates, continuity scoring, evaluator/cache/store plumbing, and updated metrics. |
| src/backtest/results_cache.py | Extends legacy cache schema with mode + mode-hash and adds typed ResultsCacheRecord. |
| src/backtest/evaluation/store.py | Implements SQLite-backed EvaluationCache and normalized ResultStore with identity index repair. |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator to centralize simulation + metric evaluation semantics. |
| src/backtest/evaluation/contracts.py | Defines evaluation contracts (EvaluationRequest/Outcome, mode config, normalized ResultRecord). |
| src/backtest/evaluation/adapters.py | Converts normalized store rows into legacy dashboard/report row shape. |
| src/backtest/evaluation/__init__.py | Exposes evaluation platform public API symbols. |
| pyproject.toml | Updates CLI deps (typer/click) and adds exchange-calendars. |
| config/example.yaml | Documents new validation policy knobs and per-collection overrides. |
| README.md | Documents new --evaluation-mode, metrics counters, and validation/optimization policy semantics. |
| Makefile | Updates run/test/coverage targets and adds image refresh helpers. |
| DEVELOPMENT.md | Adds internal design notes for gate flow and continuity score rules. |
| AGENTS.md | Updates contributor guidance and adds complexity guidelines. |
| .github/workflows/release.yml | Normalizes GHCR owner casing via env var for image tags. |
| .github/workflows/daily-backtest.yml | Fixes checkout condition by routing secrets through env vars. |
…iabilty-checks' into feat/VD-4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest runner into a staged, mode-aware evaluation pipeline and adds new persistence layers to improve reliability checks and prepare for future walk_forward support.
Changes:
- Introduces evaluation contracts, evaluator abstraction, an evaluation cache, and a normalized result store (SQLite), with dual-write alongside the legacy `ResultsCache`.
- Redesigns runner orchestration into composable gate stages (collection/data validation/prep/strategy plan/result validation) with new data-quality/optimization policies (continuity, min bars, etc.).
- Extends CLI/config plumbing for `evaluation_mode`, adds a dashboard row source switch, and updates metrics/summary outputs and tests.
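The composable gate stages described above can be sketched as a small decision pipeline: each gate inspects a shared context and returns a decision, and a `skip_job` decision short-circuits the remaining stages. The action names (`skip_job`, `skip_optimization`) follow the PR's wording; the dataclass and function shapes are hypothetical.

```python
# Illustrative gate-composition sketch, not the repository's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class GateDecision:
    stage: str
    action: str  # "proceed" | "skip_job" | "skip_optimization"
    reason: str = ""

def compose_gate_decisions(context, gates):
    decisions = []
    for gate in gates:
        decision = gate(context)
        decisions.append(decision)
        if decision.action == "skip_job":
            break  # once the job is skipped, later stages never run
    return decisions
```

A `skip_optimization` decision, by contrast, would be recorded but not break the loop, letting later stages run without the optimizer.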
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode-aware ResultsCache behavior and evaluation_mode normalization |
| tests/test_main_cli.py | Adds CLI coverage for dashboard row source switching + evaluation mode overrides |
| tests/test_evaluation_store.py | Adds tests for EvaluationCache hashing, ResultStore round-trip/idempotency/index repair |
| tests/test_config.py | Adds parsing/validation tests for evaluation_mode and validation policies |
| tests/test_backtest_runner.py | Expands runner tests for new gates, continuity scoring, policy behaviors, and metrics |
| src/main.py | Adds --evaluation-mode, result_store dashboard adapter, richer summary/metrics outputs |
| src/config.py | Adds validation policy dataclasses + YAML parsing/validation + evaluation_mode parsing |
| src/backtest/runner.py | Major refactor: gate pipeline, continuity scoring, optimization policy, evaluator + dual-write persistence |
| src/backtest/results_cache.py | Adds typed record wrapper + mode-aware PK/migrations + mode filtering |
| src/backtest/evaluation/store.py | Implements SQLite EvaluationCache and ResultStore with unique identity index + repair |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator contract for simulation + metric computation outcomes |
| src/backtest/evaluation/contracts.py | Adds evaluation contracts (requests/outcomes/mode config/result record) |
| src/backtest/evaluation/adapters.py | Adds adapter to map normalized result-store rows to legacy dashboard rows |
| src/backtest/evaluation/__init__.py | Exposes evaluation module public API |
| pyproject.toml | Updates CLI deps and adds exchange-calendars |
| config/example.yaml | Documents validation policy configuration in example config |
| README.md | Documents new CLI options, validation policy semantics, and result source switch |
| Makefile | Adds test/coverage targets and simplifies run commands |
| DEVELOPMENT.md | Adds detailed runner/gate/continuity implementation notes |
| AGENTS.md | Updates contributor guidance and adds complexity guidelines |
| .github/workflows/release.yml | Normalizes GHCR owner casing via env var |
| .github/workflows/daily-backtest.yml | Moves secret presence checks to env vars to satisfy workflow condition constraints |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Continuity check always runs causing new skip conditions
- Data validation now only converts continuity ValueErrors into skip_job when a data-quality policy is configured, preserving prior behavior when data_quality is unset.
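The fixed behavior can be sketched as follows: a continuity `ValueError` becomes a `skip_job` decision only when a data-quality policy is configured; with no policy, the error propagates exactly as before. Function and argument names here are illustrative, not the repository's actual API.

```python
# Sketch of policy-gated error conversion: without a data_quality policy,
# prior (raising) behavior is preserved.
def apply_data_validation(check_continuity, data_quality_policy):
    try:
        check_continuity()
    except ValueError:
        if data_quality_policy is not None:
            return "skip_job"
        raise  # no policy configured: keep the old failure mode
    return "proceed"
```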
…4344-data-collection-reliabilty-checks
Pull request overview
Refactors the backtest runner into a mode-aware evaluation foundation, adding composable gate-based validations, a normalized SQLite-backed result store/evaluation cache, and config/CLI plumbing for evaluation_mode and validation policies.
Changes:
- Added `src/backtest/evaluation/` modules (contracts/evaluator/store/adapters) and dual-write persistence (legacy `ResultsCache` + new `ResultStore`).
- Refactored `BacktestRunner` into staged gates (`collection_validation`, `data_fetch`, `data_validation`, `data_preparation`, `strategy_optimization`, `strategy_validation`) with continuity/data-quality + optimization feasibility policies.
- Extended CLI/config/tests to support `evaluation_mode` and validation policies, and updated metrics (`fresh_simulation_runs`, `fresh_metric_evals`).
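The dual-write persistence mentioned above can be sketched minimally: each result record is written to both the legacy cache and the new normalized store in one call, so existing reporting keeps working while the new store fills up. The class below and its list/dict stand-ins are illustrative, not the PR's API.

```python
# Minimal dual-write sketch. The legacy path appends rows (as a reporting log
# would); the new store upserts by identity, which is what makes re-runs
# idempotent in the normalized layer.
class DualWriter:
    def __init__(self, legacy_cache, result_store):
        self.legacy_cache = legacy_cache  # stand-in for ResultsCache
        self.result_store = result_store  # stand-in for ResultStore, keyed by identity

    def persist(self, record):
        self.legacy_cache.append(record)
        # idempotent upsert: re-persisting the same identity overwrites, not duplicates
        self.result_store[record["identity"]] = record
```

Dual-writing like this is a common migration pattern: once the new store is trusted, the legacy write becomes removable without a read-path change.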
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_results_cache.py | Adds coverage for mode-aware cache isolation and evaluation_mode normalization. |
| tests/test_main_cli.py | Updates runner stubs for new runner kwargs; tests dashboard row source switch + CLI mode override validation. |
| tests/test_evaluation_store.py | Adds tests for new EvaluationCache and ResultStore behaviors (hashing, idempotency, repair paths). |
| tests/test_config.py | Adds tests for evaluation_mode and new validation parsing (data quality + optimization policy). |
| tests/test_backtest_runner.py | Expands runner tests for new gating pipeline, continuity scoring, and policy-driven behaviors. |
| src/main.py | Adds --evaluation-mode, fail-fast walk_forward handling, optional dashboard source switch, and updated summary/metrics output. |
| src/config.py | Adds validation config dataclasses + parsing/validation, and new evaluation_mode config field. |
| src/backtest/runner.py | Major refactor: mode-aware staged gates, continuity scoring, evaluator abstraction, dual-write persistence, and new metrics semantics. |
| src/backtest/results_cache.py | Adds mode-aware primary key fields, record model, migration improvements, and mode filtering. |
| src/backtest/evaluation/store.py | Introduces SQLite EvaluationCache and normalized ResultStore with identity repair/deduplication. |
| src/backtest/evaluation/evaluator.py | Adds BacktestEvaluator wrapper to standardize simulation + metric evaluation outcomes. |
| src/backtest/evaluation/contracts.py | Adds evaluation contracts (EvaluationRequest, EvaluationOutcome, etc.). |
| src/backtest/evaluation/adapters.py | Adds adapter to convert normalized rows to legacy dashboard/reporting row format. |
| src/backtest/evaluation/__init__.py | Exposes evaluation module public API. |
| pyproject.toml | Updates Typer/Click versions and adds exchange-calendars. |
| config/example.yaml | Documents new validation policies and per-collection overrides. |
| README.md | Documents new CLI flags, new metrics, and validation/optimization policy semantics. |
| Makefile | Adds test/coverage/precommit targets and adjusts docker-compose invocations. |
| DEVELOPMENT.md | Adds internal documentation for the new gate model and continuity scoring implementation details. |
| AGENTS.md | Updates contribution guidance (prefer make targets) and adds complexity guideline. |
| .github/workflows/release.yml | Normalizes GHCR owner casing for tagging. |
| .github/workflows/daily-backtest.yml | Adjusts conditional checkout of strategies repo using env indirection. |




Summary
Refactors the backtest core into a mode-aware evaluation platform foundation (backtest-only active), including a full gate-mechanism redesign for composable validations, explicit evaluation contracts, evaluator/store abstractions, and dual-write persistence. The change prepares the system for future `walk_forward` support and richer validation families while keeping current reporting flows stable.

Changes
- New `src/backtest/evaluation/` package:
  - `contracts.py`: `EvaluationRequest`, `EvaluationOutcome`, `EvaluationModeConfig`, `ResultRecord`
  - `evaluator.py`: `BacktestEvaluator`
  - `store.py`: `EvaluationCache`, `ResultStore`
  - `adapters.py`: normalized-to-legacy row adapters for reporting compatibility
- `BacktestRunner`: mode-aware stages (`common`, `backtest`, `walk_forward` hook), `ValidationContext`, `_compose_gate_decisions`, `_get_evaluator()`, dual-write to legacy `ResultsCache` + new `ResultStore`
- `ResultsCache` compatibility fields (`evaluation_mode`, `mode_config_hash`) and mode-aware get/set filtering
- `evaluation_mode` config field with validation (`backtest` | `walk_forward`) and `--evaluation-mode` CLI override; `walk_forward` currently fails fast at runtime (not implemented yet)
- Optional dashboard row source switch via `EVALUATION_RESULTS_SOURCE=result_store`
- New metrics `fresh_simulation_runs` and `fresh_metric_evals`; `param_evals` kept as a compatibility alias in runner internals
- New tests in `tests/test_evaluation_store.py`

Breaking changes:
- `metrics.prom` now surfaces the `fresh_simulation_runs` / `fresh_metric_evals` names.

How to Test
- `make tests`
- `docker-compose run --rm app bash -lc "poetry run python -m src.main run --config config/collections/crypto.yaml --evaluation-mode backtest"`
- `docker-compose run --rm app bash -lc "poetry run python -m src.main run --config config/collections/crypto.yaml --evaluation-mode walk_forward"`
- Set `EVALUATION_RESULTS_SOURCE=result_store` to read dashboard rows from the new store

Checklist (KISS)
- Pre-commit hooks pass (`pre-commit run --all-files`)
- `.env` values are excluded

Related Issues/Links
https://vollcom-digital.atlassian.net/browse/VD-4349
https://vollcom-digital.atlassian.net/browse/VD-4344
Note
High Risk
Large refactor of `BacktestRunner` execution flow plus new SQLite cache/store layers and config parsing can change backtest outcomes and persistence semantics. Also introduces new validation policies (data quality/optimization) that may skip jobs or disable optimization based on continuity/threshold checks.

Overview
RCA: Runner orchestration, caching, and validation were tightly coupled and hard to extend, making reliability checks and future evaluation modes risky to add without destabilizing execution.
The Fix: Refactors backtesting into a staged gate-based pipeline with explicit evaluation contracts, adds optional `validation.data_quality` and `validation.optimization` policies (including continuity scoring with optional `exchange-calendars`), and introduces a mode-aware evaluation cache + normalized `ResultStore` while continuing to dual-write to the legacy `ResultsCache` for reporting compatibility.

The Proof: Updates config/docs and adds new execution metrics (`fresh_simulation_runs`, `fresh_metric_evals`), plus `Makefile` targets for tests/coverage with a `--cov-fail-under=80` gate to keep coverage strictly above 80%.

Telemetry Added: Emits structured gate logs (e.g., `data_validation_gate`, `strategy_optimization_gate`) and surfaces new per-run counters in `summary.json` and `metrics.prom`; the dashboard can optionally read normalized rows via `EVALUATION_RESULTS_SOURCE=result_store`.

Written by Cursor Bugbot for commit 84ded97. This will update automatically on new commits. Configure here.
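The continuity scoring mentioned in this PR can be read as "fraction of expected bars actually present", with a policy threshold turning a low score into a skip decision. This is an assumed sketch: the PR mentions continuity scoring with optional `exchange-calendars`, but here the expected bar count is passed in directly so the example stays self-contained, and the 0.95 default is illustrative.

```python
# Assumed continuity-score sketch; names and threshold are hypothetical.
def continuity_score(bars_present, bars_expected):
    """Fraction of expected bars present, capped at 1.0."""
    if bars_expected <= 0:
        raise ValueError("bars_expected must be positive")
    return min(bars_present / bars_expected, 1.0)

def continuity_action(score, min_continuity=0.95):
    """Policy decision: below-threshold continuity skips the job."""
    return "proceed" if score >= min_continuity else "skip_job"
```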