fix: add retry logic, size guards, and quality gates to facts pipeline #52

Merged
madjin merged 1 commit into main from fix/resilient-facts-pipeline
Feb 10, 2026

madjin commented Feb 10, 2026

Summary

  • extract-facts.py: Add retry loop (2 attempts, 5s delay), completion token sanity check (<200 tokens = truncated), detailed failure logging with response preview, and debug sidecar file on final failure
  • aggregate-sources.py: Cap user_summaries at 200KB to prevent token explosion (2026-01-15 was 984KB/392K tokens), warn when estimated tokens >100K, fix missing import sys
  • extract_daily_facts.yml: Quality-gated daily.json permalink (only overwrite on success), workflow-level retry with 30s delay (4 total attempts combined with script retry), Discord alert on error stubs
  • Backfill: All 9 error stubs regenerated successfully — retry logic recovered 2 dates on second attempt (2026-01-26: JSON parse error, 2025-07-22: invalid control character)

Test plan

  • Verify all 9 previously-failed dates now have status: success (392 facts files, 0 errors)
  • Verify 2026-01-15 re-aggregation reduced tokens from 282K to 90K
  • Validate Python syntax for both scripts
  • Validate workflow YAML syntax
  • Next daily run exercises the full workflow (retry + quality gate + alert)
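The first check in the test plan can be approximated with a short script. This is a minimal sketch, not the verification the author ran: the function name `count_statuses`, the flat `*.json` directory layout, and the top-level `status` key are assumptions based only on the `status: success` wording above.

```python
import json
from pathlib import Path


def count_statuses(facts_dir: Path) -> dict[str, int]:
    """Tally the top-level "status" value of every *.json file in facts_dir."""
    counts: dict[str, int] = {}
    for path in sorted(facts_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            status = "invalid"  # file is not parseable JSON
        else:
            # Assumed layout: a JSON object with a "status" key.
            status = data.get("status", "missing") if isinstance(data, dict) else "invalid"
        counts[status] = counts.get(status, 0) + 1
    return counts
```

A run that returns only a `success` key would be consistent with "0 errors"; any other key points at files worth inspecting.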

🤖 Generated with Claude Code

The facts extraction pipeline had zero retry logic, no response logging
on failure, and no aggregation size guards. This caused 9 error stubs
(2.3% of 393 files) to be committed and propagated downstream.

Pipeline resilience (extract-facts.py):
- Add retry loop (2 attempts, 5s delay) for all failure modes
- Add completion token sanity check (<200 = truncated, skip parsing)
- Log missing fields and response preview on validation failure
- Save raw LLM response to .debug/ sidecar on final failure
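
The four items above can be sketched as one retry wrapper. This is an illustration under stated assumptions, not the code in extract-facts.py: `extract_with_retry`, `validate`, the `call_llm` callable returning `(raw_text, completion_tokens)`, and the `REQUIRED_FIELDS` names are all hypothetical. Only the attempt count, the delay, and the 200-token threshold come from the description.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

MAX_ATTEMPTS = 2             # from the PR: 2 attempts
RETRY_DELAY_S = 5            # from the PR: 5s delay
MIN_COMPLETION_TOKENS = 200  # from the PR: <200 tokens is treated as truncated
REQUIRED_FIELDS = ("date", "facts")  # hypothetical field names


def validate(raw: str, completion_tokens: int) -> tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, reason) on failure."""
    if completion_tokens < MIN_COMPLETION_TOKENS:
        # Skip parsing entirely: a response this short is assumed truncated.
        return None, f"only {completion_tokens} completion tokens, likely truncated"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"JSON parse error: {exc}"
    if not isinstance(data, dict):
        return None, "top-level JSON value is not an object"
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        return None, f"missing fields: {missing}"
    return data, None


def extract_with_retry(
    call_llm: Callable[[], tuple[str, int]],
    debug_path: Path,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Try up to MAX_ATTEMPTS times; write the last raw response on final failure."""
    raw = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        raw, completion_tokens = call_llm()
        data, error = validate(raw, completion_tokens)
        if data is not None:
            return data
        print(f"attempt {attempt}/{MAX_ATTEMPTS} failed: {error}; preview: {raw[:200]!r}")
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_DELAY_S)
    # Final failure: keep the raw response next to the output for debugging.
    debug_path.parent.mkdir(parents=True, exist_ok=True)
    debug_path.write_text(raw, encoding="utf-8")
    return None
```

The sketch does not catch exceptions raised by `call_llm` itself; a version meant to cover "all failure modes" would also need to treat a failed request as a retryable attempt.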

Aggregation guard (aggregate-sources.py):
- Cap user_summaries at 200KB to prevent token explosion
  (2026-01-15 was 984KB / 392K tokens, now capped at 200KB / ~50K)
- Warn when estimated_tokens > 100K
- Add missing import sys
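
A size cap and token warning of this kind might look like the sketch below. The truncation strategy (keep whole summaries in order until the budget is spent), the 4-characters-per-token estimate, and the names `cap_summaries`, `estimate_tokens`, and `warn_if_large` are assumptions; the description does not say how aggregate-sources.py truncates or estimates. The 4:1 ratio was chosen only because it matches the "200KB / ~50K" figure above.

```python
import sys

MAX_SUMMARY_BYTES = 200 * 1024  # from the PR: 200KB cap (1024-byte KB is an assumption)
TOKEN_WARN_THRESHOLD = 100_000  # from the PR: warn above 100K estimated tokens
CHARS_PER_TOKEN = 4             # assumption: rough heuristic, not a real tokenizer


def cap_summaries(summaries: list[str], max_bytes: int = MAX_SUMMARY_BYTES) -> list[str]:
    """Keep whole summaries, in order, until adding one would exceed max_bytes."""
    kept: list[str] = []
    used = 0
    for summary in summaries:
        size = len(summary.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(summary)
        used += size
    return kept


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN


def warn_if_large(text: str) -> int:
    """Print a warning to stderr when the estimate exceeds the threshold."""
    tokens = estimate_tokens(text)
    if tokens > TOKEN_WARN_THRESHOLD:
        print(
            f"WARNING: estimated tokens {tokens} > {TOKEN_WARN_THRESHOLD}",
            file=sys.stderr,
        )
    return tokens
```

Writing the warning to `sys.stderr` is what makes `import sys` necessary in a script like this.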

Workflow improvements (extract_daily_facts.yml):
- Quality-gated daily.json permalink (only overwrite on success)
- Workflow-level retry with 30s delay (4 total attempts with script retry)
- Data quality alert to Discord on error stubs
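
The quality gate lives in the workflow file, but its logic can be illustrated in Python. This is a hedged sketch: `update_permalink` is a hypothetical helper, and the top-level `status` key with the value `success` is assumed from the test plan wording rather than taken from the workflow itself.

```python
import json
import shutil
from pathlib import Path


def update_permalink(dated_file: Path, permalink: Path) -> bool:
    """Copy dated_file over permalink only if it reports status == "success".

    Returns True when the permalink was overwritten, False when it was left alone.
    """
    try:
        data = json.loads(dated_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # unreadable or malformed output never replaces the permalink
    if not isinstance(data, dict) or data.get("status") != "success":
        return False  # error stub: keep the previous good daily.json
    shutil.copyfile(dated_file, permalink)
    return True
```

On the attempt count: if the workflow makes two attempts and each one runs the script's two attempts, the total is 2 × 2 = 4, which matches the figure above. The workflow-level count of two is inferred from that arithmetic, not stated in the description.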

Backfill results: 9/9 error stubs regenerated successfully.
Retry logic recovered 2 dates (2026-01-26, 2025-07-22) on second attempt.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai bot commented Feb 10, 2026

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

madjin merged commit 9a18764 into main on Feb 10, 2026
1 check passed