
VER-298: Refactor Stage 3 prompts: add verification fields and fix web search tools#59

Merged
quancao-ea merged 13 commits into main from fix/stage-3-prompts-add-verification-fields-and-web-search on Feb 9, 2026

Conversation


@quancao-ea quancao-ea commented Feb 9, 2026

Important

Refactors Stage 3 prompts and processing logic to include web-based verification and structured evidence, updating schemas and tests accordingly.

  • Behavior:
    • Adds mandatory web-based verification workflow and structured verification evidence in analysis outputs.
    • Introduces a verification_status field with values verified_false, verified_true, uncertain, and insufficient_evidence (see the enum sketch after this list).
  • Configuration:
    • Adds a timeout setting for the external search service in .gemini/settings.json.
  • Breaking Changes:
    • Removed older stage prompts and timestamped-transcription schema; Stage 3 processing restructured.
  • Prompts and Schemas:
    • Updates prompts/stage_3/analysis_prompt.md, system_instruction.md, and output_schema.json for new verification logic.
  • Code Structure:
    • Refactors Stage 3 into separate modules: executors.py, flows.py, tasks.py, and models.py.
    • Removes src/processing_pipeline/stage_3.py and timestamped_transcription_generator.py.
  • Tests:
    • Updates tests in test_stage_1.py and test_stage_3.py to reflect Stage 3 restructuring and new verification logic.
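
For orientation, a minimal sketch of the new status enum, assuming only the four values listed above (the real field lives in the Stage 3 Pydantic models; the class and member names here are illustrative):

```python
from enum import Enum


class VerificationStatus(str, Enum):
    # Values come from the PR description; class and member names are assumed.
    VERIFIED_FALSE = "verified_false"
    VERIFIED_TRUE = "verified_true"
    UNCERTAIN = "uncertain"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


print([status.value for status in VerificationStatus])
```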

This description was created by Ellipsis for 2639987. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features
    • Adds mandatory web-based verification workflow, structured verification evidence in outputs, and verification_status (verified_false, verified_true, uncertain, insufficient_evidence).
  • Configuration
    • Adds a timeout setting for the external search service.
  • Breaking Changes
    • Restructures Stage 3 processing and removes the older timestamped-transcription schema/workflow.
  • Tests
    • Updates tests to align with the Stage 3 restructuring.

  • Add guidance for recognizing and correctly handling events that occurred after the model's training cutoff date, preventing false positives when web search results conflict with pre-training knowledge.
  • Remove tool tracking from the CLI method and extract verification evidence from validated responses instead of API metadata.
  • Streamline the analysis flow to focus on structured output validation.
  • Move all Stage 3 prompt files from the root prompts directory to a new stage_3 subdirectory for better organization and consistency with the project structure.
  • Split the Stage 3 processing pipeline into separate modules for better organization and maintainability. Extract executor logic, flow definitions, and task functions into dedicated files while maintaining backward compatibility through __init__.py exports.

linear Bot commented Feb 9, 2026


coderabbitai Bot commented Feb 9, 2026

Walkthrough

Refactors Stage 3 into a package (executors, flows, tasks, models), adds verification_evidence and verification_status to prompts/schemas, removes legacy timestamped-transcription artifacts, updates imports/tests to the new layout, and adjusts .gemini/settings.json to include a searxng timeout.

Changes

  • Prompts & System (prompts/stage_3/analysis_prompt.md, prompts/stage_3/system_instruction.md, prompts/stage_3/output_schema.json): Adds the mandatory web-search verification workflow, introduces verification_evidence and verification_status, and expands the schema (breaking-news rules, self-review, evidence recording).
  • Removed Legacy Prompts (prompts/Stage_3_system_instruction.md, prompts/Timestamped_transcription_generation_output_schema.json, prompts/Timestamped_transcription_generation_prompt.md): Deletes the legacy Stage 3 system instruction and the timestamped-transcription prompt/schema files.
  • Stage 3 Package, new (src/processing_pipeline/stage_3/executors.py, src/processing_pipeline/stage_3/flows.py, src/processing_pipeline/stage_3/tasks.py, src/processing_pipeline/stage_3/models.py, src/processing_pipeline/stage_3/__init__.py): Splits the monolith into modules: executors (Stage3Executor with Gemini CLI + Google grounding fallback), flows (in_depth_analysis, reset hook), tasks (Supabase/S3 interactions, processing), models (Pydantic verification types); exports updated.
  • Removed Monolith (src/processing_pipeline/stage_3.py): Removes the previous monolithic Stage 3 implementation; functionality redistributed to the package modules.
  • Tests (tests/processing_pipeline/test_stage_3.py, tests/processing_pipeline/test_stage_1.py): Updates mocks/import paths to the new nested module layout; removes tests tied to the removed timestamped-transcription generator.
  • Stage 1 changes (src/processing_pipeline/stage_1/flows.py, src/processing_pipeline/stage_1/tasks.py, src/processing_pipeline/timestamped_transcription_generator.py): Removes the custom timestamped transcription generator and related tests; Stage 1 flows now use the GeminiModel enum and the Gemini transcription path.
  • Constants & Scripts (src/processing_pipeline/constants.py, src/scripts/import_prompts_to_db.py): Updates prompt path lookups to prompts/stage_*/*, removes the timestamped-transcription helper getters, and updates PROMPT_MAPPING for Stage 3.
  • Configuration (.gemini/settings.json): Adds timeout: 60000 to the searxng MCP server configuration (a sketch of the resulting file follows).
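
As a reference point, a sketch of what the .gemini/settings.json change might look like. Only the timeout: 60000 key on the searxng MCP server is confirmed by this PR; the surrounding mcpServers nesting is an assumption about the file's layout:

```json
{
  "mcpServers": {
    "searxng": {
      "timeout": 60000
    }
  }
}
```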

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Flow as in_depth_analysis
    participant Executor as Stage3Executor
    participant Supabase
    participant S3
    participant GeminiCLI as Gemini CLI
    participant GeminiSDK as Gemini SDK
    participant GoogleAPI as Google Search

    Client->>Flow: start in_depth_analysis
    Flow->>Supabase: fetch snippet & mark PROCESSING
    Flow->>S3: download audio file
    Flow->>Executor: run(gemini_key, model, audio, metadata, prompt_version)

    Executor->>GeminiCLI: analyze via CLI (custom search)
    alt CLI success
        GeminiCLI-->>Executor: analysis (streamed JSON)
    else CLI fails
        Executor->>GeminiSDK: upload audio
        Executor->>GoogleAPI: perform grounding search
        GoogleAPI-->>Executor: search results
        GeminiSDK-->>Executor: grounded analysis
    end

    Executor->>Executor: validate with Pydantic (Stage3Output)
    alt validation fails
        Executor->>GeminiSDK: restructure into required schema
        GeminiSDK-->>Executor: formatted JSON
    end

    Executor-->>Flow: analysis + verification_evidence
    Flow->>Supabase: update snippet with results
    Flow->>S3: delete local audio
    Flow-->>Client: complete
```
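
A rough Python sketch of the fallback path the diagram describes, with placeholder helpers standing in for the Gemini CLI and SDK calls; the function names and payloads here are illustrative, not the repository's actual signatures:

```python
from pydantic import BaseModel, ValidationError


class Stage3Output(BaseModel):
    # Trimmed stand-in for the real Stage3Output in stage_3/models.py.
    confidence_scores: dict
    verification_evidence: dict


def analyze_with_cli(audio_file: str) -> str:
    # Placeholder for the Gemini CLI path with the custom searxng search tools.
    raise RuntimeError("CLI unavailable in this sketch")


def analyze_with_grounding(audio_file: str) -> str:
    # Placeholder for the SDK + Google Search grounding fallback.
    return '{"confidence_scores": {"overall": 10}, "verification_evidence": {"searches_performed": []}}'


def run(audio_file: str) -> Stage3Output:
    try:
        raw = analyze_with_cli(audio_file)
    except RuntimeError:
        print("Falling back to Google Search grounding with SDK...")
        raw = analyze_with_grounding(audio_file)
    try:
        return Stage3Output.model_validate_json(raw)
    except ValidationError:
        # The real executor asks Gemini to restructure the output into the schema here.
        raise


print(run("test.mp3"))
```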

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • nhphong

Poem

🐰 I hopped through files and split the stack,
Carrots of schema in a tidy pack.
Searches and proofs I nibbled through,
Modular hops and tests updated too.
Verification crunch — a joyful chew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 20.69%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The PR title accurately describes the main changes: refactoring Stage 3 prompts, adding verification fields, and fixing web search tools integration.
  • Linked Issues check (✅ Passed): All objectives from VER-298 are met: prompts reorganized to prompts/stage_3/, verification fields and status added to the schema, web search tools integrated, Stage 3 refactored into modular components, and tests updated.
  • Out of Scope Changes check (✅ Passed): The PR removes the custom timestamped transcription generator and related prompt files unrelated to Stage 3 verification; these removals appear justified by modernizing the pipeline architecture.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.4)
src/processing_pipeline/constants.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.constants",
"obj": "",
"line": 94,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/constants.py",
"symbol": "line-too-long",
"message": "Line too long (101/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "src.processing_pipeline.constants",
"obj": "",
"line": 98,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/constants.py",
"symbol": "line-too-long",
"message": "Line too long (115/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module"

... [truncated 7209 characters] ...

ini_timestamped_transcription_generation_prompt",
"line": 64,
"column": 0,
"endLine": 64,
"endColumn": 58,
"path": "src/processing_pipeline/constants.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
},
{
"type": "warning",
"module": "src.processing_pipeline.constants",
"obj": "get_gemini_timestamped_transcription_generation_prompt",
"line": 65,
"column": 11,
"endLine": 65,
"endColumn": 85,
"path": "src/processing_pipeline/constants.py",
"symbol": "unspecified-encoding",
"message": "Using open without explicitly specifying an encoding",
"message-id": "W1514"
}
]

src/processing_pipeline/stage_1/flows.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 41,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 64,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",

... [truncated 13107 characters] ...

module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 193,
"column": 0,
"endLine": 193,
"endColumn": 37,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "too-many-locals",
"message": "Too many local variables (16/15)",
"message-id": "R0914"
},
{
"type": "warning",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 216,
"column": 8,
"endLine": 216,
"endColumn": 10,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "redefined-builtin",
"message": "Redefining built-in 'id'",
"message-id": "W0622"
}
]

src/scripts/import_prompts_to_db.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "",
"line": 6,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "line-too-long",
"message": "Line too long (111/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "",
"line": 7,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "line-too-long",
"message": "Line too long (120/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "import_prompts_to_db",

... [truncated 7167 characters] ...

essage-id": "R0912"
},
{
"type": "refactor",
"module": "import_prompts_to_db",
"obj": "import_prompts",
"line": 104,
"column": 0,
"endLine": 104,
"endColumn": 18,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "too-many-statements",
"message": "Too many statements (72/50)",
"message-id": "R0915"
},
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "main",
"line": 266,
"column": 0,
"endLine": 266,
"endColumn": 8,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
}
]


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 9db4a22 in 16 seconds.
  • Reviewed 2153 lines of code in 18 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_BnW9dJuHfIFj6bsd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Stage 3 disinformation analysis pipeline by introducing a more robust and transparent factual verification process. It refactors prompts to include detailed instructions for web search, mandates comprehensive documentation of verification evidence, and updates the output schema to capture this information. The underlying Python codebase has also been restructured for improved modularity and to support the new verification mechanisms, ensuring that analyses are grounded in current, verifiable information and explicitly account for the model's knowledge limitations.

Highlights

  • Prompt Refactoring and Organization: Stage 3 prompts, including system instructions, analysis prompts, and output schemas, have been moved into a new, dedicated prompts/stage_3 subdirectory. Obsolete timestamped transcription prompts and schemas have been removed.
  • Enhanced Factual Verification Protocol: The analysis_prompt.md now includes a detailed, two-step verification process using searxng_web_search and web_url_read tools. This protocol mandates comprehensive documentation of all search activities, including queries, results, source types, and direct excerpts, along with clear source prioritization guidelines.
  • Knowledge Cutoff and Breaking News Handling: New critical instructions have been added to guide the model on how to handle its knowledge cutoff, emphasizing that current web search results from reliable sources must override pre-training knowledge. A specific protocol for 'Breaking News and Recent Events' (within 72 hours) has been introduced, including confidence score caps for unverified recent claims.
  • New Verification Fields in Output Schema: The output schema for Stage 3 has been extended to include a verification_status within confidence_scores and a comprehensive verification_evidence object. This new object meticulously records all web searches performed, their intent, detailed results (URLs, source information, relevant excerpts), and a summary of verification findings.
  • Codebase Restructuring: The Python code for Stage 3 processing has been refactored from a single file (src/processing_pipeline/stage_3.py) into a more modular package structure under src/processing_pipeline/stage_3/, with separate modules for executors, flows, models, and tasks. This improves maintainability and clarity.
  • Web Search Tool Integration Update: The Stage3Executor now explicitly supports searxng_web_search for custom search operations, passing the SEARXNG_URL environment variable. The grounding_metadata now captures the detailed verification_evidence directly from the model's output.
Changelog
  • prompts/Stage_3_analysis_prompt.md
    • Renamed to prompts/stage_3/analysis_prompt.md.
    • Expanded 'Ensure Factual Accuracy' section with detailed two-step verification process using searxng_web_search and web_url_read.
    • Added mandatory 'C.1 Verification Evidence Documentation' section, detailing how to record search queries, results, source types, and excerpts.
    • Introduced 'Source Priority Guidelines' for web search results.
    • Added critical 'C.2 Knowledge Cutoff and Post-Training Events' section, instructing the model to prioritize web search results over pre-training knowledge.
    • Simplified 'Verification Requirement' statement.
    • Added new 'H.1 Breaking News and Recent Events Protocol' for handling time-sensitive claims with confidence caps.
    • Added a mandatory check to the self-review process, reinforcing that supporting web search results from tier-1/tier-2 sources should lead to a score of 0.
    • Added new 'Common Error Check' items related to web search result interpretation and source reputation.
    • Included 'Breaking News Verification Checklist' and 'Evidence Documentation Check' in the self-review section.
    • Added 'M. Verification Evidence' section with a JSON schema example for documenting all web searches.
    • Updated the output schema example within the prompt to include verification_evidence and verification_status fields.
    • Updated instruction for verifying factual claims to explicitly mention web_url_read.
  • prompts/Stage_3_heuristics.md
    • Renamed to prompts/stage_3/heuristics.md.
  • prompts/Stage_3_output_schema.json
    • Renamed to prompts/stage_3/output_schema.json.
    • Added verification_evidence to the list of required fields.
    • Added verification_status field to the confidence_scores object with an enum of possible values.
    • Added full schema definition for verification_evidence, including searches_performed and verification_summary.
  • prompts/Stage_3_system_instruction.md
    • Removed old system instruction file.
  • prompts/Stage_4_output_schema.json
    • Renamed to prompts/stage_4/output_schema.json.
  • prompts/Stage_4_review_prompt.md
    • Renamed to prompts/stage_4/review_prompt.md.
  • prompts/Stage_4_system_instruction.md
    • Renamed to prompts/stage_4/system_instruction.md.
  • prompts/Timestamped_transcription_generation_output_schema.json
    • Removed old timestamped transcription output schema.
  • prompts/Timestamped_transcription_generation_prompt.md
    • Removed old timestamped transcription generation prompt.
  • src/processing_pipeline/stage_3.py
    • Removed old Stage 3 processing file, replaced by new package structure.
  • src/processing_pipeline/stage_3/__init__.py
    • Added __init__.py to create stage_3 as a Python package.
    • Imported Stage3Executor, in_depth_analysis, and various tasks from new submodules.
  • src/processing_pipeline/stage_3/executors.py
    • Added new file containing the Stage3Executor class, refactored from the old stage_3.py.
    • Modified run method to extract verification_evidence from model output and use it for grounding_metadata.
    • Updated __analyze_with_custom_search to include SEARXNG_URL in environment variables for custom search.
    • Adjusted __analyze_with_google_search_grounding to no longer return grounding_metadata directly, as it's now part of the model's JSON output.
  • src/processing_pipeline/stage_3/flows.py
    • Added new file containing the Prefect flow in_depth_analysis and reset_snippet_status_hook, refactored from the old stage_3.py.
  • src/processing_pipeline/stage_3/tasks.py
    • Added new file containing Prefect tasks (fetch_a_specific_snippet_from_supabase, fetch_a_new_snippet_from_supabase, download_audio_file_from_s3, update_snippet_in_supabase, get_metadata, analyze_snippet, process_snippet), refactored from the old stage_3.py.
    • Updated update_snippet_in_supabase to correctly store verification_evidence in grounding_metadata.
  • src/processing_pipeline/stage_3_models.py
    • Renamed to src/processing_pipeline/stage_3/models.py.
    • Added verification_status field to the ConfidenceScores Pydantic model.
    • Added new Pydantic models for SearchResult, SearchPerformed, VerificationSummary, and VerificationEvidence (see the sketch after this changelog).
    • Integrated the VerificationEvidence model into the main Stage3Output model.
  • tests/processing_pipeline/test_stage_3.py
    • Updated import paths for SupabaseClient, postprocess_snippet, and process_snippet to reflect the new modular structure of the stage_3 package.
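
To make the new models concrete, here is a self-contained sketch assembled from field names quoted elsewhere in this PR (url, source_name, source_type, relevant_excerpt, relevance_to_claim, searches_performed, verification_summary); the query and intent names and all types are assumptions:

```python
from typing import List

from pydantic import BaseModel


class SearchResult(BaseModel):
    # Field names taken from the workflow example in the analysis prompt.
    url: str
    source_name: str
    source_type: str
    relevant_excerpt: str
    relevance_to_claim: str


class SearchPerformed(BaseModel):
    # "query" and "intent" are assumed names for the recorded search and its purpose.
    query: str
    intent: str
    results: List[SearchResult]


class VerificationSummary(BaseModel):
    # Counts mentioned in the review comments below.
    claims_contradicted: int
    claims_unverifiable: int


class VerificationEvidence(BaseModel):
    searches_performed: List[SearchPerformed]
    verification_summary: VerificationSummary


evidence = VerificationEvidence(
    searches_performed=[
        SearchPerformed(
            query="Maduro captured US forces 2026",
            intent="verify central claim",
            results=[
                SearchResult(
                    url="https://reuters.com/article/...",
                    source_name="Reuters",
                    source_type="tier1_wire_service",
                    relevant_excerpt="[exact quote from article]",
                    relevance_to_claim="contradicts_claim",
                )
            ],
        )
    ],
    verification_summary=VerificationSummary(claims_contradicted=1, claims_unverifiable=0),
)
print(evidence.model_dump_json(indent=2))
```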
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/settings.json
Activity
  • The pull request author, quancao-ea, initiated significant refactoring of the Stage 3 prompts and associated Python code.
  • The core logic for Stage 3 analysis, including web search integration and output schema validation, was moved and reorganized into a new package structure.
  • New detailed instructions for factual verification, including handling of knowledge cutoff and breaking news, were added to the analysis prompt.
  • The data models and output schemas were updated to support the new verification evidence fields, ensuring structured reporting of fact-checking activities.
  • Test files were adjusted to align with the new module and package structure.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant and valuable refactoring of the Stage 3 processing pipeline. The changes enhance disinformation detection by adding detailed, structured verification instructions to the prompts, including guidance on using web search tools, documenting evidence, and handling knowledge cutoffs. The codebase is also improved by splitting the monolithic stage_3.py into a more modular structure with separate files for executors, flows, tasks, and models, which improves maintainability.

My review has identified a critical inconsistency in the new prompts that will likely cause schema validation errors, and a medium-severity issue where potentially useful debugging information from the Gemini SDK is being discarded. Addressing these points will help ensure the new pipeline is robust and reliable.

Comment thread prompts/stage_3/analysis_prompt.md Outdated
Comment on lines +246 to +258
```python
        if not response.text:
            finish_reason = response.candidates[0].finish_reason if response.candidates else None

            if finish_reason == FinishReason.MAX_TOKENS:
                raise ValueError("The response from Gemini was too long and was cut off in step 1.")

            print(f"Response finish reason: {finish_reason}")
            raise ValueError("No response from Gemini in step 1.")

        return {
            "text": response.text,
            "thought_summaries": thoughts,
        }
```


medium

The grounding_metadata available from the Google Search tool in the SDK response is being discarded here. While the new approach relies on the LLM generating the verification_evidence field, the structured grounding metadata from the API can be very valuable for logging, debugging, or as a fallback.

Consider capturing and returning this metadata. The calling run method could then decide whether to log it or handle it otherwise.

```python
        grounding_metadata = (
            response.candidates[0].grounding_metadata.model_dump_json(indent=2) if response.candidates else None
        )

        if not response.text:
            finish_reason = response.candidates[0].finish_reason if response.candidates else None

            if finish_reason == FinishReason.MAX_TOKENS:
                raise ValueError("The response from Gemini was too long and was cut off in step 1.")

            print(f"Response finish reason: {finish_reason}")
            raise ValueError("No response from Gemini in step 1.")

        return {
            "text": response.text,
            "thought_summaries": thoughts,
            "grounding_metadata_sdk": grounding_metadata,
        }
```


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/processing_pipeline/test_stage_3.py (2)

265-277: ⚠️ Potential issue | 🔴 Critical

Test test_stage_3_executor mismatches the actual Stage3Executor.run() signature and return type.

Two issues:

  1. Missing prompt_version parameter (line 265–270): Stage3Executor.run() requires prompt_version: dict, but the test call omits it — this will raise a TypeError.
  2. Return type mismatch (lines 273–276): The test asserts result is a tuple of length 2, but run() returns a dict with keys "response", "grounding_metadata", "thought_summaries". The assertion will fail.
Proposed fix sketch
```diff
         result = Stage3Executor.run(
             gemini_key="test-key",
             model_name=GeminiModel.GEMINI_FLASH_LATEST,
             audio_file="test.mp3",
             metadata={"test": "metadata"},
+            prompt_version={"user_prompt": "test", "system_instruction": "test", "output_schema": {}},
         )

-        # Result should be a tuple (response, grounding_metadata)
-        assert isinstance(result, tuple)
-        assert len(result) == 2
-        response, grounding_metadata = result
-        assert isinstance(response, dict)
-        assert grounding_metadata is not None
+        # Result should be a dict with response, grounding_metadata, thought_summaries
+        assert isinstance(result, dict)
+        assert "response" in result
+        assert "grounding_metadata" in result
+        assert "thought_summaries" in result
```

153-160: ⚠️ Potential issue | 🔴 Critical

Update mocks to return dict instead of tuple—Stage3Executor.run() returns {"response": ..., "grounding_metadata": ..., "thought_summaries": ...}, not a tuple.

The mocks at lines 158, 188, and 349 set mock_run.return_value = (mock_gemini_response, "test_grounding_metadata") (tuple), but Stage3Executor.run() returns a dict. This breaks analyze_snippet() which unpacks the return value with **analyzing_response and later accesses dict keys like analyzing_response["response"] and analyzing_response["thought_summaries"].

Update all three mocks to:

```python
mock_run.return_value = {
    "response": mock_gemini_response,
    "grounding_metadata": "test_grounding_metadata",
    "thought_summaries": []
}
```
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around line 105-118: Add a language identifier to the fenced code block that
contains the workflow example starting with searxng_web_search("Maduro captured
US forces 2026") and the subsequent web_url_read and verification_evidence
entries; change the opening fence from ``` to ```text (or ```plaintext) so the
block is lint-compliant and renders correctly.
- Around line 993-1003: The prose mandates that "publication_date" and "title"
are required but the JSON Schema's "required" array omits them; update the
schema by adding "publication_date" and "title" to the "required" array so the
schema and prose align, and either remove or document the "content_fetched"
property in the prose (or explicitly mark it optional in prose) to resolve the
mismatch between schema and documentation.
- Around line 317-324: The Breaking News table's `verification_status` entries
use enums that don't match the JSON schema; update the table rows so
`verification_status` only uses the schema's allowed values ("verified_false",
"verified_true", "uncertain", "insufficient_evidence") and replace the current
entries `VERIFIED_FALSE`, `PARTIALLY_VERIFIABLE`, `UNVERIFIABLE_BREAKING`,
`UNVERIFIABLE_RECENT`, and `UNVERIFIABLE_STALE` accordingly in the table under
the `verification_status` column in prompts/stage_3/analysis_prompt.md so the
model outputs valid enum values.

In `@prompts/stage_3/output_schema.json`:
- Around line 16-17: The schema currently requires "thought_summaries", which
forces Gemini in __structure_with_schema (executors.py) to fabricate that field
when the original analysis lacks it; remove "thought_summaries" from the
required array in prompts/stage_3/output_schema.json so it becomes optional,
leaving the property in properties if needed, and rely on run() and
thought_summaries_from_api to populate summaries separately; update any
comments/tests that assumed it was required.

In `@prompts/stage_3/system_instruction.md`:
- Line 29: The wording "maximum score is 30%" and "(20% for claims within 24
hours)" is ambiguous given confidence_scores.overall is an integer 0–100; update
the text in system_instruction.md to state explicit integer values/out-of-100
wording (e.g., "maximum score is 30 (out of 100)" and "(20 out of 100 for claims
within 24 hours)") so the policy refers to confidence_scores.overall
unambiguously.

In `@src/processing_pipeline/stage_3/executors.py`:
- Around line 161-166: The CLI env dict currently hardcodes
os.environ["GOOGLE_GEMINI_KEY"] causing mismatch when run() receives a different
gemini_key; update the flow to pass the gemini_key parameter into
__analyze_with_custom_search (add gemini_key to its signature and all callers
from run()), and use that gemini_key when building the env dict (set
"GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.
- Around line 128-131: The decorators are in the wrong order for the methods
__analyze_with_custom_search and __analyze_with_google_search_grounding: move
`@classmethod` to be the topmost decorator (apply it last) and place
`@optional_task`(log_prints=True, retries=3) below it so Prefect's task wrapper
receives a plain function; swap the two decorators for both named methods
accordingly.
- Around line 76-82: The code constructs user_prompt_with_file using
os.path.basename(audio_file) which relies on the subprocess's working directory
to find the file; instead, pass an unambiguous path to the Gemini CLI or ensure
the subprocess's cwd is explicitly set. Update the call site that builds
user_prompt_with_file in __analyze_with_custom_search (and any caller like
download_audio_file_from_s3) to use the full audio_file path (not
os.path.basename) and modify the subprocess.run invocation in the Gemini CLI
executor to set cwd to the directory containing audio_file if you must pass a
basename; this ensures the CLI can always locate the file regardless of working
directory changes. Ensure references to audio_file, user_prompt_with_file,
download_audio_file_from_s3, and subprocess.run are updated consistently.

In `@src/processing_pipeline/stage_3/flows.py`:
- Around line 65-78: The downloaded local file returned by
download_audio_file_from_s3 is only removed on the happy path, so if
process_snippet (or any code between download and delete) raises the file leaks;
wrap the snippet processing and deletion in a try/finally around the block that
calls process_snippet (and mirror the change in the alternate branch that also
deletes local_file) so that os.remove(local_file) always runs in the finally,
referencing the local_file variable, the process_snippet(...) call, and
download_audio_file_from_s3(...) to locate the code to change.

In `@src/processing_pipeline/stage_3/tasks.py`:
- Around line 219-221: The except Exception block in process_snippet currently
swallows failures; change it to re-raise after updating status: catch Exception
as e, attempt to call supabase_client.set_snippet_status(snippet["id"],
ProcessingStatus.ERROR, str(e)) inside its own try/except so failures updating
Supabase are logged but do not suppress the original error, then re-raise the
original exception (raise) so the orchestrator sees the task as failed;
reference process_snippet, supabase_client.set_snippet_status,
ProcessingStatus.ERROR, local_file, and snippet in your changes.
- Around line 96-129: In __get_metadata: replace the fragile
.encode("latin-1").decode("unicode-escape") approach with a safe decode path
that first checks transcription type and attempts decoding via
codecs.decode(..., "unicode_escape") (or use a no-op if decoding raises) while
preserving the original transcription on error; use metadata.pop("start_time",
None), metadata.pop("end_time", None), metadata.pop("explanation", None) and
metadata.pop("keywords_detected", None) instead of del to avoid KeyError; and
parse recorded_at with a tolerant ISO parser (e.g.,
datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00")) or
dateutil.parser.parse) before formatting so timestamps with "Z" or different
offsets don't break. Ensure references to metadata["transcription"],
metadata.pop(...) and recorded_at parsing are updated in __get_metadata.
🧹 Nitpick comments (7)
src/processing_pipeline/stage_3/executors.py (4)

83-84: Unused exception variable e.

The caught RuntimeError is assigned to e but never referenced in the except block (e.g., for logging). Either log it or drop the binding.

Proposed fix
-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError:
+            print("Falling back to Google Search grounding with SDK...")

Or, better, log the original error for debuggability:

-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError as e:
+            print(f"CLI failed ({e}), falling back to Google Search grounding with SDK...")

196-201: Silently swallowing JSON parse errors may hide CLI issues.

Lines 200–201 catch json.JSONDecodeError with a bare pass, so malformed CLI output lines are silently ignored. Consider at least logging a debug message for troubleshooting.

Proposed fix
                 except json.JSONDecodeError:
-                    pass
+                    print(f"Skipping non-JSON CLI output line: {line[:200]}")

296-309: Restructuring step uses a hardcoded model (GEMINI_FLASH_LATEST) instead of the caller's model_name.

__structure_with_schema always uses GeminiModel.GEMINI_FLASH_LATEST (line 299) rather than the model_name the caller selected. If this is intentional (cheaper model for JSON restructuring), a brief comment would help. Otherwise, consider threading through the model name or making it configurable.
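
A minimal sketch of the threading-through option, using a stand-in enum (the member names appear elsewhere in this review; the string values are illustrative) and a keyword default that preserves the current behavior:

```python
from enum import Enum


class GeminiModel(str, Enum):
    # Stand-in for the project's enum; string values are illustrative.
    GEMINI_FLASH_LATEST = "gemini-flash-latest"
    GEMINI_2_5_PRO = "gemini-2.5-pro"


class Stage3Executor:
    @classmethod
    def _structure_with_schema(cls, text: str, model_name: GeminiModel = GeminiModel.GEMINI_FLASH_LATEST) -> str:
        # The default keeps today's behavior (a cheap model for JSON restructuring)
        # while letting run() pass its selected model explicitly.
        return f"restructured {text!r} with {model_name.value}"


print(Stage3Executor._structure_with_schema("raw analysis", model_name=GeminiModel.GEMINI_2_5_PRO))
```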


29-40: Consider whether a class with only @classmethod methods is the right abstraction.

Stage3Executor has no instance state — every method is a @classmethod. A plain module with top-level functions (or a namespace class without instantiation) would be simpler and avoid the decorator-ordering pitfalls with @classmethod + @optional_task.

src/processing_pipeline/stage_3/flows.py (1)

60-60: id shadows the Python built-in.

Using id as a loop variable shadows the built-in id() function. Consider renaming to snippet_id.

Proposed fix
-        for id in snippet_ids:
-            snippet = fetch_a_specific_snippet_from_supabase(supabase_client, id)
+        for snippet_id in snippet_ids:
+            snippet = fetch_a_specific_snippet_from_supabase(supabase_client, snippet_id)
prompts/stage_3/analysis_prompt.md (1)

575-609: Verification evidence template references claims_supported count in the summary section — it's missing.

The verification_summary template (Lines 601–606) includes claims_contradicted and claims_unverifiable, but not claims_supported or claims_verified. Yet Line 168 describes a scenario where results support a claim. A claims_supported count would complete the picture and make the summary internally consistent with the scoring framework.

This also applies to the schema at Lines 1009–1017, which similarly lacks a claims_supported field.

src/processing_pipeline/stage_3/tasks.py (1)

132-181: DRY: Stage3Executor.run(...) call is repeated three times with identical arguments.

The same call with the same four keyword arguments appears at Lines 139–145, 153–159, and 171–177. Extract a helper to reduce duplication and make future changes less error-prone.

Suggested refactor
 @optional_task(log_prints=True)
 def analyze_snippet(gemini_key, audio_file, metadata, prompt_version: dict):
     main_model = GeminiModel.GEMINI_2_5_PRO
     fallback_model = GeminiModel.GEMINI_FLASH_LATEST
 
+    def _run(model):
+        return {
+            **Stage3Executor.run(
+                gemini_key=gemini_key,
+                model_name=model,
+                audio_file=audio_file,
+                metadata=metadata,
+                prompt_version=prompt_version,
+            ),
+            "analyzed_by": model,
+        }
+
     try:
         print(f"Attempting analysis with {main_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=main_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": main_model,
-        }
+        return _run(main_model)
     except errors.ServerError as e:
         print(f"Server error with {main_model} (code {e.code}): {e.message}")
         print(f"Falling back to {fallback_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=fallback_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": fallback_model,
-        }
+        return _run(fallback_model)
     except errors.ClientError as e:
         if e.code in [HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN]:
             print(f"Auth error with {main_model} (code {e.code}): {e.message}")
             raise
         else:
             print(f"Client error with {main_model} (code {e.code}): {e.message}")
             print(f"Falling back to {fallback_model}")
-            analyzing_response = Stage3Executor.run(
-                gemini_key=gemini_key,
-                model_name=fallback_model,
-                audio_file=audio_file,
-                metadata=metadata,
-                prompt_version=prompt_version,
-            )
-            return {
-                **analyzing_response,
-                "analyzed_by": fallback_model,
-            }
+            return _run(fallback_model)

Comment on lines +105 to +118
```
1. searxng_web_search("Maduro captured US forces 2026")
-> Found: https://reuters.com/article/..., https://apnews.com/...

2. web_url_read("https://reuters.com/article/...")
-> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."

3. Document in verification_evidence:
- url: "https://reuters.com/article/..."
- source_name: "Reuters"
- source_type: "tier1_wire_service"
- relevant_excerpt: "[exact quote from article]"
- relevance_to_claim: "contradicts_claim"
```


⚠️ Potential issue | 🟡 Minor

Missing language identifier on fenced code block.

The code block at Line 105 has no language specified. Since this block illustrates a workflow example (not a specific language), adding a language like text or plaintext would satisfy linting and improve rendering.

Suggested fix
-  ```
+  ```text
   1. searxng_web_search("Maduro captured US forces 2026")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
1. searxng_web_search("Maduro captured US forces 2026")
-> Found: https://reuters.com/article/..., https://apnews.com/...
2. web_url_read("https://reuters.com/article/...")
-> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."
3. Document in verification_evidence:
- url: "https://reuters.com/article/..."
- source_name: "Reuters"
- source_type: "tier1_wire_service"
- relevant_excerpt: "[exact quote from article]"
- relevance_to_claim: "contradicts_claim"
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 105-105: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 105 - 118, Add a language
identifier to the fenced code block that contains the workflow example starting
with searxng_web_search("Maduro captured US forces 2026") and the subsequent
web_url_read and verification_evidence entries; change the opening fence from
``` to ```text (or ```plaintext) so the block is lint-compliant and renders
correctly.

Comment thread prompts/stage_3/analysis_prompt.md
Comment thread prompts/stage_3/analysis_prompt.md Outdated
Comment on lines +16 to +17
"thought_summaries",
"verification_evidence"


⚠️ Potential issue | 🟡 Minor

thought_summaries being required in output_schema may cause issues in the restructuring fallback.

In executors.py __structure_with_schema, this schema is passed to Gemini to restructure free-text analysis into JSON. The original analysis text may not contain reasoning-process summaries, forcing the model to fabricate a thought_summaries value. Consider making it optional in this schema (remove from required) since run() already handles thought summaries separately via thought_summaries_from_api.

🤖 Prompt for AI Agents
In `@prompts/stage_3/output_schema.json` around lines 16 - 17, The schema
currently requires "thought_summaries", which forces Gemini in
__structure_with_schema (executors.py) to fabricate that field when the original
analysis lacks it; remove "thought_summaries" from the required array in
prompts/stage_3/output_schema.json so it becomes optional, leaving the property
in properties if needed, and rely on run() and thought_summaries_from_api to
populate summaries separately; update any comments/tests that assumed it was
required.

Comment thread prompts/stage_3/system_instruction.md Outdated
Comment on lines +128 to +131
```python
    @optional_task(log_prints=True, retries=3)
    @classmethod
    def __analyze_with_custom_search(
        cls,
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```sh
# Find the definition of optional_task
rg "def optional_task" --type py -A 5
```

Repository: PublicDataWorks/verdad

Length of output: 408


🏁 Script executed:

```sh
# Search for imports of optional_task in the file
rg "optional_task|from|import" src/processing_pipeline/stage_3/executors.py | head -20
```

Repository: PublicDataWorks/verdad

Length of output: 914


🏁 Script executed:

```sh
# Check the full context around lines 128-131 and 217-218
sed -n '125,135p' src/processing_pipeline/stage_3/executors.py
sed -n '214,224p' src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad

Length of output: 740


🌐 Web query:

Prefect @task decorator classmethod compatibility Python

💡 Result:

Prefect’s @task is compatible with @classmethod / @staticmethod only if the method decorator is applied on top of @task (i.e., @task must see a plain function, not a classmethod/staticmethod object). Prefect will raise a TypeError otherwise: “@classmethod should be applied on top of @task” (and likewise for @staticmethod). [1]

```python
from prefect import task

class MyTasks:
    @classmethod
    @task
    def build(cls, x: int) -> int:
        return x + 1

    @staticmethod
    @task
    def clean(x: int) -> int:
        return x * 2
```

Notes:

  • For flows, Prefect explicitly supports decorating instance/class/static methods (example shown in docs). [2]
  • For tasks, follow the decorator ordering rule above to avoid the TypeError. [1]

Sources: Prefect task decorator implementation/API reference [1]; Prefect workflow docs showing class/static methods with decorators [2].


🏁 Script executed:

```sh
# Verify all instances of @optional_task with @classmethod combo
rg "@optional_task" -A 1 src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad

Length of output: 196


🏁 Script executed:

```sh
# Check utils.py to understand optional_task full implementation
sed -n '1,50p' src/utils.py | head -30
```

Repository: PublicDataWorks/verdad

Length of output: 1126


Incorrect decorator ordering: @optional_task before @classmethod will cause a runtime TypeError.

Decorators are applied bottom-up. Currently, @classmethod is applied first (creating a descriptor), then @optional_task wraps that descriptor. Prefect's internal @task decorator requires a plain function, not a classmethod descriptor, and will raise TypeError: "@classmethod should be applied on top of @task" at runtime.

Swap the order on lines 128–129 and 217–218 so @classmethod is applied last:

Required fix
```diff
-    @optional_task(log_prints=True, retries=3)
-    @classmethod
+    @classmethod
+    @optional_task(log_prints=True, retries=3)
     def __analyze_with_custom_search(
```

And similarly on lines 217–218 for __analyze_with_google_search_grounding.

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 128 - 131, The
decorators are in the wrong order for the methods __analyze_with_custom_search
and __analyze_with_google_search_grounding: move `@classmethod` to be the topmost
decorator (apply it last) and place `@optional_task`(log_prints=True, retries=3)
below it so Prefect's task wrapper receives a plain function; swap the two
decorators for both named methods accordingly.

Comment on lines +161 to +166
```python
        env = {
            "PATH": os.environ.get("PATH", ""),
            "HOME": os.environ.get("HOME", ""),
            "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
            "GEMINI_SYSTEM_MD": system_instruction_path,
            "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
```


⚠️ Potential issue | 🟠 Major

API key inconsistency: gemini_key parameter vs. os.environ["GOOGLE_GEMINI_KEY"].

run() accepts gemini_key as a parameter and uses it to construct the SDK client (line 62), but the CLI env dict hardcodes os.environ["GOOGLE_GEMINI_KEY"] (line 164). If a caller passes a different key, the CLI and SDK paths will use different credentials. Either thread gemini_key through or document that the CLI always uses the env var.

Proposed fix — use the parameter
         env = {
             "PATH": os.environ.get("PATH", ""),
             "HOME": os.environ.get("HOME", ""),
-            "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
+            "GEMINI_API_KEY": gemini_key,
             "GEMINI_SYSTEM_MD": system_instruction_path,
             "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
         }

Note: gemini_key is not currently available inside __analyze_with_custom_search because it's not passed as a parameter. You'll need to add it to the method signature and pass it from run().

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 161 - 166, The CLI
env dict currently hardcodes os.environ["GOOGLE_GEMINI_KEY"] causing mismatch
when run() receives a different gemini_key; update the flow to pass the
gemini_key parameter into __analyze_with_custom_search (add gemini_key to its
signature and all callers from run()), and use that gemini_key when building the
env dict (set "GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.

Comment on lines +65 to +78
```python
                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

                # Process the snippet
                process_snippet(
                    supabase_client,
                    snippet,
                    local_file,
                    GEMINI_KEY,
                    skip_review=skip_review,
                    prompt_version=prompt_version,
                )

                print(f"Delete the downloaded snippet clip: {local_file}")
                os.remove(local_file)
```


⚠️ Potential issue | 🟠 Major

Downloaded audio file leaks if process_snippet raises.

os.remove(local_file) on lines 78 and 97 runs only on the happy path. If process_snippet (or anything before the os.remove) throws, the file remains on disk. Wrap in try/finally:

Proposed fix (for the snippet_ids branch; apply analogously to the else branch)
                 local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

-                # Process the snippet
-                process_snippet(
-                    supabase_client,
-                    snippet,
-                    local_file,
-                    GEMINI_KEY,
-                    skip_review=skip_review,
-                    prompt_version=prompt_version,
-                )
-
-                print(f"Delete the downloaded snippet clip: {local_file}")
-                os.remove(local_file)
+                try:
+                    # Process the snippet
+                    process_snippet(
+                        supabase_client,
+                        snippet,
+                        local_file,
+                        GEMINI_KEY,
+                        skip_review=skip_review,
+                        prompt_version=prompt_version,
+                    )
+                finally:
+                    print(f"Delete the downloaded snippet clip: {local_file}")
+                    os.remove(local_file)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
-
-                # Process the snippet
-                process_snippet(
-                    supabase_client,
-                    snippet,
-                    local_file,
-                    GEMINI_KEY,
-                    skip_review=skip_review,
-                    prompt_version=prompt_version,
-                )
-
-                print(f"Delete the downloaded snippet clip: {local_file}")
-                os.remove(local_file)
+                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
+
+                try:
+                    # Process the snippet
+                    process_snippet(
+                        supabase_client,
+                        snippet,
+                        local_file,
+                        GEMINI_KEY,
+                        skip_review=skip_review,
+                        prompt_version=prompt_version,
+                    )
+                finally:
+                    print(f"Delete the downloaded snippet clip: {local_file}")
+                    os.remove(local_file)
```
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/flows.py` around lines 65 - 78, The
downloaded local file returned by download_audio_file_from_s3 is only removed on
the happy path, so if process_snippet (or any code between download and delete)
raises the file leaks; wrap the snippet processing and deletion in a try/finally
around the block that calls process_snippet (and mirror the change in the
alternate branch that also deletes local_file) so that os.remove(local_file)
always runs in the finally, referencing the local_file variable, the
process_snippet(...) call, and download_audio_file_from_s3(...) to locate the
code to change.

Comment on lines +96 to +129
```python
def __get_metadata(snippet):
    snippet_uuid = snippet["id"]
    flagged_snippets = snippet["stage_1_llm_response"]["detection_result"]["flagged_snippets"]
    metadata = {}
    for flagged_snippet in flagged_snippets:
        if flagged_snippet["uuid"] == snippet_uuid:
            metadata = flagged_snippet
            try:
                # Handle escaped unicode characters in the transcription
                metadata["transcription"] = flagged_snippet["transcription"].encode("latin-1").decode("unicode-escape")
            except (UnicodeError, AttributeError) as e:
                # Fallback to original transcription if decoding fails
                print(f"Warning: Failed to decode transcription: {e}")
                metadata["transcription"] = flagged_snippet["transcription"]

    audio_file = snippet["audio_file"]
    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
    audio_file["recorded_at"] = recorded_at.strftime("%B %-d, %Y %-I:%M %p")
    audio_file["recording_day_of_week"] = recorded_at.strftime("%A")
    audio_file["time_zone"] = "UTC"
    metadata["additional_info"] = audio_file

    del metadata["start_time"]
    del metadata["end_time"]

    # TODO: Add these fields back once we've fixed the pipeline
    del metadata["explanation"]
    del metadata["keywords_detected"]

    metadata["start_time"] = snippet["start_time"].split(":", 1)[1]
    metadata["end_time"] = snippet["end_time"].split(":", 1)[1]
    metadata["duration"] = snippet["duration"].split(":", 1)[1]

    return metadata
```


⚠️ Potential issue | 🟡 Minor

Fragile encoding hack and unsafe del on keys that may not exist.

Several concerns in __get_metadata:

  1. Line 105: .encode("latin-1").decode("unicode-escape") — This will throw UnicodeEncodeError for any non-Latin-1 character (e.g., Arabic, CJK), which would be caught but could mask data issues. Given the pipeline handles Spanish and Arabic audio, Arabic transcriptions would routinely hit this path.

  2. Lines 118–123: del metadata["start_time"], del metadata["end_time"], del metadata["explanation"], del metadata["keywords_detected"] — These will raise KeyError if the flagged snippet lacks any of these fields. Consider using metadata.pop("key", None) for safer removal.

  3. Line 112: The datetime format string "%Y-%m-%dT%H:%M:%S+00:00" hardcodes the UTC offset. If the timestamp ever uses a Z suffix or a different offset, parsing will fail.

Suggested fix for safe key removal (Lines 118-123)
-    del metadata["start_time"]
-    del metadata["end_time"]
-
-    # TODO: Add these fields back once we've fixed the pipeline
-    del metadata["explanation"]
-    del metadata["keywords_detected"]
+    metadata.pop("start_time", None)
+    metadata.pop("end_time", None)
+
+    # TODO: Add these fields back once we've fixed the pipeline
+    metadata.pop("explanation", None)
+    metadata.pop("keywords_detected", None)
Suggested fix for datetime parsing (Line 112)
-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    from dateutil.parser import isoparse
+    recorded_at = isoparse(snippet["recorded_at"])

Alternatively, if you want to stay in the stdlib:

-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    recorded_at = datetime.fromisoformat(snippet["recorded_at"])
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 96-129, in __get_metadata: replace the fragile .encode("latin-1").decode("unicode-escape") approach with a safe decode path that first checks the transcription type and attempts decoding via codecs.decode(..., "unicode_escape"), preserving the original transcription if decoding raises; use metadata.pop("start_time", None), metadata.pop("end_time", None), metadata.pop("explanation", None), and metadata.pop("keywords_detected", None) instead of del to avoid KeyError; and parse recorded_at with a tolerant ISO parser (e.g., datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00")) or dateutil.parser.parse) before formatting, so timestamps with a "Z" suffix or a different offset don't break.
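Putting the three fixes together, a hedged sketch of what the safer metadata handling could look like (build_metadata is an illustrative name, not the repo's function, and the decode guard is one reasonable choice among several):

import codecs
from datetime import datetime

def build_metadata(snippet, flagged_snippet):
    # Illustrative sketch combining safe decoding, safe key removal, and tolerant ISO parsing.
    metadata = dict(flagged_snippet)

    transcription = flagged_snippet.get("transcription")
    if isinstance(transcription, str) and "\\u" in transcription:
        try:
            # Only unescape when literal \uXXXX sequences are present; real
            # Arabic/CJK text passes through untouched and never hits latin-1.
            metadata["transcription"] = codecs.decode(transcription, "unicode_escape")
        except UnicodeError:
            metadata["transcription"] = transcription  # keep the original on failure

    # pop(..., None) never raises, unlike del
    for key in ("start_time", "end_time", "explanation", "keywords_detected"):
        metadata.pop(key, None)

    # fromisoformat accepts any offset; normalize a trailing "Z" first
    recorded_at = datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00"))
    metadata["additional_info"] = {
        "recorded_at": recorded_at.strftime("%B %-d, %Y %-I:%M %p"),
        "recording_day_of_week": recorded_at.strftime("%A"),
        "time_zone": "UTC",
    }
    return metadata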

Comment on lines +219 to +221
    except Exception as e:
        print(f"Failed to process {local_file}: {e}")
        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))

⚠️ Potential issue | 🟠 Major

Bare Exception catch swallows errors silently — task appears to succeed on failure.

The except Exception at Line 219 catches all errors, sets the snippet status to ERROR, but never re-raises. This means:

  1. The process_snippet task will report success to the orchestrator even when processing failed.
  2. If supabase_client.set_snippet_status itself fails, both the original error and the status-update error are lost.

At minimum, re-raise after setting the error status so the caller/framework knows the task failed.

Suggested fix
     except Exception as e:
         print(f"Failed to process {local_file}: {e}")
-        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        try:
+            supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        except Exception as status_err:
+            print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+        raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     except Exception as e:
         print(f"Failed to process {local_file}: {e}")
-        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        try:
+            supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        except Exception as status_err:
+            print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+        raise
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 219-219: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 219-221: the except Exception block in process_snippet currently swallows failures. Catch Exception as e, call supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e)) inside its own try/except so a failure updating Supabase is logged but does not suppress the original error, then re-raise the original exception (bare raise) so the orchestrator sees the task as failed. Reference process_snippet, supabase_client.set_snippet_status, ProcessingStatus.ERROR, local_file, and snippet in your changes.
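The re-raise behavior is easy to pin down with a test; a self-contained sketch of the pattern the fix describes (handle_failure is an illustrative stand-in for the except block, and the plain "ERROR" string stands in for ProcessingStatus.ERROR):

import pytest
from unittest.mock import MagicMock

def handle_failure(supabase_client, snippet, error):
    # Mirrors the suggested except block: record the error status, make sure a
    # status-update failure cannot mask the original error, then re-raise.
    try:
        supabase_client.set_snippet_status(snippet["id"], "ERROR", str(error))
    except Exception as status_err:
        print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
    raise error

def test_original_error_propagates_even_if_status_update_fails():
    supabase_client = MagicMock()
    supabase_client.set_snippet_status.side_effect = ConnectionError("supabase down")
    with pytest.raises(RuntimeError, match="boom"):
        handle_failure(supabase_client, {"id": "snippet-123"}, RuntimeError("boom"))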

… enum

The H.1 Breaking News Protocol table used SCREAMING_CASE values
(VERIFIED_FALSE, PARTIALLY_VERIFIABLE, UNVERIFIABLE_*) that don't exist
in the verification_status enum, which would cause validation failures.
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 8991dcc in 18 seconds.
  • Reviewed 21 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_WGamei1noZpb8GaV

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
prompts/stage_3/analysis_prompt.md (1)

1775-1948: ⚠️ Potential issue | 🟠 Major

Missing verification_evidence field in the complete example.

The schema requires verification_evidence (Line 638), and extensive documentation explains how to populate it (Lines 122-169, 575-609). However, the complete example output (Lines 1775-1948) does not include the verification_evidence field. This omission could result in the AI failing to provide this critical field in actual outputs.

The example should demonstrate a complete verification_evidence structure showing:

  • At least one entry in searches_performed with actual search queries
  • Corresponding results array with source details
  • Populated verification_summary
Example structure to add

Add after Line 1945 (before "thought_summaries"):

  "verification_evidence": {
    "searches_performed": [
      {
        "query": "vaccines mind control government",
        "search_intent": "Verify claim that vaccines are used for mind control",
        "result_status": "results_found",
        "results": [
          {
            "url": "https://www.factcheck.org/vaccine-myths/",
            "source_name": "FactCheck.org",
            "source_type": "tier1_factchecker",
            "publication_date": "2025-12-15",
            "title": "Debunking Vaccine Myths",
            "relevant_excerpt": "There is no scientific evidence that vaccines can control minds or contain mind-altering substances.",
            "relevance_to_claim": "contradicts_claim",
            "content_fetched": true
          }
        ]
      }
    ],
    "verification_summary": {
      "total_searches": 1,
      "claims_contradicted": 1,
      "claims_unverifiable": 0,
      "key_findings": "Fact-checking sources confirm that the mind control claim is false and has been repeatedly debunked."
    }
  },
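To keep the documented example and the schema from drifting apart, the example can be validated in a test; a sketch assuming the jsonschema package and this PR's schema path (the fixture file is hypothetical):

import json
from jsonschema import Draft202012Validator  # pip install jsonschema

def test_example_output_matches_schema():
    with open("prompts/stage_3/output_schema.json") as f:
        schema = json.load(f)
    # Hypothetical fixture holding the complete example output from the prompt.
    with open("tests/fixtures/stage_3_example_output.json") as f:
        example = json.load(f)
    errors = list(Draft202012Validator(schema).iter_errors(example))
    assert not errors, [e.message for e in errors]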
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Line 1002: The schema adds a boolean field content_fetched, but the prose never explains its usage. Update the prose (e.g., section C.1, where web_url_read and snippet-vs-full-content behavior is discussed) to state that content_fetched should be true when the agent used web_url_read to retrieve the full page content and false when only the search snippet/metadata was available; alternatively, remove the content_fetched property from the schema if it is unnecessary. Ensure references to content_fetched appear alongside the web_url_read instructions and examples so callers know when to set it.
- Around lines 321-325: The "THE GOLDEN RULE" sentence is ambiguous because it can be read to override the 20% cap for claims under 24 hours. Update that sentence to explicitly reference the existing tiered caps in the table (the rows with "MAX 20%" for "within 24 hours" and "MAX 30%" for "24-72 hours") and state that the 30% maximum applies only to claims aged 24-72 hours, while claims within 24 hours remain capped at 20%. Keep the phrase "THE GOLDEN RULE", but replace the current line with a clear, tiered statement that mentions "MAX 20% (0-24h)" and "MAX 30% (24-72h)" to remove the ambiguity.

The prose documentation mandates these fields but the JSON schemas and
Pydantic model had them as optional. publication_date allows null for
cases where the date is unavailable. Also documents content_fetched as
explicitly optional in the prose.
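A hedged sketch of what the tightened Pydantic side might look like; the class name is illustrative, and the field list follows the schema in this PR rather than the repo's actual models.py:

from typing import Literal, Optional
from pydantic import BaseModel

class SearchResult(BaseModel):
    # Illustrative: fields in the schema's "required" list get no defaults,
    # so Pydantic treats them as mandatory; publication_date stays nullable.
    url: str
    source_name: str
    source_type: Literal[
        "tier1_wire_service", "tier1_factchecker", "tier2_major_news",
        "tier3_regional_news", "official_source", "other",
    ]
    publication_date: Optional[str]  # required, but null is a legal value
    title: str
    relevant_excerpt: str
    relevance_to_claim: Literal[
        "supports_claim", "contradicts_claim", "provides_context", "inconclusive",
    ]
    content_fetched: Optional[bool] = None  # explicitly optional per the prose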
…fidence scores

confidence_scores.overall is an integer 0-100, not a percentage. Updated
all score references in analysis_prompt.md and system_instruction.md to
use explicit integer values (e.g., "30 (out of 100)") instead of "30%".
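The same commit's intent can be enforced at the model layer too; a minimal sketch assuming pydantic's Field constraints (the class name is illustrative):

from pydantic import BaseModel, Field

class ConfidenceScores(BaseModel):
    # Illustrative: "overall" is an integer score out of 100, never a percentage string.
    overall: int = Field(ge=0, le=100, description="Overall confidence, 0-100.")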
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 03e05c1 in 16 seconds.
  • Reviewed 64 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_vSMdebzXzECu0JxA

Update import script to reference stage 3 prompts from their
new location in the prompts/stage_3/ subdirectory, aligning
with the reorganized prompt directory structure.
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 2639987 in 39 seconds.
  • Reviewed 406 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_UmF9gOMqvIB3LXUd

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scripts/import_prompts_to_db.py (1)

47-51: ⚠️ Potential issue | 🟠 Major

Stage 4 paths in import_prompts_to_db.py are incorrect and will fail—use nested paths to match constants.py and existing files.

The mapping at lines 47–51 references old flat paths (prompts/Stage_4_*.md) that do not exist on disk. The runtime code in src/processing_pipeline/constants.py (lines 54–62) reads from nested paths (prompts/stage_4/*.md), which do exist. All other stages in import_prompts_to_db.py (STAGE_1, STAGE_3, GEMINI_TIMESTAMPED_TRANSCRIPTION) use nested paths. Update Stage 4 to match:

Fix
     PromptStage.STAGE_4: {
-        "system_instruction": "prompts/Stage_4_system_instruction.md",
-        "user_prompt": "prompts/Stage_4_review_prompt.md",
-        "output_schema": "prompts/Stage_4_output_schema.json",
+        "system_instruction": "prompts/stage_4/system_instruction.md",
+        "user_prompt": "prompts/stage_4/review_prompt.md",
+        "output_schema": "prompts/stage_4/output_schema.json",
     },
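Drift like this is cheap to catch with a test that walks the mapping and checks the disk; a sketch assuming the mapping is importable as PROMPT_FILES (the actual variable name in import_prompts_to_db.py may differ):

import os

def test_all_prompt_paths_exist():
    # Hypothetical import; adjust to the script's real module path and mapping name.
    from src.scripts.import_prompts_to_db import PROMPT_FILES

    missing = [
        path
        for stage_paths in PROMPT_FILES.values()
        for path in stage_paths.values()
        if not os.path.exists(path)
    ]
    assert not missing, f"Prompt files referenced but missing on disk: {missing}"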
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around lines 977-1022: The verification_evidence schema is missing descriptive guidance. Update the verification_evidence object in analysis_prompt.md by adding the missing descriptions from prompts/stage_3/output_schema.json: "Record of all web searches performed during fact-checking." to searches_performed, "Individual search results with full details." to results, and the per-result property descriptions (e.g., source_type: "Classification of source reliability tier.", publication_date: "Publication date or null if unavailable.", relevant_excerpt: "Excerpt demonstrating relevance to the claim.", relevance_to_claim: "How the result relates to the claim."). Also add descriptive text for verification_summary and its properties (total_searches, claims_contradicted, claims_unverifiable, key_findings) so the verification_evidence, searches_performed, results, source_type, and verification_summary blocks include the same descriptions as in output_schema.json.
🧹 Nitpick comments (3)
src/processing_pipeline/constants.py (2)

43-62: File handles are never closed.

Every helper uses a bare open(...) without a with statement or .close(), leaking file descriptors. Since you're already touching these lines, consider wrapping in with:

Example fix for one function
 def get_user_prompt_for_stage_3():
-    return open("prompts/stage_3/analysis_prompt.md", "r").read()
+    with open("prompts/stage_3/analysis_prompt.md", "r") as f:
+        return f.read()

Same applies to all six functions (lines 44, 48, 51, 55, 58, 62) and the pre-existing one on line 65.
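Since the six helpers differ only in the path, one shared reader keeps the fix in a single place; a sketch under the assumption that these functions do nothing but read a file (the explicit utf-8 encoding is an added choice):

def _read_prompt(path: str) -> str:
    # One place to own file handling for every get_*_for_stage_* helper.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def get_user_prompt_for_stage_3() -> str:
    return _read_prompt("prompts/stage_3/analysis_prompt.md")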


93-99: Dead references to removed function in __main__ block.

Lines 94–99 reference get_timestamped_transcription_generation_output_schema, which was removed in this PR. While it's commented out so there's no runtime impact, it adds confusion. Consider cleaning up or removing these stale comments.

src/processing_pipeline/stage_1/flows.py (1)

224-224: Good refactor using get_audio_file_metadata.

This eliminates the duplicated metadata construction logic.

Note: redo_main_detection (lines 158–166) still manually constructs the same metadata dict. Consider using get_audio_file_metadata(audio_file) there too for consistency.

Comment on lines +977 to 1022
"verification_evidence": {
"type": "object",
"required": ["searches_performed", "verification_summary"],
"description": "Complete documentation of all web searches performed during fact-checking.",
"properties": {
"searches_performed": {
"type": "array",
"items": {
"type": "object",
"required": ["query", "search_intent", "result_status", "results"],
"properties": {
"query": { "type": "string" },
"search_intent": { "type": "string" },
"result_status": { "type": "string", "enum": ["results_found", "no_results", "results_inconclusive"] },
"results": {
"type": "array",
"items": {
"type": "object",
"required": ["url", "source_name", "source_type", "publication_date", "title", "relevant_excerpt", "relevance_to_claim"],
"properties": {
"url": { "type": "string" },
"source_name": { "type": "string" },
"source_type": { "type": "string", "enum": ["tier1_wire_service", "tier1_factchecker", "tier2_major_news", "tier3_regional_news", "official_source", "other"] },
"publication_date": { "type": ["string", "null"] },
"title": { "type": "string" },
"relevant_excerpt": { "type": "string" },
"relevance_to_claim": { "type": "string", "enum": ["supports_claim", "contradicts_claim", "provides_context", "inconclusive"] },
"content_fetched": { "type": "boolean" }
}
}
}
}
}
},
"verification_summary": {
"type": "object",
"required": ["total_searches", "claims_contradicted", "claims_unverifiable", "key_findings"],
"properties": {
"total_searches": { "type": "integer" },
"claims_contradicted": { "type": "integer" },
"claims_unverifiable": { "type": "integer" },
"key_findings": { "type": "string" }
}
}
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find output_schema.json
fd "output_schema.json" prompts/stage_3

Repository: PublicDataWorks/verdad

Length of output: 100


🏁 Script executed:

#!/bin/bash
# Read the output_schema.json file
cat prompts/stage_3/output_schema.json

Repository: PublicDataWorks/verdad

Length of output: 21501


Add descriptions to verification_evidence schema to match output_schema.json.

The verification_evidence schema in analysis_prompt.md (lines 977-1022) is missing descriptions for multiple fields that are documented in prompts/stage_3/output_schema.json. The structural definitions match, but the prompt version lacks guidance descriptions for searches_performed, results, and verification_summary properties. These missing descriptions reduce clarity for the model about field purposes and requirements.

Add the descriptions from output_schema.json to ensure the prompt provides complete guidance. Key missing descriptions:

  • searches_performed: "Record of all web searches performed during fact-checking."
  • results: "Individual search results with full details."
  • Each result item property needs its description (e.g., source_type: "Classification of source reliability tier.")
  • verification_summary and all its properties need descriptions
🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 977-1022: The verification_evidence schema is missing descriptive guidance. Add the missing descriptions from prompts/stage_3/output_schema.json: "Record of all web searches performed during fact-checking." to searches_performed, "Individual search results with full details." to results, and the per-result property descriptions (e.g., source_type: "Classification of source reliability tier.", publication_date: "Publication date or null if unavailable.", relevant_excerpt: "Excerpt demonstrating relevance to the claim.", relevance_to_claim: "How the result relates to the claim."). Also add descriptive text for verification_summary and its properties (total_searches, claims_contradicted, claims_unverifiable, key_findings) so these blocks carry the same descriptions as output_schema.json.

@quancao-ea quancao-ea merged commit e1d27ae into main Feb 9, 2026
2 checks passed
@quancao-ea quancao-ea deleted the fix/stage-3-prompts-add-verification-fields-and-web-search branch March 17, 2026 02:42