
VER-298: Refactor Stage 3 prompts: add verification fields and fix web search tools#59

Merged
quancao-ea merged 13 commits into main from fix/stage-3-prompts-add-verification-fields-and-web-search on Feb 9, 2026

Conversation


@quancao-ea quancao-ea commented Feb 9, 2026

Important

Refactors Stage 3 prompts and processing logic to include web-based verification and structured evidence, updating schemas and tests accordingly.

  • Behavior:
    • Adds mandatory web-based verification workflow and structured verification evidence in analysis outputs.
    • Introduces a verification_status field with values verified_false, verified_true, uncertain, and insufficient_evidence (see the enum sketch after this list).
  • Configuration:
    • Adds a timeout setting for the external search service in .gemini/settings.json.
  • Breaking Changes:
    • Removed older stage prompts and timestamped-transcription schema; Stage 3 processing restructured.
  • Prompts and Schemas:
    • Updates prompts/stage_3/analysis_prompt.md, system_instruction.md, and output_schema.json for new verification logic.
  • Code Structure:
    • Refactors Stage 3 into separate modules: executors.py, flows.py, tasks.py, and models.py.
    • Removes src/processing_pipeline/stage_3.py and timestamped_transcription_generator.py.
  • Tests:
    • Updates tests in test_stage_1.py and test_stage_3.py to reflect Stage 3 restructuring and new verification logic.
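
For orientation, a minimal sketch of the new status enum, assuming only the four values listed above (the real field lives in the Stage 3 Pydantic models; the class and member names here are illustrative):

```python
from enum import Enum


class VerificationStatus(str, Enum):
    # Values come from the PR description; class and member names are assumed.
    VERIFIED_FALSE = "verified_false"
    VERIFIED_TRUE = "verified_true"
    UNCERTAIN = "uncertain"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


print([status.value for status in VerificationStatus])
```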

This description was created by Ellipsis for 2639987. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features
    • Adds mandatory web-based verification workflow, structured verification evidence in outputs, and verification_status (verified_false, verified_true, uncertain, insufficient_evidence).
  • Configuration
    • Adds a timeout setting for the external search service.
  • Breaking Changes
    • Restructures Stage 3 processing and removes the older timestamped-transcription schema/workflow.
  • Tests
    • Updates tests to align with the Stage 3 restructuring.

  • Add guidance for recognizing and correctly handling events that occurred after the model's training cutoff date, preventing false positives when web search results conflict with pre-training knowledge.
  • Remove tool tracking from the CLI method and extract verification evidence from validated responses instead of API metadata.
  • Streamline the analysis flow to focus on structured output validation.
  • Move all Stage 3 prompt files from the root prompts directory to a new stage_3 subdirectory for better organization and consistency with the project structure.
  • Split the Stage 3 processing pipeline into separate modules for better organization and maintainability. Extract executor logic, flow definitions, and task functions into dedicated files while maintaining backward compatibility through __init__.py exports.

linear Bot commented Feb 9, 2026


coderabbitai Bot commented Feb 9, 2026

Walkthrough

Refactors Stage 3 into a package (executors, flows, tasks, models), adds verification_evidence and verification_status to prompts/schemas, removes legacy timestamped-transcription artifacts, updates imports/tests to the new layout, and adjusts .gemini/settings.json to include a searxng timeout.

Changes

  • Prompts & System (prompts/stage_3/analysis_prompt.md, prompts/stage_3/system_instruction.md, prompts/stage_3/output_schema.json): Adds the mandatory web-search verification workflow, introduces verification_evidence and verification_status, and expands the schema (breaking-news rules, self-review, evidence recording).
  • Removed Legacy Prompts (prompts/Stage_3_system_instruction.md, prompts/Timestamped_transcription_generation_output_schema.json, prompts/Timestamped_transcription_generation_prompt.md): Deletes the legacy Stage 3 system instruction and the timestamped-transcription prompt/schema files.
  • Stage 3 Package, new (src/processing_pipeline/stage_3/executors.py, src/processing_pipeline/stage_3/flows.py, src/processing_pipeline/stage_3/tasks.py, src/processing_pipeline/stage_3/models.py, src/processing_pipeline/stage_3/__init__.py): Splits the monolith into modules: executors (Stage3Executor with Gemini CLI + Google grounding fallback), flows (in_depth_analysis, reset hook), tasks (Supabase/S3 interactions, processing), models (Pydantic verification types); exports updated.
  • Removed Monolith (src/processing_pipeline/stage_3.py): Removes the previous monolithic Stage 3 implementation; functionality redistributed to the package modules.
  • Tests (tests/processing_pipeline/test_stage_3.py, tests/processing_pipeline/test_stage_1.py): Updates mocks/import paths to the new nested module layout; removes tests tied to the removed timestamped-transcription generator.
  • Stage 1 changes (src/processing_pipeline/stage_1/flows.py, src/processing_pipeline/stage_1/tasks.py, src/processing_pipeline/timestamped_transcription_generator.py): Removes the custom timestamped transcription generator and related tests; Stage 1 flows now use the GeminiModel enum and the Gemini transcription path.
  • Constants & Scripts (src/processing_pipeline/constants.py, src/scripts/import_prompts_to_db.py): Updates prompt path lookups to prompts/stage_*/*, removes the timestamped-transcription helper getters, and updates PROMPT_MAPPING for Stage 3.
  • Configuration (.gemini/settings.json): Adds timeout: 60000 to the searxng MCP server configuration (a sketch of the resulting file follows).
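
As a reference point, a sketch of what the .gemini/settings.json change might look like. Only the timeout: 60000 key on the searxng MCP server is confirmed by this PR; the surrounding mcpServers nesting is an assumption about the file's layout:

```json
{
  "mcpServers": {
    "searxng": {
      "timeout": 60000
    }
  }
}
```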

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Flow as in_depth_analysis
    participant Executor as Stage3Executor
    participant Supabase
    participant S3
    participant GeminiCLI as Gemini CLI
    participant GeminiSDK as Gemini SDK
    participant GoogleAPI as Google Search

    Client->>Flow: start in_depth_analysis
    Flow->>Supabase: fetch snippet & mark PROCESSING
    Flow->>S3: download audio file
    Flow->>Executor: run(gemini_key, model, audio, metadata, prompt_version)

    Executor->>GeminiCLI: analyze via CLI (custom search)
    alt CLI success
        GeminiCLI-->>Executor: analysis (streamed JSON)
    else CLI fails
        Executor->>GeminiSDK: upload audio
        Executor->>GoogleAPI: perform grounding search
        GoogleAPI-->>Executor: search results
        GeminiSDK-->>Executor: grounded analysis
    end

    Executor->>Executor: validate with Pydantic (Stage3Output)
    alt validation fails
        Executor->>GeminiSDK: restructure into required schema
        GeminiSDK-->>Executor: formatted JSON
    end

    Executor-->>Flow: analysis + verification_evidence
    Flow->>Supabase: update snippet with results
    Flow->>S3: delete local audio
    Flow-->>Client: complete
```
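
A rough Python sketch of the fallback path the diagram describes, with placeholder helpers standing in for the Gemini CLI and SDK calls; the function names and payloads here are illustrative, not the repository's actual signatures:

```python
from pydantic import BaseModel, ValidationError


class Stage3Output(BaseModel):
    # Trimmed stand-in for the real Stage3Output in stage_3/models.py.
    confidence_scores: dict
    verification_evidence: dict


def analyze_with_cli(audio_file: str) -> str:
    # Placeholder for the Gemini CLI path with the custom searxng search tools.
    raise RuntimeError("CLI unavailable in this sketch")


def analyze_with_grounding(audio_file: str) -> str:
    # Placeholder for the SDK + Google Search grounding fallback.
    return '{"confidence_scores": {"overall": 10}, "verification_evidence": {"searches_performed": []}}'


def run(audio_file: str) -> Stage3Output:
    try:
        raw = analyze_with_cli(audio_file)
    except RuntimeError:
        print("Falling back to Google Search grounding with SDK...")
        raw = analyze_with_grounding(audio_file)
    try:
        return Stage3Output.model_validate_json(raw)
    except ValidationError:
        # The real executor asks Gemini to restructure the output into the schema here.
        raise


print(run("test.mp3"))
```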

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • nhphong

Poem

🐰 I hopped through files and split the stack,
Carrots of schema in a tidy pack.
Searches and proofs I nibbled through,
Modular hops and tests updated too.
Verification crunch — a joyful chew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 20.69%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The PR title accurately describes the main changes: refactoring Stage 3 prompts, adding verification fields, and fixing web search tools integration.
  • Linked Issues check (✅ Passed): All objectives from VER-298 are met: prompts reorganized to prompts/stage_3/, verification fields and status added to the schema, web search tools integrated, Stage 3 refactored into modular components, and tests updated.
  • Out of Scope Changes check (✅ Passed): The PR removes the custom timestamped transcription generator and related prompt files unrelated to Stage 3 verification; these removals appear justified by modernizing the pipeline architecture.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.4)
src/processing_pipeline/constants.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.constants",
"obj": "",
"line": 94,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/constants.py",
"symbol": "line-too-long",
"message": "Line too long (101/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "src.processing_pipeline.constants",
"obj": "",
"line": 98,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/constants.py",
"symbol": "line-too-long",
"message": "Line too long (115/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module"

... [truncated 7209 characters] ...

ini_timestamped_transcription_generation_prompt",
"line": 64,
"column": 0,
"endLine": 64,
"endColumn": 58,
"path": "src/processing_pipeline/constants.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
},
{
"type": "warning",
"module": "src.processing_pipeline.constants",
"obj": "get_gemini_timestamped_transcription_generation_prompt",
"line": 65,
"column": 11,
"endLine": 65,
"endColumn": 85,
"path": "src/processing_pipeline/constants.py",
"symbol": "unspecified-encoding",
"message": "Using open without explicitly specifying an encoding",
"message-id": "W1514"
}
]

src/processing_pipeline/stage_1/flows.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 41,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 64,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",

... [truncated 13107 characters] ...

module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 193,
"column": 0,
"endLine": 193,
"endColumn": 37,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "too-many-locals",
"message": "Too many local variables (16/15)",
"message-id": "R0914"
},
{
"type": "warning",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 216,
"column": 8,
"endLine": 216,
"endColumn": 10,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "redefined-builtin",
"message": "Redefining built-in 'id'",
"message-id": "W0622"
}
]

src/scripts/import_prompts_to_db.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "",
"line": 6,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "line-too-long",
"message": "Line too long (111/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "",
"line": 7,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "line-too-long",
"message": "Line too long (120/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "import_prompts_to_db",

... [truncated 7167 characters] ...

essage-id": "R0912"
},
{
"type": "refactor",
"module": "import_prompts_to_db",
"obj": "import_prompts",
"line": 104,
"column": 0,
"endLine": 104,
"endColumn": 18,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "too-many-statements",
"message": "Too many statements (72/50)",
"message-id": "R0915"
},
{
"type": "convention",
"module": "import_prompts_to_db",
"obj": "main",
"line": 266,
"column": 0,
"endLine": 266,
"endColumn": 8,
"path": "src/scripts/import_prompts_to_db.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
}
]


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 9db4a22 in 16 seconds.
  • Reviewed 2153 lines of code in 18 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_BnW9dJuHfIFj6bsd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Stage 3 disinformation analysis pipeline by introducing a more robust and transparent factual verification process. It refactors prompts to include detailed instructions for web search, mandates comprehensive documentation of verification evidence, and updates the output schema to capture this information. The underlying Python codebase has also been restructured for improved modularity and to support the new verification mechanisms, ensuring that analyses are grounded in current, verifiable information and explicitly account for the model's knowledge limitations.

Highlights

  • Prompt Refactoring and Organization: Stage 3 prompts, including system instructions, analysis prompts, and output schemas, have been moved into a new, dedicated prompts/stage_3 subdirectory. Obsolete timestamped transcription prompts and schemas have been removed.
  • Enhanced Factual Verification Protocol: The analysis_prompt.md now includes a detailed, two-step verification process using searxng_web_search and web_url_read tools. This protocol mandates comprehensive documentation of all search activities, including queries, results, source types, and direct excerpts, along with clear source prioritization guidelines.
  • Knowledge Cutoff and Breaking News Handling: New critical instructions have been added to guide the model on how to handle its knowledge cutoff, emphasizing that current web search results from reliable sources must override pre-training knowledge. A specific protocol for 'Breaking News and Recent Events' (within 72 hours) has been introduced, including confidence score caps for unverified recent claims.
  • New Verification Fields in Output Schema: The output schema for Stage 3 has been extended to include a verification_status within confidence_scores and a comprehensive verification_evidence object. This new object meticulously records all web searches performed, their intent, detailed results (URLs, source information, relevant excerpts), and a summary of verification findings.
  • Codebase Restructuring: The Python code for Stage 3 processing has been refactored from a single file (src/processing_pipeline/stage_3.py) into a more modular package structure under src/processing_pipeline/stage_3/, with separate modules for executors, flows, models, and tasks. This improves maintainability and clarity.
  • Web Search Tool Integration Update: The Stage3Executor now explicitly supports searxng_web_search for custom search operations, passing the SEARXNG_URL environment variable. The grounding_metadata now captures the detailed verification_evidence directly from the model's output.
Changelog
  • prompts/Stage_3_analysis_prompt.md
    • Renamed to prompts/stage_3/analysis_prompt.md.
    • Expanded 'Ensure Factual Accuracy' section with detailed two-step verification process using searxng_web_search and web_url_read.
    • Added mandatory 'C.1 Verification Evidence Documentation' section, detailing how to record search queries, results, source types, and excerpts.
    • Introduced 'Source Priority Guidelines' for web search results.
    • Added critical 'C.2 Knowledge Cutoff and Post-Training Events' section, instructing the model to prioritize web search results over pre-training knowledge.
    • Simplified 'Verification Requirement' statement.
    • Added new 'H.1 Breaking News and Recent Events Protocol' for handling time-sensitive claims with confidence caps.
    • Added a mandatory check to the self-review process, reinforcing that supporting web search results from tier-1/tier-2 sources should lead to a score of 0.
    • Added new 'Common Error Check' items related to web search result interpretation and source reputation.
    • Included 'Breaking News Verification Checklist' and 'Evidence Documentation Check' in the self-review section.
    • Added 'M. Verification Evidence' section with a JSON schema example for documenting all web searches.
    • Updated the output schema example within the prompt to include verification_evidence and verification_status fields.
    • Updated instruction for verifying factual claims to explicitly mention web_url_read.
  • prompts/Stage_3_heuristics.md
    • Renamed to prompts/stage_3/heuristics.md.
  • prompts/Stage_3_output_schema.json
    • Renamed to prompts/stage_3/output_schema.json.
    • Added verification_evidence to the list of required fields.
    • Added verification_status field to the confidence_scores object with an enum of possible values.
    • Added full schema definition for verification_evidence, including searches_performed and verification_summary.
  • prompts/Stage_3_system_instruction.md
    • Removed old system instruction file.
  • prompts/Stage_4_output_schema.json
    • Renamed to prompts/stage_4/output_schema.json.
  • prompts/Stage_4_review_prompt.md
    • Renamed to prompts/stage_4/review_prompt.md.
  • prompts/Stage_4_system_instruction.md
    • Renamed to prompts/stage_4/system_instruction.md.
  • prompts/Timestamped_transcription_generation_output_schema.json
    • Removed old timestamped transcription output schema.
  • prompts/Timestamped_transcription_generation_prompt.md
    • Removed old timestamped transcription generation prompt.
  • src/processing_pipeline/stage_3.py
    • Removed old Stage 3 processing file, replaced by new package structure.
  • src/processing_pipeline/stage_3/__init__.py
    • Added __init__.py to create stage_3 as a Python package.
    • Imported Stage3Executor, in_depth_analysis, and various tasks from new submodules.
  • src/processing_pipeline/stage_3/executors.py
    • Added new file containing the Stage3Executor class, refactored from the old stage_3.py.
    • Modified run method to extract verification_evidence from model output and use it for grounding_metadata.
    • Updated __analyze_with_custom_search to include SEARXNG_URL in environment variables for custom search.
    • Adjusted __analyze_with_google_search_grounding to no longer return grounding_metadata directly, as it's now part of the model's JSON output.
  • src/processing_pipeline/stage_3/flows.py
    • Added new file containing the Prefect flow in_depth_analysis and reset_snippet_status_hook, refactored from the old stage_3.py.
  • src/processing_pipeline/stage_3/tasks.py
    • Added new file containing Prefect tasks (fetch_a_specific_snippet_from_supabase, fetch_a_new_snippet_from_supabase, download_audio_file_from_s3, update_snippet_in_supabase, get_metadata, analyze_snippet, process_snippet), refactored from the old stage_3.py.
    • Updated update_snippet_in_supabase to correctly store verification_evidence in grounding_metadata.
  • src/processing_pipeline/stage_3_models.py
    • Renamed to src/processing_pipeline/stage_3/models.py.
    • Added verification_status field to the ConfidenceScores Pydantic model.
    • Added new Pydantic models for SearchResult, SearchPerformed, VerificationSummary, and VerificationEvidence (see the sketch after this changelog).
    • Integrated the VerificationEvidence model into the main Stage3Output model.
  • tests/processing_pipeline/test_stage_3.py
    • Updated import paths for SupabaseClient, postprocess_snippet, and process_snippet to reflect the new modular structure of the stage_3 package.
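
To make the new models concrete, here is a self-contained sketch assembled from field names quoted elsewhere in this PR (url, source_name, source_type, relevant_excerpt, relevance_to_claim, searches_performed, verification_summary); the query and intent names and all types are assumptions:

```python
from typing import List

from pydantic import BaseModel


class SearchResult(BaseModel):
    # Field names taken from the workflow example in the analysis prompt.
    url: str
    source_name: str
    source_type: str
    relevant_excerpt: str
    relevance_to_claim: str


class SearchPerformed(BaseModel):
    # "query" and "intent" are assumed names for the recorded search and its purpose.
    query: str
    intent: str
    results: List[SearchResult]


class VerificationSummary(BaseModel):
    # Counts mentioned in the review comments below.
    claims_contradicted: int
    claims_unverifiable: int


class VerificationEvidence(BaseModel):
    searches_performed: List[SearchPerformed]
    verification_summary: VerificationSummary


evidence = VerificationEvidence(
    searches_performed=[
        SearchPerformed(
            query="Maduro captured US forces 2026",
            intent="verify central claim",
            results=[
                SearchResult(
                    url="https://reuters.com/article/...",
                    source_name="Reuters",
                    source_type="tier1_wire_service",
                    relevant_excerpt="[exact quote from article]",
                    relevance_to_claim="contradicts_claim",
                )
            ],
        )
    ],
    verification_summary=VerificationSummary(claims_contradicted=1, claims_unverifiable=0),
)
print(evidence.model_dump_json(indent=2))
```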
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/settings.json
Activity
  • The pull request author, quancao-ea, initiated significant refactoring of the Stage 3 prompts and associated Python code.
  • The core logic for Stage 3 analysis, including web search integration and output schema validation, was moved and reorganized into a new package structure.
  • New detailed instructions for factual verification, including handling of knowledge cutoff and breaking news, were added to the analysis prompt.
  • The data models and output schemas were updated to support the new verification evidence fields, ensuring structured reporting of fact-checking activities.
  • Test files were adjusted to align with the new module and package structure.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant and valuable refactoring of the Stage 3 processing pipeline. The changes enhance disinformation detection by adding detailed, structured verification instructions to the prompts, including guidance on using web search tools, documenting evidence, and handling knowledge cutoffs. The codebase is also improved by splitting the monolithic stage_3.py into a more modular structure with separate files for executors, flows, tasks, and models, which improves maintainability.

My review has identified a critical inconsistency in the new prompts that will likely cause schema validation errors, and a medium-severity issue where potentially useful debugging information from the Gemini SDK is being discarded. Addressing these points will help ensure the new pipeline is robust and reliable.

Comment thread prompts/stage_3/analysis_prompt.md Outdated
Comment on lines +246 to +258
```python
        if not response.text:
            finish_reason = response.candidates[0].finish_reason if response.candidates else None

            if finish_reason == FinishReason.MAX_TOKENS:
                raise ValueError("The response from Gemini was too long and was cut off in step 1.")

            print(f"Response finish reason: {finish_reason}")
            raise ValueError("No response from Gemini in step 1.")

        return {
            "text": response.text,
            "thought_summaries": thoughts,
        }
```


medium

The grounding_metadata available from the Google Search tool in the SDK response is being discarded here. While the new approach relies on the LLM generating the verification_evidence field, the structured grounding metadata from the API can be very valuable for logging, debugging, or as a fallback.

Consider capturing and returning this metadata. The calling run method could then decide whether to log it or handle it otherwise.

```python
        grounding_metadata = (
            response.candidates[0].grounding_metadata.model_dump_json(indent=2) if response.candidates else None
        )

        if not response.text:
            finish_reason = response.candidates[0].finish_reason if response.candidates else None

            if finish_reason == FinishReason.MAX_TOKENS:
                raise ValueError("The response from Gemini was too long and was cut off in step 1.")

            print(f"Response finish reason: {finish_reason}")
            raise ValueError("No response from Gemini in step 1.")

        return {
            "text": response.text,
            "thought_summaries": thoughts,
            "grounding_metadata_sdk": grounding_metadata,
        }
```


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/processing_pipeline/test_stage_3.py (2)

265-277: ⚠️ Potential issue | 🔴 Critical

Test test_stage_3_executor mismatches the actual Stage3Executor.run() signature and return type.

Two issues:

  1. Missing prompt_version parameter (line 265–270): Stage3Executor.run() requires prompt_version: dict, but the test call omits it — this will raise a TypeError.
  2. Return type mismatch (lines 273–276): The test asserts result is a tuple of length 2, but run() returns a dict with keys "response", "grounding_metadata", "thought_summaries". The assertion will fail.
Proposed fix sketch
```diff
         result = Stage3Executor.run(
             gemini_key="test-key",
             model_name=GeminiModel.GEMINI_FLASH_LATEST,
             audio_file="test.mp3",
             metadata={"test": "metadata"},
+            prompt_version={"user_prompt": "test", "system_instruction": "test", "output_schema": {}},
         )

-        # Result should be a tuple (response, grounding_metadata)
-        assert isinstance(result, tuple)
-        assert len(result) == 2
-        response, grounding_metadata = result
-        assert isinstance(response, dict)
-        assert grounding_metadata is not None
+        # Result should be a dict with response, grounding_metadata, thought_summaries
+        assert isinstance(result, dict)
+        assert "response" in result
+        assert "grounding_metadata" in result
+        assert "thought_summaries" in result
```

153-160: ⚠️ Potential issue | 🔴 Critical

Update mocks to return dict instead of tuple—Stage3Executor.run() returns {"response": ..., "grounding_metadata": ..., "thought_summaries": ...}, not a tuple.

The mocks at lines 158, 188, and 349 set mock_run.return_value = (mock_gemini_response, "test_grounding_metadata") (tuple), but Stage3Executor.run() returns a dict. This breaks analyze_snippet() which unpacks the return value with **analyzing_response and later accesses dict keys like analyzing_response["response"] and analyzing_response["thought_summaries"].

Update all three mocks to:

```python
mock_run.return_value = {
    "response": mock_gemini_response,
    "grounding_metadata": "test_grounding_metadata",
    "thought_summaries": []
}
```
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around line 105-118: Add a language identifier to the fenced code block that
contains the workflow example starting with searxng_web_search("Maduro captured
US forces 2026") and the subsequent web_url_read and verification_evidence
entries; change the opening fence from ``` to ```text (or ```plaintext) so the
block is lint-compliant and renders correctly.
- Around line 993-1003: The prose mandates that "publication_date" and "title"
are required but the JSON Schema's "required" array omits them; update the
schema by adding "publication_date" and "title" to the "required" array so the
schema and prose align, and either remove or document the "content_fetched"
property in the prose (or explicitly mark it optional in prose) to resolve the
mismatch between schema and documentation.
- Around line 317-324: The Breaking News table's `verification_status` entries
use enums that don't match the JSON schema; update the table rows so
`verification_status` only uses the schema's allowed values ("verified_false",
"verified_true", "uncertain", "insufficient_evidence") and replace the current
entries `VERIFIED_FALSE`, `PARTIALLY_VERIFIABLE`, `UNVERIFIABLE_BREAKING`,
`UNVERIFIABLE_RECENT`, and `UNVERIFIABLE_STALE` accordingly in the table under
the `verification_status` column in prompts/stage_3/analysis_prompt.md so the
model outputs valid enum values.

In `@prompts/stage_3/output_schema.json`:
- Around line 16-17: The schema currently requires "thought_summaries", which
forces Gemini in __structure_with_schema (executors.py) to fabricate that field
when the original analysis lacks it; remove "thought_summaries" from the
required array in prompts/stage_3/output_schema.json so it becomes optional,
leaving the property in properties if needed, and rely on run() and
thought_summaries_from_api to populate summaries separately; update any
comments/tests that assumed it was required.

In `@prompts/stage_3/system_instruction.md`:
- Line 29: The wording "maximum score is 30%" and "(20% for claims within 24
hours)" is ambiguous given confidence_scores.overall is an integer 0–100; update
the text in system_instruction.md to state explicit integer values/out-of-100
wording (e.g., "maximum score is 30 (out of 100)" and "(20 out of 100 for claims
within 24 hours)") so the policy refers to confidence_scores.overall
unambiguously.

In `@src/processing_pipeline/stage_3/executors.py`:
- Around line 161-166: The CLI env dict currently hardcodes
os.environ["GOOGLE_GEMINI_KEY"] causing mismatch when run() receives a different
gemini_key; update the flow to pass the gemini_key parameter into
__analyze_with_custom_search (add gemini_key to its signature and all callers
from run()), and use that gemini_key when building the env dict (set
"GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.
- Around line 128-131: The decorators are in the wrong order for the methods
__analyze_with_custom_search and __analyze_with_google_search_grounding: move
`@classmethod` to be the topmost decorator (apply it last) and place
`@optional_task`(log_prints=True, retries=3) below it so Prefect's task wrapper
receives a plain function; swap the two decorators for both named methods
accordingly.
- Around line 76-82: The code constructs user_prompt_with_file using
os.path.basename(audio_file) which relies on the subprocess's working directory
to find the file; instead, pass an unambiguous path to the Gemini CLI or ensure
the subprocess's cwd is explicitly set. Update the call site that builds
user_prompt_with_file in __analyze_with_custom_search (and any caller like
download_audio_file_from_s3) to use the full audio_file path (not
os.path.basename) and modify the subprocess.run invocation in the Gemini CLI
executor to set cwd to the directory containing audio_file if you must pass a
basename; this ensures the CLI can always locate the file regardless of working
directory changes. Ensure references to audio_file, user_prompt_with_file,
download_audio_file_from_s3, and subprocess.run are updated consistently.

In `@src/processing_pipeline/stage_3/flows.py`:
- Around line 65-78: The downloaded local file returned by
download_audio_file_from_s3 is only removed on the happy path, so if
process_snippet (or any code between download and delete) raises the file leaks;
wrap the snippet processing and deletion in a try/finally around the block that
calls process_snippet (and mirror the change in the alternate branch that also
deletes local_file) so that os.remove(local_file) always runs in the finally,
referencing the local_file variable, the process_snippet(...) call, and
download_audio_file_from_s3(...) to locate the code to change.

In `@src/processing_pipeline/stage_3/tasks.py`:
- Around line 219-221: The except Exception block in process_snippet currently
swallows failures; change it to re-raise after updating status: catch Exception
as e, attempt to call supabase_client.set_snippet_status(snippet["id"],
ProcessingStatus.ERROR, str(e)) inside its own try/except so failures updating
Supabase are logged but do not suppress the original error, then re-raise the
original exception (raise) so the orchestrator sees the task as failed;
reference process_snippet, supabase_client.set_snippet_status,
ProcessingStatus.ERROR, local_file, and snippet in your changes.
- Around line 96-129: In __get_metadata: replace the fragile
.encode("latin-1").decode("unicode-escape") approach with a safe decode path
that first checks transcription type and attempts decoding via
codecs.decode(..., "unicode_escape") (or use a no-op if decoding raises) while
preserving the original transcription on error; use metadata.pop("start_time",
None), metadata.pop("end_time", None), metadata.pop("explanation", None) and
metadata.pop("keywords_detected", None) instead of del to avoid KeyError; and
parse recorded_at with a tolerant ISO parser (e.g.,
datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00")) or
dateutil.parser.parse) before formatting so timestamps with "Z" or different
offsets don't break. Ensure references to metadata["transcription"],
metadata.pop(...) and recorded_at parsing are updated in __get_metadata.
🧹 Nitpick comments (7)
src/processing_pipeline/stage_3/executors.py (4)

83-84: Unused exception variable e.

The caught RuntimeError is assigned to e but never referenced in the except block (e.g., for logging). Either log it or drop the binding.

Proposed fix
-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError:
+            print("Falling back to Google Search grounding with SDK...")

Or, better, log the original error for debuggability:

-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError as e:
+            print(f"CLI failed ({e}), falling back to Google Search grounding with SDK...")

196-201: Silently swallowing JSON parse errors may hide CLI issues.

Lines 200–201 catch json.JSONDecodeError with a bare pass, so malformed CLI output lines are silently ignored. Consider at least logging a debug message for troubleshooting.

Proposed fix
                 except json.JSONDecodeError:
-                    pass
+                    print(f"Skipping non-JSON CLI output line: {line[:200]}")

296-309: Restructuring step uses a hardcoded model (GEMINI_FLASH_LATEST) instead of the caller's model_name.

__structure_with_schema always uses GeminiModel.GEMINI_FLASH_LATEST (line 299) rather than the model_name the caller selected. If this is intentional (cheaper model for JSON restructuring), a brief comment would help. Otherwise, consider threading through the model name or making it configurable.
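
A minimal sketch of the threading-through option, using a stand-in enum (the member names appear elsewhere in this review; the string values are illustrative) and a keyword default that preserves the current behavior:

```python
from enum import Enum


class GeminiModel(str, Enum):
    # Stand-in for the project's enum; string values are illustrative.
    GEMINI_FLASH_LATEST = "gemini-flash-latest"
    GEMINI_2_5_PRO = "gemini-2.5-pro"


class Stage3Executor:
    @classmethod
    def _structure_with_schema(cls, text: str, model_name: GeminiModel = GeminiModel.GEMINI_FLASH_LATEST) -> str:
        # The default keeps today's behavior (a cheap model for JSON restructuring)
        # while letting run() pass its selected model explicitly.
        return f"restructured {text!r} with {model_name.value}"


print(Stage3Executor._structure_with_schema("raw analysis", model_name=GeminiModel.GEMINI_2_5_PRO))
```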


29-40: Consider whether a class with only @classmethod methods is the right abstraction.

Stage3Executor has no instance state — every method is a @classmethod. A plain module with top-level functions (or a namespace class without instantiation) would be simpler and avoid the decorator-ordering pitfalls with @classmethod + @optional_task.

src/processing_pipeline/stage_3/flows.py (1)

60-60: id shadows the Python built-in.

Using id as a loop variable shadows the built-in id() function. Consider renaming to snippet_id.

Proposed fix
-        for id in snippet_ids:
-            snippet = fetch_a_specific_snippet_from_supabase(supabase_client, id)
+        for snippet_id in snippet_ids:
+            snippet = fetch_a_specific_snippet_from_supabase(supabase_client, snippet_id)
prompts/stage_3/analysis_prompt.md (1)

575-609: Verification evidence template references claims_supported count in the summary section — it's missing.

The verification_summary template (Lines 601–606) includes claims_contradicted and claims_unverifiable, but not claims_supported or claims_verified. Yet Line 168 describes a scenario where results support a claim. A claims_supported count would complete the picture and make the summary internally consistent with the scoring framework.

This also applies to the schema at Lines 1009–1017, which similarly lacks a claims_supported field.

src/processing_pipeline/stage_3/tasks.py (1)

132-181: DRY: Stage3Executor.run(...) call is repeated three times with identical arguments.

The same call with the same four keyword arguments appears at Lines 139–145, 153–159, and 171–177. Extract a helper to reduce duplication and make future changes less error-prone.

Suggested refactor
 @optional_task(log_prints=True)
 def analyze_snippet(gemini_key, audio_file, metadata, prompt_version: dict):
     main_model = GeminiModel.GEMINI_2_5_PRO
     fallback_model = GeminiModel.GEMINI_FLASH_LATEST
 
+    def _run(model):
+        return {
+            **Stage3Executor.run(
+                gemini_key=gemini_key,
+                model_name=model,
+                audio_file=audio_file,
+                metadata=metadata,
+                prompt_version=prompt_version,
+            ),
+            "analyzed_by": model,
+        }
+
     try:
         print(f"Attempting analysis with {main_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=main_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": main_model,
-        }
+        return _run(main_model)
     except errors.ServerError as e:
         print(f"Server error with {main_model} (code {e.code}): {e.message}")
         print(f"Falling back to {fallback_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=fallback_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": fallback_model,
-        }
+        return _run(fallback_model)
     except errors.ClientError as e:
         if e.code in [HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN]:
             print(f"Auth error with {main_model} (code {e.code}): {e.message}")
             raise
         else:
             print(f"Client error with {main_model} (code {e.code}): {e.message}")
             print(f"Falling back to {fallback_model}")
-            analyzing_response = Stage3Executor.run(
-                gemini_key=gemini_key,
-                model_name=fallback_model,
-                audio_file=audio_file,
-                metadata=metadata,
-                prompt_version=prompt_version,
-            )
-            return {
-                **analyzing_response,
-                "analyzed_by": fallback_model,
-            }
+            return _run(fallback_model)

Comment on lines +105 to +118
```
1. searxng_web_search("Maduro captured US forces 2026")
-> Found: https://reuters.com/article/..., https://apnews.com/...

2. web_url_read("https://reuters.com/article/...")
-> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."

3. Document in verification_evidence:
- url: "https://reuters.com/article/..."
- source_name: "Reuters"
- source_type: "tier1_wire_service"
- relevant_excerpt: "[exact quote from article]"
- relevance_to_claim: "contradicts_claim"
```


⚠️ Potential issue | 🟡 Minor

Missing language identifier on fenced code block.

The code block at Line 105 has no language specified. Since this block illustrates a workflow example (not a specific language), adding a language like text or plaintext would satisfy linting and improve rendering.

Suggested fix
-  ```
+  ```text
   1. searxng_web_search("Maduro captured US forces 2026")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
1. searxng_web_search("Maduro captured US forces 2026")
-> Found: https://reuters.com/article/..., https://apnews.com/...
2. web_url_read("https://reuters.com/article/...")
-> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."
3. Document in verification_evidence:
- url: "https://reuters.com/article/..."
- source_name: "Reuters"
- source_type: "tier1_wire_service"
- relevant_excerpt: "[exact quote from article]"
- relevance_to_claim: "contradicts_claim"
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 105-105: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 105 - 118, Add a language
identifier to the fenced code block that contains the workflow example starting
with searxng_web_search("Maduro captured US forces 2026") and the subsequent
web_url_read and verification_evidence entries; change the opening fence from
``` to ```text (or ```plaintext) so the block is lint-compliant and renders
correctly.

Comment thread prompts/stage_3/analysis_prompt.md
Comment thread prompts/stage_3/analysis_prompt.md Outdated
Comment on lines +16 to +17
"thought_summaries",
"verification_evidence"


⚠️ Potential issue | 🟡 Minor

thought_summaries being required in output_schema may cause issues in the restructuring fallback.

In executors.py __structure_with_schema, this schema is passed to Gemini to restructure free-text analysis into JSON. The original analysis text may not contain reasoning-process summaries, forcing the model to fabricate a thought_summaries value. Consider making it optional in this schema (remove from required) since run() already handles thought summaries separately via thought_summaries_from_api.

🤖 Prompt for AI Agents
In `@prompts/stage_3/output_schema.json` around lines 16 - 17, The schema
currently requires "thought_summaries", which forces Gemini in
__structure_with_schema (executors.py) to fabricate that field when the original
analysis lacks it; remove "thought_summaries" from the required array in
prompts/stage_3/output_schema.json so it becomes optional, leaving the property
in properties if needed, and rely on run() and thought_summaries_from_api to
populate summaries separately; update any comments/tests that assumed it was
required.

Comment thread prompts/stage_3/system_instruction.md Outdated
Comment on lines +128 to +131
```python
    @optional_task(log_prints=True, retries=3)
    @classmethod
    def __analyze_with_custom_search(
        cls,
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```sh
# Find the definition of optional_task
rg "def optional_task" --type py -A 5
```

Repository: PublicDataWorks/verdad

Length of output: 408


🏁 Script executed:

```sh
# Search for imports of optional_task in the file
rg "optional_task|from|import" src/processing_pipeline/stage_3/executors.py | head -20
```

Repository: PublicDataWorks/verdad

Length of output: 914


🏁 Script executed:

```sh
# Check the full context around lines 128-131 and 217-218
sed -n '125,135p' src/processing_pipeline/stage_3/executors.py
sed -n '214,224p' src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad

Length of output: 740


🌐 Web query:

Prefect @task decorator classmethod compatibility Python

💡 Result:

Prefect’s @task is compatible with @classmethod / @staticmethod only if the method decorator is applied on top of @task (i.e., @task must see a plain function, not a classmethod/staticmethod object). Prefect will raise a TypeError otherwise: “@classmethod should be applied on top of @task” (and likewise for @staticmethod). [1]

```python
from prefect import task

class MyTasks:
    @classmethod
    @task
    def build(cls, x: int) -> int:
        return x + 1

    @staticmethod
    @task
    def clean(x: int) -> int:
        return x * 2
```

Notes:

  • For flows, Prefect explicitly supports decorating instance/class/static methods (example shown in docs). [2]
  • For tasks, follow the decorator ordering rule above to avoid the TypeError. [1]

Sources: Prefect task decorator implementation/API reference [1]; Prefect workflow docs showing class/static methods with decorators [2].


🏁 Script executed:

```sh
# Verify all instances of @optional_task with @classmethod combo
rg "@optional_task" -A 1 src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad

Length of output: 196


🏁 Script executed:

```sh
# Check utils.py to understand optional_task full implementation
sed -n '1,50p' src/utils.py | head -30
```

Repository: PublicDataWorks/verdad

Length of output: 1126


Incorrect decorator ordering: @optional_task before @classmethod will cause a runtime TypeError.

Decorators are applied bottom-up. Currently, @classmethod is applied first (creating a descriptor), then @optional_task wraps that descriptor. Prefect's internal @task decorator requires a plain function, not a classmethod descriptor, and will raise TypeError: "@classmethod should be applied on top of @task" at runtime.

Swap the order on lines 128–129 and 217–218 so @classmethod is applied last:

Required fix
```diff
-    @optional_task(log_prints=True, retries=3)
-    @classmethod
+    @classmethod
+    @optional_task(log_prints=True, retries=3)
     def __analyze_with_custom_search(
```

And similarly on lines 217–218 for __analyze_with_google_search_grounding.

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 128 - 131, The
decorators are in the wrong order for the methods __analyze_with_custom_search
and __analyze_with_google_search_grounding: move `@classmethod` to be the topmost
decorator (apply it last) and place `@optional_task`(log_prints=True, retries=3)
below it so Prefect's task wrapper receives a plain function; swap the two
decorators for both named methods accordingly.

Comment on lines +161 to +166
```python
        env = {
            "PATH": os.environ.get("PATH", ""),
            "HOME": os.environ.get("HOME", ""),
            "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
            "GEMINI_SYSTEM_MD": system_instruction_path,
            "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
```


⚠️ Potential issue | 🟠 Major

API key inconsistency: gemini_key parameter vs. os.environ["GOOGLE_GEMINI_KEY"].

run() accepts gemini_key as a parameter and uses it to construct the SDK client (line 62), but the CLI env dict hardcodes os.environ["GOOGLE_GEMINI_KEY"] (line 164). If a caller passes a different key, the CLI and SDK paths will use different credentials. Either thread gemini_key through or document that the CLI always uses the env var.

Proposed fix — use the parameter
         env = {
             "PATH": os.environ.get("PATH", ""),
             "HOME": os.environ.get("HOME", ""),
-            "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
+            "GEMINI_API_KEY": gemini_key,
             "GEMINI_SYSTEM_MD": system_instruction_path,
             "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
         }

Note: gemini_key is not currently available inside __analyze_with_custom_search because it's not passed as a parameter. You'll need to add it to the method signature and pass it from run().

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 161 - 166, The CLI
env dict currently hardcodes os.environ["GOOGLE_GEMINI_KEY"] causing mismatch
when run() receives a different gemini_key; update the flow to pass the
gemini_key parameter into __analyze_with_custom_search (add gemini_key to its
signature and all callers from run()), and use that gemini_key when building the
env dict (set "GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.

Comment on lines +65 to +78
```python
                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

                # Process the snippet
                process_snippet(
                    supabase_client,
                    snippet,
                    local_file,
                    GEMINI_KEY,
                    skip_review=skip_review,
                    prompt_version=prompt_version,
                )

                print(f"Delete the downloaded snippet clip: {local_file}")
                os.remove(local_file)
```


⚠️ Potential issue | 🟠 Major

Downloaded audio file leaks if process_snippet raises.

os.remove(local_file) on lines 78 and 97 runs only on the happy path. If process_snippet (or anything before the os.remove) throws, the file remains on disk. Wrap in try/finally:

Proposed fix (for the snippet_ids branch; apply analogously to the else branch)
                 local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

-                # Process the snippet
-                process_snippet(
-                    supabase_client,
-                    snippet,
-                    local_file,
-                    GEMINI_KEY,
-                    skip_review=skip_review,
-                    prompt_version=prompt_version,
-                )
-
-                print(f"Delete the downloaded snippet clip: {local_file}")
-                os.remove(local_file)
+                try:
+                    # Process the snippet
+                    process_snippet(
+                        supabase_client,
+                        snippet,
+                        local_file,
+                        GEMINI_KEY,
+                        skip_review=skip_review,
+                        prompt_version=prompt_version,
+                    )
+                finally:
+                    print(f"Delete the downloaded snippet clip: {local_file}")
+                    os.remove(local_file)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
-
-                # Process the snippet
-                process_snippet(
-                    supabase_client,
-                    snippet,
-                    local_file,
-                    GEMINI_KEY,
-                    skip_review=skip_review,
-                    prompt_version=prompt_version,
-                )
-
-                print(f"Delete the downloaded snippet clip: {local_file}")
-                os.remove(local_file)
+                local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
+
+                try:
+                    # Process the snippet
+                    process_snippet(
+                        supabase_client,
+                        snippet,
+                        local_file,
+                        GEMINI_KEY,
+                        skip_review=skip_review,
+                        prompt_version=prompt_version,
+                    )
+                finally:
+                    print(f"Delete the downloaded snippet clip: {local_file}")
+                    os.remove(local_file)
```
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/flows.py` around lines 65 - 78, The
downloaded local file returned by download_audio_file_from_s3 is only removed on
the happy path, so if process_snippet (or any code between download and delete)
raises the file leaks; wrap the snippet processing and deletion in a try/finally
around the block that calls process_snippet (and mirror the change in the
alternate branch that also deletes local_file) so that os.remove(local_file)
always runs in the finally, referencing the local_file variable, the
process_snippet(...) call, and download_audio_file_from_s3(...) to locate the
code to change.

Comment on lines +96 to +129
```python
def __get_metadata(snippet):
    snippet_uuid = snippet["id"]
    flagged_snippets = snippet["stage_1_llm_response"]["detection_result"]["flagged_snippets"]
    metadata = {}
    for flagged_snippet in flagged_snippets:
        if flagged_snippet["uuid"] == snippet_uuid:
            metadata = flagged_snippet
            try:
                # Handle escaped unicode characters in the transcription
                metadata["transcription"] = flagged_snippet["transcription"].encode("latin-1").decode("unicode-escape")
            except (UnicodeError, AttributeError) as e:
                # Fallback to original transcription if decoding fails
                print(f"Warning: Failed to decode transcription: {e}")
                metadata["transcription"] = flagged_snippet["transcription"]

    audio_file = snippet["audio_file"]
    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
    audio_file["recorded_at"] = recorded_at.strftime("%B %-d, %Y %-I:%M %p")
    audio_file["recording_day_of_week"] = recorded_at.strftime("%A")
    audio_file["time_zone"] = "UTC"
    metadata["additional_info"] = audio_file

    del metadata["start_time"]
    del metadata["end_time"]

    # TODO: Add these fields back once we've fixed the pipeline
    del metadata["explanation"]
    del metadata["keywords_detected"]

    metadata["start_time"] = snippet["start_time"].split(":", 1)[1]
    metadata["end_time"] = snippet["end_time"].split(":", 1)[1]
    metadata["duration"] = snippet["duration"].split(":", 1)[1]

    return metadata
```


⚠️ Potential issue | 🟡 Minor

Fragile encoding hack and unsafe del on keys that may not exist.

Several concerns in __get_metadata:

  1. Line 105: .encode("latin-1").decode("unicode-escape") — This will throw UnicodeEncodeError for any non-Latin-1 character (e.g., Arabic, CJK), which would be caught but could mask data issues. Given the pipeline handles Spanish and Arabic audio, Arabic transcriptions would routinely hit this path.

  2. Lines 118–123: del metadata["start_time"], del metadata["end_time"], del metadata["explanation"], del metadata["keywords_detected"] — These will raise KeyError if the flagged snippet lacks any of these fields. Consider using metadata.pop("key", None) for safer removal.

  3. Line 112: The datetime format string "%Y-%m-%dT%H:%M:%S+00:00" hardcodes the UTC offset. If the timestamp ever uses a Z suffix or a different offset, parsing will fail.

Suggested fix for safe key removal (Lines 118-123)
-    del metadata["start_time"]
-    del metadata["end_time"]
-
-    # TODO: Add these fields back once we've fixed the pipeline
-    del metadata["explanation"]
-    del metadata["keywords_detected"]
+    metadata.pop("start_time", None)
+    metadata.pop("end_time", None)
+
+    # TODO: Add these fields back once we've fixed the pipeline
+    metadata.pop("explanation", None)
+    metadata.pop("keywords_detected", None)
Suggested fix for datetime parsing (Line 112)
-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    from dateutil.parser import isoparse
+    recorded_at = isoparse(snippet["recorded_at"])

Alternatively, if you want to stay in the stdlib:

-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    recorded_at = datetime.fromisoformat(snippet["recorded_at"])
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 96-129, in __get_metadata: replace the fragile .encode("latin-1").decode("unicode-escape") approach with a safe decode path that first checks the transcription type and attempts decoding via codecs.decode(..., "unicode_escape"), preserving the original transcription if decoding raises; use metadata.pop("start_time", None), metadata.pop("end_time", None), metadata.pop("explanation", None), and metadata.pop("keywords_detected", None) instead of del to avoid KeyError; and parse recorded_at with a tolerant ISO parser (e.g., datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00")) or dateutil.parser.parse) before formatting, so timestamps with a "Z" suffix or a different offset don't break.
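Putting the three fixes together, a hedged sketch of what the safer metadata handling could look like (build_metadata is an illustrative name, not the repo's function, and the decode guard is one reasonable choice among several):

import codecs
from datetime import datetime

def build_metadata(snippet, flagged_snippet):
    # Illustrative sketch combining safe decoding, safe key removal, and tolerant ISO parsing.
    metadata = dict(flagged_snippet)

    transcription = flagged_snippet.get("transcription")
    if isinstance(transcription, str) and "\\u" in transcription:
        try:
            # Only unescape when literal \uXXXX sequences are present; real
            # Arabic/CJK text passes through untouched and never hits latin-1.
            metadata["transcription"] = codecs.decode(transcription, "unicode_escape")
        except UnicodeError:
            metadata["transcription"] = transcription  # keep the original on failure

    # pop(..., None) never raises, unlike del
    for key in ("start_time", "end_time", "explanation", "keywords_detected"):
        metadata.pop(key, None)

    # fromisoformat accepts any offset; normalize a trailing "Z" first
    recorded_at = datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00"))
    metadata["additional_info"] = {
        "recorded_at": recorded_at.strftime("%B %-d, %Y %-I:%M %p"),
        "recording_day_of_week": recorded_at.strftime("%A"),
        "time_zone": "UTC",
    }
    return metadata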

Comment on lines +219 to +221
    except Exception as e:
        print(f"Failed to process {local_file}: {e}")
        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))

⚠️ Potential issue | 🟠 Major

Bare Exception catch swallows errors silently — task appears to succeed on failure.

The except Exception at Line 219 catches all errors, sets the snippet status to ERROR, but never re-raises. This means:

  1. The process_snippet task will report success to the orchestrator even when processing failed.
  2. If supabase_client.set_snippet_status itself fails, both the original error and the status-update error are lost.

At minimum, re-raise after setting the error status so the caller/framework knows the task failed.

Suggested fix
     except Exception as e:
         print(f"Failed to process {local_file}: {e}")
-        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        try:
+            supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        except Exception as status_err:
+            print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+        raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     except Exception as e:
         print(f"Failed to process {local_file}: {e}")
-        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        try:
+            supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+        except Exception as status_err:
+            print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+        raise
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 219-219: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 219-221: the except Exception block in process_snippet currently swallows failures. Catch Exception as e, call supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e)) inside its own try/except so a failure updating Supabase is logged but does not suppress the original error, then re-raise the original exception (bare raise) so the orchestrator sees the task as failed. Reference process_snippet, supabase_client.set_snippet_status, ProcessingStatus.ERROR, local_file, and snippet in your changes.
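The re-raise behavior is easy to pin down with a test; a self-contained sketch of the pattern the fix describes (handle_failure is an illustrative stand-in for the except block, and the plain "ERROR" string stands in for ProcessingStatus.ERROR):

import pytest
from unittest.mock import MagicMock

def handle_failure(supabase_client, snippet, error):
    # Mirrors the suggested except block: record the error status, make sure a
    # status-update failure cannot mask the original error, then re-raise.
    try:
        supabase_client.set_snippet_status(snippet["id"], "ERROR", str(error))
    except Exception as status_err:
        print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
    raise error

def test_original_error_propagates_even_if_status_update_fails():
    supabase_client = MagicMock()
    supabase_client.set_snippet_status.side_effect = ConnectionError("supabase down")
    with pytest.raises(RuntimeError, match="boom"):
        handle_failure(supabase_client, {"id": "snippet-123"}, RuntimeError("boom"))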

… enum

The H.1 Breaking News Protocol table used SCREAMING_CASE values
(VERIFIED_FALSE, PARTIALLY_VERIFIABLE, UNVERIFIABLE_*) that don't exist
in the verification_status enum, which would cause validation failures.
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 8991dcc in 18 seconds.
  • Reviewed 21 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_WGamei1noZpb8GaV

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
prompts/stage_3/analysis_prompt.md (1)

1775-1948: ⚠️ Potential issue | 🟠 Major

Missing verification_evidence field in the complete example.

The schema requires verification_evidence (Line 638), and extensive documentation explains how to populate it (Lines 122-169, 575-609). However, the complete example output (Lines 1775-1948) does not include the verification_evidence field. This omission could result in the AI failing to provide this critical field in actual outputs.

The example should demonstrate a complete verification_evidence structure showing:

  • At least one entry in searches_performed with actual search queries
  • Corresponding results array with source details
  • Populated verification_summary
Example structure to add

Add after Line 1945 (before "thought_summaries"):

  "verification_evidence": {
    "searches_performed": [
      {
        "query": "vaccines mind control government",
        "search_intent": "Verify claim that vaccines are used for mind control",
        "result_status": "results_found",
        "results": [
          {
            "url": "https://www.factcheck.org/vaccine-myths/",
            "source_name": "FactCheck.org",
            "source_type": "tier1_factchecker",
            "publication_date": "2025-12-15",
            "title": "Debunking Vaccine Myths",
            "relevant_excerpt": "There is no scientific evidence that vaccines can control minds or contain mind-altering substances.",
            "relevance_to_claim": "contradicts_claim",
            "content_fetched": true
          }
        ]
      }
    ],
    "verification_summary": {
      "total_searches": 1,
      "claims_contradicted": 1,
      "claims_unverifiable": 0,
      "key_findings": "Fact-checking sources confirm that the mind control claim is false and has been repeatedly debunked."
    }
  },
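To keep the documented example and the schema from drifting apart, the example can be validated in a test; a sketch assuming the jsonschema package and this PR's schema path (the fixture file is hypothetical):

import json
from jsonschema import Draft202012Validator  # pip install jsonschema

def test_example_output_matches_schema():
    with open("prompts/stage_3/output_schema.json") as f:
        schema = json.load(f)
    # Hypothetical fixture holding the complete example output from the prompt.
    with open("tests/fixtures/stage_3_example_output.json") as f:
        example = json.load(f)
    errors = list(Draft202012Validator(schema).iter_errors(example))
    assert not errors, [e.message for e in errors]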
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Line 1002: The schema adds a boolean field content_fetched, but the prose never explains its usage. Update the prose (e.g., section C.1, where web_url_read and snippet-vs-full-content behavior is discussed) to state that content_fetched should be true when the agent used web_url_read to retrieve the full page content and false when only the search snippet/metadata was available; alternatively, remove the content_fetched property from the schema if it is unnecessary. Ensure references to content_fetched appear alongside the web_url_read instructions and examples so callers know when to set it.
- Around lines 321-325: The "THE GOLDEN RULE" sentence is ambiguous because it can be read to override the 20% cap for claims under 24 hours. Update that sentence to explicitly reference the existing tiered caps in the table (the rows with "MAX 20%" for "within 24 hours" and "MAX 30%" for "24-72 hours") and state that the 30% maximum applies only to claims aged 24-72 hours, while claims within 24 hours remain capped at 20%. Keep the phrase "THE GOLDEN RULE", but replace the current line with a clear, tiered statement that mentions "MAX 20% (0-24h)" and "MAX 30% (24-72h)" to remove the ambiguity.

The prose documentation mandates these fields but the JSON schemas and
Pydantic model had them as optional. publication_date allows null for
cases where the date is unavailable. Also documents content_fetched as
explicitly optional in the prose.
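A hedged sketch of what the tightened Pydantic side might look like; the class name is illustrative, and the field list follows the schema in this PR rather than the repo's actual models.py:

from typing import Literal, Optional
from pydantic import BaseModel

class SearchResult(BaseModel):
    # Illustrative: fields in the schema's "required" list get no defaults,
    # so Pydantic treats them as mandatory; publication_date stays nullable.
    url: str
    source_name: str
    source_type: Literal[
        "tier1_wire_service", "tier1_factchecker", "tier2_major_news",
        "tier3_regional_news", "official_source", "other",
    ]
    publication_date: Optional[str]  # required, but null is a legal value
    title: str
    relevant_excerpt: str
    relevance_to_claim: Literal[
        "supports_claim", "contradicts_claim", "provides_context", "inconclusive",
    ]
    content_fetched: Optional[bool] = None  # explicitly optional per the prose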
…fidence scores

confidence_scores.overall is an integer 0-100, not a percentage. Updated
all score references in analysis_prompt.md and system_instruction.md to
use explicit integer values (e.g., "30 (out of 100)") instead of "30%".
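The same commit's intent can be enforced at the model layer too; a minimal sketch assuming pydantic's Field constraints (the class name is illustrative):

from pydantic import BaseModel, Field

class ConfidenceScores(BaseModel):
    # Illustrative: "overall" is an integer score out of 100, never a percentage string.
    overall: int = Field(ge=0, le=100, description="Overall confidence, 0-100.")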
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 03e05c1 in 16 seconds.
  • Reviewed 64 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_vSMdebzXzECu0JxA

Update import script to reference stage 3 prompts from their
new location in the prompts/stage_3/ subdirectory, aligning
with the reorganized prompt directory structure.
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 2639987 in 39 seconds.
  • Reviewed 406 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_UmF9gOMqvIB3LXUd

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scripts/import_prompts_to_db.py (1)

47-51: ⚠️ Potential issue | 🟠 Major

Stage 4 paths in import_prompts_to_db.py are incorrect and will fail—use nested paths to match constants.py and existing files.

The mapping at lines 47–51 references old flat paths (prompts/Stage_4_*.md) that do not exist on disk. The runtime code in src/processing_pipeline/constants.py (lines 54–62) reads from nested paths (prompts/stage_4/*.md), which do exist. All other stages in import_prompts_to_db.py (STAGE_1, STAGE_3, GEMINI_TIMESTAMPED_TRANSCRIPTION) use nested paths. Update Stage 4 to match:

Fix
     PromptStage.STAGE_4: {
-        "system_instruction": "prompts/Stage_4_system_instruction.md",
-        "user_prompt": "prompts/Stage_4_review_prompt.md",
-        "output_schema": "prompts/Stage_4_output_schema.json",
+        "system_instruction": "prompts/stage_4/system_instruction.md",
+        "user_prompt": "prompts/stage_4/review_prompt.md",
+        "output_schema": "prompts/stage_4/output_schema.json",
     },
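Drift like this is cheap to catch with a test that walks the mapping and checks the disk; a sketch assuming the mapping is importable as PROMPT_FILES (the actual variable name in import_prompts_to_db.py may differ):

import os

def test_all_prompt_paths_exist():
    # Hypothetical import; adjust to the script's real module path and mapping name.
    from src.scripts.import_prompts_to_db import PROMPT_FILES

    missing = [
        path
        for stage_paths in PROMPT_FILES.values()
        for path in stage_paths.values()
        if not os.path.exists(path)
    ]
    assert not missing, f"Prompt files referenced but missing on disk: {missing}"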
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around lines 977-1022: The verification_evidence schema is missing descriptive guidance. Update the verification_evidence object in analysis_prompt.md by adding the missing descriptions from prompts/stage_3/output_schema.json: "Record of all web searches performed during fact-checking." to searches_performed, "Individual search results with full details." to results, and the per-result property descriptions (e.g., source_type: "Classification of source reliability tier.", publication_date: "Publication date or null if unavailable.", relevant_excerpt: "Excerpt demonstrating relevance to the claim.", relevance_to_claim: "How the result relates to the claim."). Also add descriptive text for verification_summary and its properties (total_searches, claims_contradicted, claims_unverifiable, key_findings) so the verification_evidence, searches_performed, results, source_type, and verification_summary blocks include the same descriptions as in output_schema.json.
🧹 Nitpick comments (3)
src/processing_pipeline/constants.py (2)

43-62: File handles are never closed.

Every helper uses a bare open(...) without a with statement or .close(), leaking file descriptors. Since you're already touching these lines, consider wrapping in with:

Example fix for one function
 def get_user_prompt_for_stage_3():
-    return open("prompts/stage_3/analysis_prompt.md", "r").read()
+    with open("prompts/stage_3/analysis_prompt.md", "r") as f:
+        return f.read()

Same applies to all six functions (lines 44, 48, 51, 55, 58, 62) and the pre-existing one on line 65.
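Since the six helpers differ only in the path, one shared reader keeps the fix in a single place; a sketch under the assumption that these functions do nothing but read a file (the explicit utf-8 encoding is an added choice):

def _read_prompt(path: str) -> str:
    # One place to own file handling for every get_*_for_stage_* helper.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def get_user_prompt_for_stage_3() -> str:
    return _read_prompt("prompts/stage_3/analysis_prompt.md")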


93-99: Dead references to removed function in __main__ block.

Lines 94–99 reference get_timestamped_transcription_generation_output_schema, which was removed in this PR. While it's commented out so there's no runtime impact, it adds confusion. Consider cleaning up or removing these stale comments.

src/processing_pipeline/stage_1/flows.py (1)

224-224: Good refactor using get_audio_file_metadata.

This eliminates the duplicated metadata construction logic.

Note: redo_main_detection (lines 158–166) still manually constructs the same metadata dict. Consider using get_audio_file_metadata(audio_file) there too for consistency.

Comment on lines +977 to 1022
"verification_evidence": {
"type": "object",
"required": ["searches_performed", "verification_summary"],
"description": "Complete documentation of all web searches performed during fact-checking.",
"properties": {
"searches_performed": {
"type": "array",
"items": {
"type": "object",
"required": ["query", "search_intent", "result_status", "results"],
"properties": {
"query": { "type": "string" },
"search_intent": { "type": "string" },
"result_status": { "type": "string", "enum": ["results_found", "no_results", "results_inconclusive"] },
"results": {
"type": "array",
"items": {
"type": "object",
"required": ["url", "source_name", "source_type", "publication_date", "title", "relevant_excerpt", "relevance_to_claim"],
"properties": {
"url": { "type": "string" },
"source_name": { "type": "string" },
"source_type": { "type": "string", "enum": ["tier1_wire_service", "tier1_factchecker", "tier2_major_news", "tier3_regional_news", "official_source", "other"] },
"publication_date": { "type": ["string", "null"] },
"title": { "type": "string" },
"relevant_excerpt": { "type": "string" },
"relevance_to_claim": { "type": "string", "enum": ["supports_claim", "contradicts_claim", "provides_context", "inconclusive"] },
"content_fetched": { "type": "boolean" }
}
}
}
}
}
},
"verification_summary": {
"type": "object",
"required": ["total_searches", "claims_contradicted", "claims_unverifiable", "key_findings"],
"properties": {
"total_searches": { "type": "integer" },
"claims_contradicted": { "type": "integer" },
"claims_unverifiable": { "type": "integer" },
"key_findings": { "type": "string" }
}
}
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find output_schema.json
fd "output_schema.json" prompts/stage_3

Repository: PublicDataWorks/verdad

Length of output: 100


🏁 Script executed:

#!/bin/bash
# Read the output_schema.json file
cat prompts/stage_3/output_schema.json

Repository: PublicDataWorks/verdad

Length of output: 21501


Add descriptions to verification_evidence schema to match output_schema.json.

The verification_evidence schema in analysis_prompt.md (lines 977-1022) is missing descriptions for multiple fields that are documented in prompts/stage_3/output_schema.json. The structural definitions match, but the prompt version lacks guidance descriptions for searches_performed, results, and verification_summary properties. These missing descriptions reduce clarity for the model about field purposes and requirements.

Add the descriptions from output_schema.json to ensure the prompt provides complete guidance. Key missing descriptions:

  • searches_performed: "Record of all web searches performed during fact-checking."
  • results: "Individual search results with full details."
  • Each result item property needs its description (e.g., source_type: "Classification of source reliability tier.")
  • verification_summary and all its properties need descriptions
🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 977-1022: The verification_evidence schema is missing descriptive guidance. Add the missing descriptions from prompts/stage_3/output_schema.json: "Record of all web searches performed during fact-checking." to searches_performed, "Individual search results with full details." to results, and the per-result property descriptions (e.g., source_type: "Classification of source reliability tier.", publication_date: "Publication date or null if unavailable.", relevant_excerpt: "Excerpt demonstrating relevance to the claim.", relevance_to_claim: "How the result relates to the claim."). Also add descriptive text for verification_summary and its properties (total_searches, claims_contradicted, claims_unverifiable, key_findings) so these blocks carry the same descriptions as output_schema.json.

@quancao-ea quancao-ea merged commit e1d27ae into main Feb 9, 2026
2 checks passed
@quancao-ea quancao-ea deleted the fix/stage-3-prompts-add-verification-fields-and-web-search branch March 17, 2026 02:42