VER-298: Refactor Stage 3 prompts: add verification fields and fix web search tools (#59)
Conversation
Add guidance for recognizing and correctly handling events that occurred after the model's training cutoff date, preventing false positives when web search results conflict with pre-training knowledge.
Remove tool tracking from CLI method and extract verification evidence from validated responses instead of API metadata. Streamline the analysis flow to focus on structured output validation.
Move all Stage 3 prompt files from the root prompts directory to a new stage_3 subdirectory for better organization and consistency with the project structure.
Split Stage 3 processing pipeline into separate modules for better organization and maintainability. Extract executor logic, flow definitions, and task functions into dedicated files while maintaining backward compatibility through __init__.py exports.
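For illustration only, a backward-compatible `__init__.py` might look roughly like the sketch below. The re-exported names are assumptions pieced together from the modules and symbols mentioned elsewhere in this PR, not the actual file contents.

```python
# src/processing_pipeline/stage_3/__init__.py — hypothetical sketch, not the committed file.
# Re-exporting the public names keeps pre-refactor import paths working, e.g.
# `from processing_pipeline.stage_3 import Stage3Executor`.
from .executors import Stage3Executor
from .flows import in_depth_analysis
from .tasks import analyze_snippet, process_snippet
from .models import Stage3Output

__all__ = [
    "Stage3Executor",
    "in_depth_analysis",
    "analyze_snippet",
    "process_snippet",
    "Stage3Output",
]
```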
Walkthrough

Refactors Stage 3 into a package (executors, flows, tasks, models), adds verification_evidence and verification_status to prompts/schemas, removes legacy timestamped-transcription artifacts, updates imports/tests to the new layout, and adjusts …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant Flow as in_depth_analysis
participant Executor as Stage3Executor
participant Supabase
participant S3
participant GeminiCLI as Gemini CLI
participant GeminiSDK as Gemini SDK
participant GoogleAPI as Google Search
Client->>Flow: start in_depth_analysis
Flow->>Supabase: fetch snippet & mark PROCESSING
Flow->>S3: download audio file
Flow->>Executor: run(gemini_key, model, audio, metadata, prompt_version)
Executor->>GeminiCLI: analyze via CLI (custom search)
alt CLI success
GeminiCLI-->>Executor: analysis (streamed JSON)
else CLI fails
Executor->>GeminiSDK: upload audio
Executor->>GoogleAPI: perform grounding search
GoogleAPI-->>Executor: search results
GeminiSDK-->>Executor: grounded analysis
end
Executor->>Executor: validate with Pydantic (Stage3Output)
alt validation fails
Executor->>GeminiSDK: restructure into required schema
GeminiSDK-->>Executor: formatted JSON
end
Executor-->>Flow: analysis + verification_evidence
Flow->>Supabase: update snippet with results
Flow->>S3: delete local audio
Flow-->>Client: complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.4) reported issues for src/processing_pipeline/constants.py, src/processing_pipeline/stage_1/flows.py, and src/scripts/import_prompts_to_db.py (output truncated).
Important
Looks good to me! 👍
Reviewed everything up to 9db4a22 in 16 seconds. Click for details.
- Reviewed 2153 lines of code in 18 files
- Skipped 0 files when reviewing.
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_BnW9dJuHfIFj6bsd
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Stage 3 disinformation analysis pipeline by introducing a more robust and transparent factual verification process. It refactors prompts to include detailed instructions for web search, mandates comprehensive documentation of verification evidence, and updates the output schema to capture this information. The underlying Python codebase has also been restructured for improved modularity and to support the new verification mechanisms, ensuring that analyses are grounded in current, verifiable information and explicitly account for the model's knowledge limitations.

Highlights
Changelog
Ignored Files
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini: You can request assistance from Gemini at any point by creating a comment using either …
Customization: To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a …

Limitations & Feedback: Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes
Code Review
This pull request introduces a significant and valuable refactoring of the Stage 3 processing pipeline. The changes enhance disinformation detection by adding detailed, structured verification instructions to the prompts, including guidance on using web search tools, documenting evidence, and handling knowledge cutoffs. The codebase is also improved by splitting the monolithic stage_3.py into a more modular structure with separate files for executors, flows, tasks, and models, which improves maintainability.
My review has identified a critical inconsistency in the new prompts that will likely cause schema validation errors, and a medium-severity issue where potentially useful debugging information from the Gemini SDK is being discarded. Addressing these points will help ensure the new pipeline is robust and reliable.
```python
if not response.text:
    finish_reason = response.candidates[0].finish_reason if response.candidates else None

    if finish_reason == FinishReason.MAX_TOKENS:
        raise ValueError("The response from Gemini was too long and was cut off in step 1.")

    print(f"Response finish reason: {finish_reason}")
    raise ValueError("No response from Gemini in step 1.")

return {
    "text": response.text,
    "thought_summaries": thoughts,
}
```
The grounding_metadata available from the Google Search tool in the SDK response is being discarded here. While the new approach relies on the LLM generating the verification_evidence field, the structured grounding metadata from the API can be very valuable for logging, debugging, or as a fallback.
Consider capturing and returning this metadata. The calling run method could then decide whether to log it or handle it otherwise.
```python
grounding_metadata = (
    response.candidates[0].grounding_metadata.model_dump_json(indent=2) if response.candidates else None
)
if not response.text:
    finish_reason = response.candidates[0].finish_reason if response.candidates else None
    if finish_reason == FinishReason.MAX_TOKENS:
        raise ValueError("The response from Gemini was too long and was cut off in step 1.")
    print(f"Response finish reason: {finish_reason}")
    raise ValueError("No response from Gemini in step 1.")
return {
    "text": response.text,
    "thought_summaries": thoughts,
    "grounding_metadata_sdk": grounding_metadata,
}
```
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/processing_pipeline/test_stage_3.py (2)
265-277: ⚠️ Potential issue | 🔴 Critical — Test `test_stage_3_executor` mismatches the actual `Stage3Executor.run()` signature and return type. Two issues:

- Missing `prompt_version` parameter (lines 265–270): `Stage3Executor.run()` requires `prompt_version: dict`, but the test call omits it — this will raise a `TypeError`.
- Return type mismatch (lines 273–276): The test asserts `result` is a `tuple` of length 2, but `run()` returns a `dict` with keys `"response"`, `"grounding_metadata"`, `"thought_summaries"`. The assertion will fail.

Proposed fix sketch

```diff
 result = Stage3Executor.run(
     gemini_key="test-key",
     model_name=GeminiModel.GEMINI_FLASH_LATEST,
     audio_file="test.mp3",
     metadata={"test": "metadata"},
+    prompt_version={"user_prompt": "test", "system_instruction": "test", "output_schema": {}},
 )
-# Result should be a tuple (response, grounding_metadata)
-assert isinstance(result, tuple)
-assert len(result) == 2
-response, grounding_metadata = result
-assert isinstance(response, dict)
-assert grounding_metadata is not None
+# Result should be a dict with response, grounding_metadata, thought_summaries
+assert isinstance(result, dict)
+assert "response" in result
+assert "grounding_metadata" in result
+assert "thought_summaries" in result
```
153-160: ⚠️ Potential issue | 🔴 Critical — Update mocks to return a dict instead of a tuple. `Stage3Executor.run()` returns `{"response": ..., "grounding_metadata": ..., "thought_summaries": ...}`, not a tuple.

The mocks at lines 158, 188, and 349 set `mock_run.return_value = (mock_gemini_response, "test_grounding_metadata")` (a tuple), but `Stage3Executor.run()` returns a dict. This breaks `analyze_snippet()`, which unpacks the return value with `**analyzing_response` and later accesses dict keys like `analyzing_response["response"]` and `analyzing_response["thought_summaries"]`.

Update all three mocks to:

```python
mock_run.return_value = {
    "response": mock_gemini_response,
    "grounding_metadata": "test_grounding_metadata",
    "thought_summaries": []
}
```
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around line 105-118: Add a language identifier to the fenced code block that
contains the workflow example starting with searxng_web_search("Maduro captured
US forces 2026") and the subsequent web_url_read and verification_evidence
entries; change the opening fence from ``` to ```text (or ```plaintext) so the
block is lint-compliant and renders correctly.
- Around line 993-1003: The prose mandates that "publication_date" and "title"
are required but the JSON Schema's "required" array omits them; update the
schema by adding "publication_date" and "title" to the "required" array so the
schema and prose align, and either remove or document the "content_fetched"
property in the prose (or explicitly mark it optional in prose) to resolve the
mismatch between schema and documentation.
- Around line 317-324: The Breaking News table's `verification_status` entries
use enums that don't match the JSON schema; update the table rows so
`verification_status` only uses the schema's allowed values ("verified_false",
"verified_true", "uncertain", "insufficient_evidence") and replace the current
entries `VERIFIED_FALSE`, `PARTIALLY_VERIFIABLE`, `UNVERIFIABLE_BREAKING`,
`UNVERIFIABLE_RECENT`, and `UNVERIFIABLE_STALE` accordingly in the table under
the `verification_status` column in prompts/stage_3/analysis_prompt.md so the
model outputs valid enum values.
In `@prompts/stage_3/output_schema.json`:
- Around line 16-17: The schema currently requires "thought_summaries", which
forces Gemini in __structure_with_schema (executors.py) to fabricate that field
when the original analysis lacks it; remove "thought_summaries" from the
required array in prompts/stage_3/output_schema.json so it becomes optional,
leaving the property in properties if needed, and rely on run() and
thought_summaries_from_api to populate summaries separately; update any
comments/tests that assumed it was required.
In `@prompts/stage_3/system_instruction.md`:
- Line 29: The wording "maximum score is 30%" and "(20% for claims within 24
hours)" is ambiguous given confidence_scores.overall is an integer 0–100; update
the text in system_instruction.md to state explicit integer values/out-of-100
wording (e.g., "maximum score is 30 (out of 100)" and "(20 out of 100 for claims
within 24 hours)") so the policy refers to confidence_scores.overall
unambiguously.
In `@src/processing_pipeline/stage_3/executors.py`:
- Around line 161-166: The CLI env dict currently hardcodes
os.environ["GOOGLE_GEMINI_KEY"] causing mismatch when run() receives a different
gemini_key; update the flow to pass the gemini_key parameter into
__analyze_with_custom_search (add gemini_key to its signature and all callers
from run()), and use that gemini_key when building the env dict (set
"GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.
- Around line 128-131: The decorators are in the wrong order for the methods
__analyze_with_custom_search and __analyze_with_google_search_grounding: move
`@classmethod` to be the topmost decorator (apply it last) and place
`@optional_task`(log_prints=True, retries=3) below it so Prefect's task wrapper
receives a plain function; swap the two decorators for both named methods
accordingly.
- Around line 76-82: The code constructs user_prompt_with_file using
os.path.basename(audio_file) which relies on the subprocess's working directory
to find the file; instead, pass an unambiguous path to the Gemini CLI or ensure
the subprocess's cwd is explicitly set. Update the call site that builds
user_prompt_with_file in __analyze_with_custom_search (and any caller like
download_audio_file_from_s3) to use the full audio_file path (not
os.path.basename) and modify the subprocess.run invocation in the Gemini CLI
executor to set cwd to the directory containing audio_file if you must pass a
basename; this ensures the CLI can always locate the file regardless of working
directory changes. Ensure references to audio_file, user_prompt_with_file,
download_audio_file_from_s3, and subprocess.run are updated consistently.
In `@src/processing_pipeline/stage_3/flows.py`:
- Around line 65-78: The downloaded local file returned by
download_audio_file_from_s3 is only removed on the happy path, so if
process_snippet (or any code between download and delete) raises the file leaks;
wrap the snippet processing and deletion in a try/finally around the block that
calls process_snippet (and mirror the change in the alternate branch that also
deletes local_file) so that os.remove(local_file) always runs in the finally,
referencing the local_file variable, the process_snippet(...) call, and
download_audio_file_from_s3(...) to locate the code to change.
In `@src/processing_pipeline/stage_3/tasks.py`:
- Around line 219-221: The except Exception block in process_snippet currently
swallows failures; change it to re-raise after updating status: catch Exception
as e, attempt to call supabase_client.set_snippet_status(snippet["id"],
ProcessingStatus.ERROR, str(e)) inside its own try/except so failures updating
Supabase are logged but do not suppress the original error, then re-raise the
original exception (raise) so the orchestrator sees the task as failed;
reference process_snippet, supabase_client.set_snippet_status,
ProcessingStatus.ERROR, local_file, and snippet in your changes.
- Around line 96-129: In __get_metadata: replace the fragile
.encode("latin-1").decode("unicode-escape") approach with a safe decode path
that first checks transcription type and attempts decoding via
codecs.decode(..., "unicode_escape") (or use a no-op if decoding raises) while
preserving the original transcription on error; use metadata.pop("start_time",
None), metadata.pop("end_time", None), metadata.pop("explanation", None) and
metadata.pop("keywords_detected", None) instead of del to avoid KeyError; and
parse recorded_at with a tolerant ISO parser (e.g.,
datetime.fromisoformat(snippet["recorded_at"].replace("Z", "+00:00")) or
dateutil.parser.parse) before formatting so timestamps with "Z" or different
offsets don't break. Ensure references to metadata["transcription"],
metadata.pop(...) and recorded_at parsing are updated in __get_metadata.
🧹 Nitpick comments (7)
src/processing_pipeline/stage_3/executors.py (4)
83-84: Unused exception variable `e`. The caught `RuntimeError` is assigned to `e` but never referenced in the except block (e.g., for logging). Either log it or drop the binding.

Proposed fix

```diff
-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError:
+            print("Falling back to Google Search grounding with SDK...")
```

Or, better, log the original error for debuggability:

```diff
-        except RuntimeError as e:
-            print("Falling back to Google Search grounding with SDK...")
+        except RuntimeError as e:
+            print(f"CLI failed ({e}), falling back to Google Search grounding with SDK...")
```
196-201: Silently swallowing JSON parse errors may hide CLI issues. Lines 200–201 catch `json.JSONDecodeError` with a bare `pass`, so malformed CLI output lines are silently ignored. Consider at least logging a debug message for troubleshooting.

Proposed fix

```diff
             except json.JSONDecodeError:
-                pass
+                print(f"Skipping non-JSON CLI output line: {line[:200]}")
```
296-309: Restructuring step uses a hardcoded model (GEMINI_FLASH_LATEST) instead of the caller's `model_name`. `__structure_with_schema` always uses `GeminiModel.GEMINI_FLASH_LATEST` (line 299) rather than the `model_name` the caller selected. If this is intentional (cheaper model for JSON restructuring), a brief comment would help. Otherwise, consider threading through the model name or making it configurable, as in the sketch below.
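A minimal sketch of the threading option follows. The real `__structure_with_schema` signature isn't quoted in this review, so the parameters and body are stand-ins; only the `model_name` plumbing is the point.

```python
# Hypothetical sketch — parameter names and body are assumptions, not the project's code.
class Stage3ExecutorSketch:
    DEFAULT_RESTRUCTURE_MODEL = "gemini-flash-latest"

    @classmethod
    def _structure_with_schema(cls, raw_text, output_schema, model_name=None):
        # Default preserves today's behavior (cheap model for JSON restructuring),
        # but run() can forward the model the caller originally selected.
        model = model_name or cls.DEFAULT_RESTRUCTURE_MODEL
        return {"model_used": model, "raw_text": raw_text, "schema": output_schema}

    @classmethod
    def run(cls, raw_text, output_schema, model_name):
        return cls._structure_with_schema(raw_text, output_schema, model_name=model_name)
```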
29-40: Consider whether a class with only `@classmethod` methods is the right abstraction. `Stage3Executor` has no instance state — every method is a `@classmethod`. A plain module with top-level functions (or a namespace class without instantiation) would be simpler and avoid the decorator-ordering pitfalls with `@classmethod` + `@optional_task`. See the sketch after this comment.
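A sketch of the module-level alternative, with the `@optional_task` decorators omitted for brevity; the function names mirror the methods discussed in this review, but the signatures are assumptions.

```python
# stage_3/executors.py as a plain module — hypothetical sketch of the suggested shape.
# With top-level functions, Prefect's task decorator wraps ordinary functions and the
# @classmethod ordering problem flagged below disappears.

def analyze_with_custom_search(gemini_key, model_name, audio_file, metadata, prompt_version):
    """CLI-based analysis path (custom SearXNG search)."""
    ...

def analyze_with_google_search_grounding(gemini_key, model_name, audio_file, metadata, prompt_version):
    """SDK fallback path with Google Search grounding."""
    ...

def run(gemini_key, model_name, audio_file, metadata, prompt_version):
    try:
        return analyze_with_custom_search(gemini_key, model_name, audio_file, metadata, prompt_version)
    except RuntimeError:
        return analyze_with_google_search_grounding(gemini_key, model_name, audio_file, metadata, prompt_version)
```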
src/processing_pipeline/stage_3/flows.py (1)

60-60: `id` shadows the Python built-in. Using `id` as a loop variable shadows the built-in `id()` function. Consider renaming to `snippet_id`.

Proposed fix

```diff
-    for id in snippet_ids:
-        snippet = fetch_a_specific_snippet_from_supabase(supabase_client, id)
+    for snippet_id in snippet_ids:
+        snippet = fetch_a_specific_snippet_from_supabase(supabase_client, snippet_id)
```

prompts/stage_3/analysis_prompt.md (1)
575-609: Verification evidence template references a `claims_supported` count in the summary section — it's missing. The `verification_summary` template (Lines 601–606) includes `claims_contradicted` and `claims_unverifiable`, but not `claims_supported` or `claims_verified`. Yet Line 168 describes a scenario where results support a claim. A `claims_supported` count would complete the picture and make the summary internally consistent with the scoring framework.

This also applies to the schema at Lines 1009–1017, which similarly lacks a `claims_supported` field. A possible shape is sketched below.
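One possible shape for the extended summary, sketched as a plain dict; `claims_supported` is the proposed addition, the other keys already exist in the schema quoted later in this review, and the values are made up.

```python
# Hypothetical example — only "claims_supported" is new; everything else mirrors the
# existing verification_summary fields.
verification_summary = {
    "total_searches": 3,
    "claims_supported": 1,       # proposed: claims the search results back up
    "claims_contradicted": 1,
    "claims_unverifiable": 1,
    "key_findings": "One claim confirmed, one debunked by tier-1 sources, one could not be verified.",
}
```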
src/processing_pipeline/stage_3/tasks.py (1)

132-181: DRY: the `Stage3Executor.run(...)` call is repeated three times with identical arguments. The same call with the same four keyword arguments appears at Lines 139–145, 153–159, and 171–177. Extract a helper to reduce duplication and make future changes less error-prone.
Suggested refactor
```diff
 @optional_task(log_prints=True)
 def analyze_snippet(gemini_key, audio_file, metadata, prompt_version: dict):
     main_model = GeminiModel.GEMINI_2_5_PRO
     fallback_model = GeminiModel.GEMINI_FLASH_LATEST

+    def _run(model):
+        return {
+            **Stage3Executor.run(
+                gemini_key=gemini_key,
+                model_name=model,
+                audio_file=audio_file,
+                metadata=metadata,
+                prompt_version=prompt_version,
+            ),
+            "analyzed_by": model,
+        }
+
     try:
         print(f"Attempting analysis with {main_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=main_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": main_model,
-        }
+        return _run(main_model)
     except errors.ServerError as e:
         print(f"Server error with {main_model} (code {e.code}): {e.message}")
         print(f"Falling back to {fallback_model}")
-        analyzing_response = Stage3Executor.run(
-            gemini_key=gemini_key,
-            model_name=fallback_model,
-            audio_file=audio_file,
-            metadata=metadata,
-            prompt_version=prompt_version,
-        )
-        return {
-            **analyzing_response,
-            "analyzed_by": fallback_model,
-        }
+        return _run(fallback_model)
     except errors.ClientError as e:
         if e.code in [HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN]:
             print(f"Auth error with {main_model} (code {e.code}): {e.message}")
             raise
         else:
             print(f"Client error with {main_model} (code {e.code}): {e.message}")
             print(f"Falling back to {fallback_model}")
-            analyzing_response = Stage3Executor.run(
-                gemini_key=gemini_key,
-                model_name=fallback_model,
-                audio_file=audio_file,
-                metadata=metadata,
-                prompt_version=prompt_version,
-            )
-            return {
-                **analyzing_response,
-                "analyzed_by": fallback_model,
-            }
+            return _run(fallback_model)
```
```
1. searxng_web_search("Maduro captured US forces 2026")
   -> Found: https://reuters.com/article/..., https://apnews.com/...

2. web_url_read("https://reuters.com/article/...")
   -> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."

3. Document in verification_evidence:
   - url: "https://reuters.com/article/..."
   - source_name: "Reuters"
   - source_type: "tier1_wire_service"
   - relevant_excerpt: "[exact quote from article]"
   - relevance_to_claim: "contradicts_claim"
```
Missing language identifier on fenced code block.
The code block at Line 105 has no language specified. Since this block illustrates a workflow example (not a specific language), adding a language like text or plaintext would satisfy linting and improve rendering.
Suggested fix
- ```
+ ```text
1. searxng_web_search("Maduro captured US forces 2026")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
1. searxng_web_search("Maduro captured US forces 2026")
   -> Found: https://reuters.com/article/..., https://apnews.com/...
2. web_url_read("https://reuters.com/article/...")
   -> Extract: "Reuters reports that as of [date], Venezuelan President Nicolás Maduro remains in power..."
3. Document in verification_evidence:
   - url: "https://reuters.com/article/..."
   - source_name: "Reuters"
   - source_type: "tier1_wire_service"
   - relevant_excerpt: "[exact quote from article]"
   - relevance_to_claim: "contradicts_claim"
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 105-105: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 105 - 118, Add a language
identifier to the fenced code block that contains the workflow example starting
with searxng_web_search("Maduro captured US forces 2026") and the subsequent
web_url_read and verification_evidence entries; change the opening fence from
``` to ```text (or ```plaintext) so the block is lint-compliant and renders
correctly.
| "thought_summaries", | ||
| "verification_evidence" |
thought_summaries being required in output_schema may cause issues in the restructuring fallback.
In executors.py __structure_with_schema, this schema is passed to Gemini to restructure free-text analysis into JSON. The original analysis text may not contain reasoning-process summaries, forcing the model to fabricate a thought_summaries value. Consider making it optional in this schema (remove from required) since run() already handles thought summaries separately via thought_summaries_from_api.
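A sketch of how the caller could keep summaries populated once the field is optional; the helper name and merge point are assumptions rather than the project's actual `run()` implementation.

```python
# Hypothetical sketch — not the project's actual code; shows the intent only.
def merge_thought_summaries(structured: dict, thought_summaries_from_api: list | None) -> dict:
    # If the restructuring fallback omitted thought_summaries (now optional in the schema),
    # fall back to the summaries already captured from the API response.
    structured.setdefault("thought_summaries", thought_summaries_from_api or [])
    return structured
```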
🤖 Prompt for AI Agents
In `@prompts/stage_3/output_schema.json` around lines 16 - 17, The schema
currently requires "thought_summaries", which forces Gemini in
__structure_with_schema (executors.py) to fabricate that field when the original
analysis lacks it; remove "thought_summaries" from the required array in
prompts/stage_3/output_schema.json so it becomes optional, leaving the property
in properties if needed, and rely on run() and thought_summaries_from_api to
populate summaries separately; update any comments/tests that assumed it was
required.
```python
    @optional_task(log_prints=True, retries=3)
    @classmethod
    def __analyze_with_custom_search(
        cls,
```
🧩 Analysis chain
🏁 Script executed:
```bash
# Find the definition of optional_task
rg "def optional_task" --type py -A 5
```

Repository: PublicDataWorks/verdad
Length of output: 408
🏁 Script executed:
```bash
# Search for imports of optional_task in the file
rg "optional_task|from|import" src/processing_pipeline/stage_3/executors.py | head -20
```

Repository: PublicDataWorks/verdad
Length of output: 914
🏁 Script executed:
```bash
# Check the full context around lines 128-131 and 217-218
sed -n '125,135p' src/processing_pipeline/stage_3/executors.py
sed -n '214,224p' src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad
Length of output: 740
🌐 Web query:
Prefect @task decorator classmethod compatibility Python
💡 Result:
Prefect's @task is compatible with @classmethod / @staticmethod only if the method decorator is applied on top of @task (i.e., @task must see a plain function, not a classmethod/staticmethod object). Prefect will raise a TypeError otherwise: "@classmethod should be applied on top of @task" (and the same for @staticmethod). [1]
```python
from prefect import task

class MyTasks:
    @classmethod
    @task
    def build(cls, x: int) -> int:
        return x + 1

    @staticmethod
    @task
    def clean(x: int) -> int:
        return x * 2
```

Notes:
- For flows, Prefect explicitly supports decorating instance/class/static methods (example shown in docs). [2]
- For tasks, follow the decorator ordering rule above to avoid the TypeError. [1]
Sources: Prefect task decorator implementation/API reference [1]; Prefect workflow docs showing class/static methods with decorators [2].
🏁 Script executed:
```bash
# Verify all instances of the @optional_task with @classmethod combo
rg "@optional_task" -A 1 src/processing_pipeline/stage_3/executors.py
```

Repository: PublicDataWorks/verdad
Length of output: 196
🏁 Script executed:
```bash
# Check utils.py to understand optional_task full implementation
sed -n '1,50p' src/utils.py | head -30
```

Repository: PublicDataWorks/verdad
Length of output: 1126
Incorrect decorator ordering: @optional_task before @classmethod will cause a runtime TypeError.
Decorators are applied bottom-up. Currently, @classmethod is applied first (creating a descriptor), then @optional_task wraps that descriptor. Prefect's internal @task decorator requires a plain function, not a classmethod descriptor, and will raise TypeError: "@classmethod should be applied on top of @task" at runtime.
Swap the order on lines 128–129 and 217–218 so @classmethod is applied last:
Required fix
```diff
-    @optional_task(log_prints=True, retries=3)
-    @classmethod
+    @classmethod
+    @optional_task(log_prints=True, retries=3)
     def __analyze_with_custom_search(
```

And similarly on lines 217–218 for `__analyze_with_google_search_grounding`.
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 128 - 131, The
decorators are in the wrong order for the methods __analyze_with_custom_search
and __analyze_with_google_search_grounding: move `@classmethod` to be the topmost
decorator (apply it last) and place `@optional_task`(log_prints=True, retries=3)
below it so Prefect's task wrapper receives a plain function; swap the two
decorators for both named methods accordingly.
```python
        env = {
            "PATH": os.environ.get("PATH", ""),
            "HOME": os.environ.get("HOME", ""),
            "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
            "GEMINI_SYSTEM_MD": system_instruction_path,
            "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
```
API key inconsistency: gemini_key parameter vs. os.environ["GOOGLE_GEMINI_KEY"].
run() accepts gemini_key as a parameter and uses it to construct the SDK client (line 62), but the CLI env dict hardcodes os.environ["GOOGLE_GEMINI_KEY"] (line 164). If a caller passes a different key, the CLI and SDK paths will use different credentials. Either thread gemini_key through or document that the CLI always uses the env var.
Proposed fix — use the parameter
```diff
 env = {
     "PATH": os.environ.get("PATH", ""),
     "HOME": os.environ.get("HOME", ""),
-    "GEMINI_API_KEY": os.environ["GOOGLE_GEMINI_KEY"],
+    "GEMINI_API_KEY": gemini_key,
     "GEMINI_SYSTEM_MD": system_instruction_path,
     "SEARXNG_URL": os.environ.get("SEARXNG_URL", ""),
 }
```

Note: `gemini_key` is not currently available inside `__analyze_with_custom_search` because it's not passed as a parameter. You'll need to add it to the method signature and pass it from `run()`.
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/executors.py` around lines 161 - 166, The CLI
env dict currently hardcodes os.environ["GOOGLE_GEMINI_KEY"] causing mismatch
when run() receives a different gemini_key; update the flow to pass the
gemini_key parameter into __analyze_with_custom_search (add gemini_key to its
signature and all callers from run()), and use that gemini_key when building the
env dict (set "GEMINI_API_KEY" from the gemini_key parameter instead of
os.environ["GOOGLE_GEMINI_KEY"]) so the SDK client and CLI subprocesss use the
same credentials.
```python
local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

# Process the snippet
process_snippet(
    supabase_client,
    snippet,
    local_file,
    GEMINI_KEY,
    skip_review=skip_review,
    prompt_version=prompt_version,
)

print(f"Delete the downloaded snippet clip: {local_file}")
os.remove(local_file)
```
Downloaded audio file leaks if process_snippet raises.
os.remove(local_file) on lines 78 and 97 runs only on the happy path. If process_snippet (or anything before the os.remove) throws, the file remains on disk. Wrap in try/finally:
Proposed fix (for the snippet_ids branch; apply analogously to the else branch)
```diff
 local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])

-# Process the snippet
-process_snippet(
-    supabase_client,
-    snippet,
-    local_file,
-    GEMINI_KEY,
-    skip_review=skip_review,
-    prompt_version=prompt_version,
-)
-
-print(f"Delete the downloaded snippet clip: {local_file}")
-os.remove(local_file)
+try:
+    # Process the snippet
+    process_snippet(
+        supabase_client,
+        snippet,
+        local_file,
+        GEMINI_KEY,
+        skip_review=skip_review,
+        prompt_version=prompt_version,
+    )
+finally:
+    print(f"Delete the downloaded snippet clip: {local_file}")
+    os.remove(local_file)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
-# Process the snippet
-process_snippet(
-    supabase_client,
-    snippet,
-    local_file,
-    GEMINI_KEY,
-    skip_review=skip_review,
-    prompt_version=prompt_version,
-)
-print(f"Delete the downloaded snippet clip: {local_file}")
-os.remove(local_file)
+local_file = download_audio_file_from_s3(s3_client, R2_BUCKET_NAME, snippet["file_path"])
+try:
+    # Process the snippet
+    process_snippet(
+        supabase_client,
+        snippet,
+        local_file,
+        GEMINI_KEY,
+        skip_review=skip_review,
+        prompt_version=prompt_version,
+    )
+finally:
+    print(f"Delete the downloaded snippet clip: {local_file}")
+    os.remove(local_file)
```
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/flows.py` around lines 65 - 78, The
downloaded local file returned by download_audio_file_from_s3 is only removed on
the happy path, so if process_snippet (or any code between download and delete)
raises the file leaks; wrap the snippet processing and deletion in a try/finally
around the block that calls process_snippet (and mirror the change in the
alternate branch that also deletes local_file) so that os.remove(local_file)
always runs in the finally, referencing the local_file variable, the
process_snippet(...) call, and download_audio_file_from_s3(...) to locate the
code to change.
```python
def __get_metadata(snippet):
    snippet_uuid = snippet["id"]
    flagged_snippets = snippet["stage_1_llm_response"]["detection_result"]["flagged_snippets"]
    metadata = {}
    for flagged_snippet in flagged_snippets:
        if flagged_snippet["uuid"] == snippet_uuid:
            metadata = flagged_snippet
            try:
                # Handle escaped unicode characters in the transcription
                metadata["transcription"] = flagged_snippet["transcription"].encode("latin-1").decode("unicode-escape")
            except (UnicodeError, AttributeError) as e:
                # Fallback to original transcription if decoding fails
                print(f"Warning: Failed to decode transcription: {e}")
                metadata["transcription"] = flagged_snippet["transcription"]

    audio_file = snippet["audio_file"]
    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
    audio_file["recorded_at"] = recorded_at.strftime("%B %-d, %Y %-I:%M %p")
    audio_file["recording_day_of_week"] = recorded_at.strftime("%A")
    audio_file["time_zone"] = "UTC"
    metadata["additional_info"] = audio_file

    del metadata["start_time"]
    del metadata["end_time"]

    # TODO: Add these fields back once we've fixed the pipeline
    del metadata["explanation"]
    del metadata["keywords_detected"]

    metadata["start_time"] = snippet["start_time"].split(":", 1)[1]
    metadata["end_time"] = snippet["end_time"].split(":", 1)[1]
    metadata["duration"] = snippet["duration"].split(":", 1)[1]

    return metadata
```
Fragile encoding hack and unsafe del on keys that may not exist.
Several concerns in __get_metadata:
- Line 105: `.encode("latin-1").decode("unicode-escape")` — This will throw `UnicodeEncodeError` for any non-Latin-1 character (e.g., Arabic, CJK), which would be caught but could mask data issues. Given the pipeline handles Spanish and Arabic audio, Arabic transcriptions would routinely hit this path.
- Lines 118–123: `del metadata["start_time"]`, `del metadata["end_time"]`, `del metadata["explanation"]`, `del metadata["keywords_detected"]` — These will raise `KeyError` if the flagged snippet lacks any of these fields. Consider using `metadata.pop("key", None)` for safer removal.
- Line 112: The datetime format string `"%Y-%m-%dT%H:%M:%S+00:00"` hardcodes the UTC offset. If the timestamp ever uses a `Z` suffix or a different offset, parsing will fail.
Suggested fix for safe key removal (Lines 118-123)
- del metadata["start_time"]
- del metadata["end_time"]
-
- # TODO: Add these fields back once we've fixed the pipeline
- del metadata["explanation"]
- del metadata["keywords_detected"]
+ metadata.pop("start_time", None)
+ metadata.pop("end_time", None)
+
+ # TODO: Add these fields back once we've fixed the pipeline
+ metadata.pop("explanation", None)
+ metadata.pop("keywords_detected", None)Suggested fix for datetime parsing (Line 112)
```diff
-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    from datetime import timezone
+    from dateutil.parser import isoparse
+    recorded_at = isoparse(snippet["recorded_at"])
```

Alternatively, if you want to stay in the stdlib:

```diff
-    recorded_at = datetime.strptime(snippet["recorded_at"], "%Y-%m-%dT%H:%M:%S+00:00")
+    recorded_at = datetime.fromisoformat(snippet["recorded_at"])
```

🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 96 - 129, In
__get_metadata: replace the fragile .encode("latin-1").decode("unicode-escape")
approach with a safe decode path that first checks transcription type and
attempts decoding via codecs.decode(..., "unicode_escape") (or use a no-op if
decoding raises) while preserving the original transcription on error; use
metadata.pop("start_time", None), metadata.pop("end_time", None),
metadata.pop("explanation", None) and metadata.pop("keywords_detected", None)
instead of del to avoid KeyError; and parse recorded_at with a tolerant ISO
parser (e.g., datetime.fromisoformat(snippet["recorded_at"].replace("Z",
"+00:00")) or dateutil.parser.parse) before formatting so timestamps with "Z" or
different offsets don't break. Ensure references to metadata["transcription"],
metadata.pop(...) and recorded_at parsing are updated in __get_metadata.
```python
    except Exception as e:
        print(f"Failed to process {local_file}: {e}")
        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
```
Bare Exception catch swallows errors silently — task appears to succeed on failure.
The except Exception at Line 219 catches all errors, sets the snippet status to ERROR, but never re-raises. This means:
- The `process_snippet` task will report success to the orchestrator even when processing failed.
- If `supabase_client.set_snippet_status` itself fails, both the original error and the status-update error are lost.
At minimum, re-raise after setting the error status so the caller/framework knows the task failed.
Suggested fix
```diff
 except Exception as e:
     print(f"Failed to process {local_file}: {e}")
-    supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+    try:
+        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+    except Exception as status_err:
+        print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+    raise
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-except Exception as e:
-    print(f"Failed to process {local_file}: {e}")
-    supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+except Exception as e:
+    print(f"Failed to process {local_file}: {e}")
+    try:
+        supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR, str(e))
+    except Exception as status_err:
+        print(f"Failed to set error status for snippet {snippet['id']}: {status_err}")
+    raise
```
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 219-219: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@src/processing_pipeline/stage_3/tasks.py` around lines 219 - 221, The except
Exception block in process_snippet currently swallows failures; change it to
re-raise after updating status: catch Exception as e, attempt to call
supabase_client.set_snippet_status(snippet["id"], ProcessingStatus.ERROR,
str(e)) inside its own try/except so failures updating Supabase are logged but
do not suppress the original error, then re-raise the original exception (raise)
so the orchestrator sees the task as failed; reference process_snippet,
supabase_client.set_snippet_status, ProcessingStatus.ERROR, local_file, and
snippet in your changes.
… enum

The H.1 Breaking News Protocol table used SCREAMING_CASE values (VERIFIED_FALSE, PARTIALLY_VERIFIABLE, UNVERIFIABLE_*) that don't exist in the verification_status enum, which would cause validation failures.
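For reference, the allowed values could be expressed as a type alias like the sketch below; the four strings come from the `verification_status` enum described in this PR, while the alias itself is illustrative rather than the project's actual model code.

```python
from typing import Literal

# Illustrative only — the enum values match this PR's verification_status field.
VerificationStatus = Literal["verified_false", "verified_true", "uncertain", "insufficient_evidence"]
```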
Important
Looks good to me! 👍
Reviewed 8991dcc in 18 seconds. Click for details.
- Reviewed 21 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_WGamei1noZpb8GaV
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
prompts/stage_3/analysis_prompt.md (1)
1775-1948: ⚠️ Potential issue | 🟠 Major — Missing `verification_evidence` field in the complete example. The schema requires `verification_evidence` (Line 638), and extensive documentation explains how to populate it (Lines 122-169, 575-609). However, the complete example output (Lines 1775-1948) does not include the `verification_evidence` field. This omission could result in the AI failing to provide this critical field in actual outputs.

The example should demonstrate a complete `verification_evidence` structure showing:

- At least one entry in `searches_performed` with actual search queries
- Corresponding `results` array with source details
- Populated `verification_summary`

Example structure to add
Add after Line 1945 (before "thought_summaries"):
"verification_evidence": { "searches_performed": [ { "query": "vaccines mind control government", "search_intent": "Verify claim that vaccines are used for mind control", "result_status": "results_found", "results": [ { "url": "https://www.factcheck.org/vaccine-myths/", "source_name": "FactCheck.org", "source_type": "tier1_factchecker", "publication_date": "2025-12-15", "title": "Debunking Vaccine Myths", "relevant_excerpt": "There is no scientific evidence that vaccines can control minds or contain mind-altering substances.", "relevance_to_claim": "contradicts_claim", "content_fetched": true } ] } ], "verification_summary": { "total_searches": 1, "claims_contradicted": 1, "claims_unverifiable": 0, "key_findings": "Fact-checking sources confirm that the mind control claim is false and has been repeatedly debunked." } },
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Line 1002: The schema adds a boolean field content_fetched but the prose never
explains its usage; update the prose (e.g., section C.1 around where
web_url_read and snippet vs full-content behavior is discussed) to state that
content_fetched should be true when the agent used web_url_read to retrieve the
full page content and false when only search snippet/metadata was available, or
alternatively remove the content_fetched property from the schema if you decide
it’s unnecessary; ensure references to content_fetched appear alongside the
web_url_read instructions and examples so callers know when to set it.
- Around line 321-325: The "THE GOLDEN RULE" sentence is ambiguous because it
can be read to override the 20% cap for claims under 24 hours; update that
sentence to explicitly reference the existing tiered caps in the table (the rows
with "MAX 20%" for "within 24 hours" and "MAX 30%" for "24-72 hours") and state
that the 30% maximum applies only to claims aged 24–72 hours while claims within
24 hours remain capped at 20%; keep the phrase "THE GOLDEN RULE" but replace the
current line with a clear, tiered statement that mentions "MAX 20% (0–24h)" and
"MAX 30% (24–72h)" to remove ambiguity.
The prose documentation mandates these fields but the JSON schemas and Pydantic model had them as optional. publication_date allows null for cases where the date is unavailable. Also documents content_fetched as explicitly optional in the prose.
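As a rough illustration of that optionality in Pydantic terms (a sketch; the field names come from the schema in this PR, but the class itself is not the project's `models.py`):

```python
from typing import Optional
from pydantic import BaseModel

class SearchResultEvidenceSketch(BaseModel):
    # Hypothetical model — illustrates the intended optionality only.
    url: str
    source_name: str
    title: str
    publication_date: Optional[str]          # key is required by the schema, but null is accepted
    content_fetched: Optional[bool] = None   # documented as optional in the prose
```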
…fidence scores

confidence_scores.overall is an integer 0-100, not a percentage. Updated all score references in analysis_prompt.md and system_instruction.md to use explicit integer values (e.g., "30 (out of 100)") instead of "30%".
Important
Looks good to me! 👍
Reviewed 03e05c1 in 16 seconds. Click for details.
- Reviewed 64 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_vSMdebzXzECu0JxA
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Update import script to reference stage 3 prompts from their new location in the prompts/stage_3/ subdirectory, aligning with the reorganized prompt directory structure.
Important
Looks good to me! 👍
Reviewed 2639987 in 39 seconds. Click for details.
- Reviewed 406 lines of code in 5 files
- Skipped 0 files when reviewing.
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_UmF9gOMqvIB3LXUd
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scripts/import_prompts_to_db.py (1)
47-51: ⚠️ Potential issue | 🟠 Major — Stage 4 paths in `import_prompts_to_db.py` are incorrect and will fail; use nested paths to match `constants.py` and existing files.

The mapping at lines 47–51 references old flat paths (`prompts/Stage_4_*.md`) that do not exist on disk. The runtime code in `src/processing_pipeline/constants.py` (lines 54–62) reads from nested paths (`prompts/stage_4/*.md`), which do exist. All other stages in `import_prompts_to_db.py` (STAGE_1, STAGE_3, GEMINI_TIMESTAMPED_TRANSCRIPTION) use nested paths. Update Stage 4 to match:

Fix

```diff
 PromptStage.STAGE_4: {
-    "system_instruction": "prompts/Stage_4_system_instruction.md",
-    "user_prompt": "prompts/Stage_4_review_prompt.md",
-    "output_schema": "prompts/Stage_4_output_schema.json",
+    "system_instruction": "prompts/stage_4/system_instruction.md",
+    "user_prompt": "prompts/stage_4/review_prompt.md",
+    "output_schema": "prompts/stage_4/output_schema.json",
 },
```
🤖 Fix all issues with AI agents
In `@prompts/stage_3/analysis_prompt.md`:
- Around line 977-1022: The verification_evidence schema is missing descriptive
guidance; update the verification_evidence object in analysis_prompt.md by
adding the missing descriptions from prompts/stage_3/output_schema.json: add
"Record of all web searches performed during fact-checking." to
searches_performed, "Individual search results with full details." to results,
and per-result property descriptions (e.g., source_type: "Classification of
source reliability tier.", publication_date: "Publication date or null if
unavailable.", relevant_excerpt: "Excerpt demonstrating relevance to the
claim.", relevance_to_claim: "How the result relates to the claim.") and add
descriptive text for verification_summary and its properties (total_searches,
claims_contradicted, claims_unverifiable, key_findings) so the
verification_evidence, searches_performed, results, source_type, and
verification_summary blocks include the same descriptions as in
output_schema.json.
🧹 Nitpick comments (3)
src/processing_pipeline/constants.py (2)
43-62: File handles are never closed. Every helper uses a bare `open(...)` without a `with` statement or `.close()`, leaking file descriptors. Since you're already touching these lines, consider wrapping in `with`:

Example fix for one function

```diff
 def get_user_prompt_for_stage_3():
-    return open("prompts/stage_3/analysis_prompt.md", "r").read()
+    with open("prompts/stage_3/analysis_prompt.md", "r") as f:
+        return f.read()
```

Same applies to all six functions (lines 44, 48, 51, 55, 58, 62) and the pre-existing one on line 65.
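If you want one change instead of six, a small helper along these lines would also work (the helper name is made up; the path shown is the one from the diff above):

```python
# Hypothetical helper — centralizes the context-managed read so no descriptors leak.
def _read_prompt(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def get_user_prompt_for_stage_3() -> str:
    return _read_prompt("prompts/stage_3/analysis_prompt.md")
```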
93-99: Dead references to removed function in the `__main__` block. Lines 94–99 reference `get_timestamped_transcription_generation_output_schema`, which was removed in this PR. While it's commented out so there's no runtime impact, it adds confusion. Consider cleaning up or removing these stale comments.

src/processing_pipeline/stage_1/flows.py (1)

224-224: Good refactor using `get_audio_file_metadata`. This eliminates the duplicated metadata construction logic.

Note: `redo_main_detection` (lines 158–166) still manually constructs the same metadata dict. Consider using `get_audio_file_metadata(audio_file)` there too for consistency, along the lines sketched below.
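A sketch of that reuse; the surrounding body of `redo_main_detection` is not shown in this review, so everything except the helper call is assumed.

```python
def redo_main_detection(audio_file):
    # Hypothetical sketch — the real function takes more arguments; the point is simply
    # calling the shared helper instead of hand-building the metadata dict again.
    metadata = get_audio_file_metadata(audio_file)
    ...
```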
| "verification_evidence": { | ||
| "type": "object", | ||
| "required": ["searches_performed", "verification_summary"], | ||
| "description": "Complete documentation of all web searches performed during fact-checking.", | ||
| "properties": { | ||
| "searches_performed": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["query", "search_intent", "result_status", "results"], | ||
| "properties": { | ||
| "query": { "type": "string" }, | ||
| "search_intent": { "type": "string" }, | ||
| "result_status": { "type": "string", "enum": ["results_found", "no_results", "results_inconclusive"] }, | ||
| "results": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["url", "source_name", "source_type", "publication_date", "title", "relevant_excerpt", "relevance_to_claim"], | ||
| "properties": { | ||
| "url": { "type": "string" }, | ||
| "source_name": { "type": "string" }, | ||
| "source_type": { "type": "string", "enum": ["tier1_wire_service", "tier1_factchecker", "tier2_major_news", "tier3_regional_news", "official_source", "other"] }, | ||
| "publication_date": { "type": ["string", "null"] }, | ||
| "title": { "type": "string" }, | ||
| "relevant_excerpt": { "type": "string" }, | ||
| "relevance_to_claim": { "type": "string", "enum": ["supports_claim", "contradicts_claim", "provides_context", "inconclusive"] }, | ||
| "content_fetched": { "type": "boolean" } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "verification_summary": { | ||
| "type": "object", | ||
| "required": ["total_searches", "claims_contradicted", "claims_unverifiable", "key_findings"], | ||
| "properties": { | ||
| "total_searches": { "type": "integer" }, | ||
| "claims_contradicted": { "type": "integer" }, | ||
| "claims_unverifiable": { "type": "integer" }, | ||
| "key_findings": { "type": "string" } | ||
| } | ||
| } | ||
| } | ||
| } |
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find output_schema.json
fd "output_schema.json" prompts/stage_3
```

Repository: PublicDataWorks/verdad
Length of output: 100
🏁 Script executed:
```bash
#!/bin/bash
# Read the output_schema.json file
cat prompts/stage_3/output_schema.json
```

Repository: PublicDataWorks/verdad
Length of output: 21501
Add descriptions to verification_evidence schema to match output_schema.json.
The verification_evidence schema in analysis_prompt.md (lines 977-1022) is missing descriptions for multiple fields that are documented in prompts/stage_3/output_schema.json. The structural definitions match, but the prompt version lacks guidance descriptions for searches_performed, results, and verification_summary properties. These missing descriptions reduce clarity for the model about field purposes and requirements.
Add the descriptions from output_schema.json to ensure the prompt provides complete guidance. Key missing descriptions:
searches_performed: "Record of all web searches performed during fact-checking."results: "Individual search results with full details."- Each result item property needs its description (e.g., source_type: "Classification of source reliability tier.")
verification_summaryand all its properties need descriptions
🤖 Prompt for AI Agents
In `@prompts/stage_3/analysis_prompt.md` around lines 977 - 1022, The
verification_evidence schema is missing descriptive guidance; update the
verification_evidence object in analysis_prompt.md by adding the missing
descriptions from prompts/stage_3/output_schema.json: add "Record of all web
searches performed during fact-checking." to searches_performed, "Individual
search results with full details." to results, and per-result property
descriptions (e.g., source_type: "Classification of source reliability tier.",
publication_date: "Publication date or null if unavailable.", relevant_excerpt:
"Excerpt demonstrating relevance to the claim.", relevance_to_claim: "How the
result relates to the claim.") and add descriptive text for verification_summary
and its properties (total_searches, claims_contradicted, claims_unverifiable,
key_findings) so the verification_evidence, searches_performed, results,
source_type, and verification_summary blocks include the same descriptions as in
output_schema.json.
Important
Refactors Stage 3 prompts and processing logic to include web-based verification and structured evidence, updating schemas and tests accordingly.
- `verification_status` field with values: `verified_false`, `verified_true`, `uncertain`, `insufficient_evidence`.
- `.gemini/settings.json`.
- `prompts/stage_3/analysis_prompt.md`, `system_instruction.md`, and `output_schema.json` for new verification logic.
- `executors.py`, `flows.py`, `tasks.py`, and `models.py`.
- `src/processing_pipeline/stage_3.py` and `timestamped_transcription_generator.py`.
- `test_stage_1.py` and `test_stage_3.py` to reflect Stage 3 restructuring and new verification logic.

This description was created by
for 2639987. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit