VER-277: Update prompts in Stage 3 to utilize Google Search result better #34
Conversation
Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces significant enhancements to the Stage 3 analysis process, primarily by integrating a mandatory Google Search verification step for all factual claims. The changes aim to foster greater objectivity and accuracy in identifying disinformation by refining the confidence scoring framework to account for the presence or absence of contradictory evidence. This ensures that classifications are based on robust, verifiable information, preventing premature judgments and promoting a more balanced assessment of content.
Code Review
This pull request effectively refines the Stage 3 analysis prompts and data models to better leverage Google Search for evidence-based verification. The changes introduce a more rigorous, neutral, and objective process for scoring potential disinformation by mandating claim verification before scoring and providing a clearer framework for different confidence levels. The updates are consistently applied across the markdown prompts, JSON schema, and Python Pydantic models. My feedback includes a minor suggestion to refine the list of example reputable sources in the prompt to ensure they are broadly applicable for general fact-checking.
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This change reframes Stage 3 from disinformation-detection to verification-first, adds explicit web-grounding (Google search) and uncertainty handling, updates prompts and system guidance, and extends the output schema and Python models with an `uncertain_claims_scored_low` flag.
Sequence Diagram(s)

sequenceDiagram
participant User
participant Stage3 as Stage 3 Analysis
participant Google as Google Search
participant Scorer as Scoring Engine
User->>Stage3: Submit flagged snippet + metadata/transcription
Stage3->>Stage3: Extract claims & build queries (include recording date/time)
Stage3->>Google: Perform web-grounding searches
Google-->>Stage3: Return search results / sources
rect rgb(230,245,255)
Note over Scorer: Evidence-based scoring (web-grounded)
Stage3->>Scorer: Present claims + search evidence
alt Strong contradictory evidence
Scorer->>Scorer: High score (misleading/false) with sources
else Strong supporting evidence
Scorer->>Scorer: High score (verified accurate) with sources
else Insufficient/conflicting evidence
Scorer->>Scorer: Low score (uncertain)
Scorer->>Scorer: Set uncertain_claims_scored_low = true
end
end
Scorer-->>Stage3: Score, evidence list, validation flags
Stage3->>Stage3: Self-review & validation checklist
Stage3-->>User: Output JSON with explanation, scores, and validation
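The claim-extraction and query-building step in the diagram could be sketched roughly as below; `build_search_queries` and its inputs are illustrative names for this review, not code from the PR:

```python
from datetime import datetime, timezone

def build_search_queries(claims, recording_time):
    """Pair each extracted claim with the clip's recording period so
    web-grounding searches surface contemporaneous evidence
    (illustrative sketch, not the actual pipeline code)."""
    period = recording_time.strftime("%B %Y")  # e.g. "March 2024"
    return [f"{claim} {period}" for claim in claims]

queries = build_search_queries(
    ["candidate X won the primary"],
    datetime(2024, 3, 5, tzinfo=timezone.utc),
)
```

Anchoring queries to the recording date, as the diagram's "include recording date/time" step describes, keeps time-sensitive facts from being checked against the wrong period.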
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (5 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.

🔧 Pylint (4.0.2): src/processing_pipeline/stage_3.py

************* Module .pylintrc ... [truncated 17713 characters] ... nvention",
Important
Looks good to me! 👍
Reviewed everything up to e2089a2 in 2 minutes and 5 seconds.
- Reviewed 255 lines of code in 4 files
- Skipped 0 files when reviewing
- Skipped posting 5 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and don't forget to react with 👍 or 👎 to teach Ellipsis.
1. prompts/Stage_3_analysis_prompt.md:171
- Draft comment:
The updated instructions now clearly require verifying factual claims with Google Search and include a new checklist item for uncertain claims. This enhances clarity and ensures conservative scoring when evidence is insufficient. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
2. prompts/Stage_3_output_schema.json:558
- Draft comment:
New 'uncertain_claims_scored_low' field has been added to the validation checklist and output schema, ensuring consistency with the revised analysis guidelines. The updated description for explanation is also clear. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
3. prompts/Stage_3_system_instruction.md:7
- Draft comment:
The system instructions now emphasize verifying all factual claims using Google Search and prioritizing recent information, which improves the overall evidence-based analysis. The guidelines are clear and comprehensive. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
4. src/processing_pipeline/stage_3_models.py:15
- Draft comment:
Model field descriptions, such as in the Explanation class and the Language class (using 'register_' with alias 'register'), have been updated to align with the revised analysis guidelines. The changes are consistent and improve clarity. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
5. prompts/Stage_3_analysis_prompt.md:195
- Draft comment:
Consider revising the tense for consistency. While the previous items use present tense (e.g., "if you cannot cite", "if web search confirms"), this bullet uses past tense "if you found". It may be clearer to change it to "if you find no reliable information". - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%. This is a very minor grammatical consistency issue. The comment is technically correct - there is a tense inconsistency. However, I need to consider: 1) Is this important enough to warrant a comment? 2) Is it clearly actionable? 3) Does it meet the bar of "strong evidence" that it's a real issue? The inconsistency is real and the suggestion is clear. However, this is a very minor style issue in a markdown documentation file. The rules say "Do NOT make comments that are obvious or unimportant" and "Do NOT comment unless there is clearly a code change required." This is more of a style preference than a code change requirement. The meaning is clear either way. While the tense inconsistency is real, this could be considered too minor and pedantic. The document is readable and understandable with either tense. This might fall under "obvious or unimportant" comments that should be avoided. The author might find this nitpicky rather than helpful. However, consistency in technical documentation is valuable, especially in instructions that will be followed by an AI system. The comment is actionable, specific, and includes a clear suggestion. It's about a line that was actually changed in this PR, so it's relevant to the diff. For documentation quality, maintaining consistent tense throughout parallel bullet points is a reasonable standard. This is a borderline case. The comment is technically correct and provides a clear, actionable suggestion for improving consistency. However, it's very minor and could be seen as pedantic. Given the rules emphasize not making "obvious or unimportant" comments, and this is a very minor style issue that doesn't affect functionality or clarity, I should lean toward deleting it.
Workflow ID: wflow_gNmdhZmhweLTZT0M
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Important
Looks good to me! 👍
Reviewed bf0af1f in 43 seconds.
- Reviewed 62 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and don't forget to react with 👍 or 👎 to teach Ellipsis.
1. prompts/Stage_3_analysis_prompt.md:1
- Draft comment:
The updated prompt now clearly outlines the new metadata and includes current date/time for verification. This enhances clarity. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
2. src/processing_pipeline/stage_3.py:1
- Draft comment:
Importing 'timezone' along with 'datetime' is appropriate for formatting the current timestamp. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
3. src/processing_pipeline/stage_3.py:277
- Draft comment:
Including the current date and time in the user prompt is a useful enhancement. Consider double-checking the strftime format (using '%-d' and '%-I') for cross-platform compatibility. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
Workflow ID: wflow_ExckfrSxmNGQKrNP
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- prompts/Stage_3_analysis_prompt.md (9 hunks)
- src/processing_pipeline/stage_3.py (2 hunks)
🔇 Additional comments (9)
prompts/Stage_3_analysis_prompt.md (9)
3-4: LGTM: Strong reframing toward objective verification. The shift from "disinformation detection" to "independent verification and analysis" with explicit guidance not to assume content is disinformation aligns well with the PR's evidence-based approach.
11-22: LGTM: Clear metadata structure with time-sensitive verification support. The clarifications about transcription scope (line 16) and addition of current date/time (lines 21-22) properly align with the code changes in stage_3.py and support time-sensitive fact verification.
93-93: LGTM: Neutral framing supports verification-first approach. The updated explanation description now accommodates both disinformation-detected and verified-accurate outcomes, consistent with the objective verification mandate.
129-175: LGTM: Well-structured evidence-based scoring framework. The revised confidence scoring framework with explicit Google Search verification requirements is logically coherent:
- Scores 80-100: Strong contradictory evidence
- Scores 40-79: Some contradictory evidence (capped at 40 if none found)
- Scores 1-39: No contradictory evidence found
- Score 0: Claims verified as true
The conservative scoring principle (lines 172-174) and explicit boundary at score 40 (line 154) effectively prevent false positives from absence of information.
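As a sketch, the band logic above could be encoded like this; the evidence keys and the concrete scores are illustrative, while the band boundaries (0, 1-39, 40-79, 80-100) come from the prompt text:

```python
def score_claim(evidence):
    """Map web-search evidence onto the confidence bands described
    above; returns (score, uncertain_claims_scored_low)."""
    if evidence.get("verified_true"):
        return 0, False    # claims confirmed accurate by search
    if evidence.get("strong_contradiction"):
        return 90, False   # 80-100 band: clearly misleading/false
    if evidence.get("some_contradiction"):
        return 60, False   # 40-79 band: partially contradicted
    # No contradictory evidence found: stay in the conservative
    # 1-39 band and raise the uncertainty flag so reviewers can
    # see why the score stayed low.
    return 30, True
```

This mirrors how the cap at 40 plus the `uncertain_claims_scored_low` flag together prevent absence of information from producing a false positive.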
166-166: Past review concern resolved: entertainment sources removed. The previous review comment about Variety and Deadline has been addressed; only general-purpose, high-authority news sources (CNN, BBC, Reuters, AP) remain in the examples.
194-195: LGTM: Validation checklist aligned with uncertain claims handling. The addition of the uncertain claims checklist item (line 195) properly reinforces the conservative scoring principle (0-40 max) established in the scoring framework.
448-448: LGTM: Explanation schema updated for neutral framing. The explanation field description now properly reflects both disinformation-detected and verified-accurate outcomes, consistent with the instruction changes.
1718-1727: LGTM: Final instructions reinforce evidence-based verification framework. The numbered instructions effectively summarize the key changes: Google Search verification requirement (line 1718), evidence-based scoring principle (line 1723), and conservative uncertainty handling (line 1726). All align with the framework established in earlier sections.
563-572: Model consistency verified; no action required. The `ValidationChecklist` Pydantic model in `src/processing_pipeline/stage_3_models.py` (line 52) correctly includes the `uncertain_claims_scored_low: bool` field, matching the schema update in the prompt file. The model and schema are in sync.
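For reference, the model described above might look roughly like this; only the `uncertain_claims_scored_low` field is confirmed by the review, and the other field is a hypothetical placeholder standing in for the pre-existing checklist items:

```python
from pydantic import BaseModel, Field

class ValidationChecklist(BaseModel):
    """Sketch of the Stage 3 validation checklist model."""
    # Hypothetical placeholder for checklist items that already existed.
    claims_verified_with_search: bool = Field(
        default=False,
        description="Hypothetical pre-existing checklist item.",
    )
    # The field added by this PR, mirroring the JSON schema update.
    uncertain_claims_scored_low: bool = Field(
        default=False,
        description="True when claims lacking contradictory evidence "
                    "were conservatively scored in the 0-40 range.",
    )
```

Keeping the Pydantic model, the JSON schema, and the prompt checklist in lockstep is what the reviewer is verifying here.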
| f"{cls.USER_PROMPT}\n\n" | ||
| f"Here is the metadata of the attached audio clip:\n{json.dumps(metadata, indent=2)}\n\n" | ||
| f"Here is the current date and time: {datetime.now(timezone.utc).strftime('%B %-d, %Y %-I:%M %p UTC')}\n\n" |
Platform compatibility issue: Unix-specific datetime format codes.
The format codes %-d and %-I use the dash modifier to suppress leading zeros, which only works on Unix/Linux/macOS platforms. On Windows, these will raise a ValueError, potentially breaking the pipeline.
Apply this diff to use platform-independent formatting:

```diff
  f"{cls.USER_PROMPT}\n\n"
  f"Here is the metadata of the attached audio clip:\n{json.dumps(metadata, indent=2)}\n\n"
- f"Here is the current date and time: {datetime.now(timezone.utc).strftime('%B %-d, %Y %-I:%M %p UTC')}\n\n"
+ f"Here is the current date and time: {datetime.now(timezone.utc).strftime('%B %d, %Y %I:%M %p UTC').replace(' 0', ' ')}\n\n"
```

Alternatively, use zero-padded formats if leading zeros are acceptable:

```diff
- f"Here is the current date and time: {datetime.now(timezone.utc).strftime('%B %-d, %Y %-I:%M %p UTC')}\n\n"
+ f"Here is the current date and time: {datetime.now(timezone.utc).strftime('%B %d, %Y %I:%M %p UTC')}\n\n"
```
🤖 Prompt for AI Agents
In src/processing_pipeline/stage_3.py around lines 278 to 280, the f-string uses
Unix-only strftime codes %-d and %-I which raise ValueError on Windows; replace
those with platform-independent formatting by using zero-padded %d and %I (or
build the date/time string from datetime attributes) and then remove leading
zeros in a platform-safe way (e.g., format with %d/%I and strip the leading '0'
or construct the day/hour using dt.day and 12-hour conversion) so the output
remains the same but works on Windows and Unix.
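A portable variant along the lines the agent prompt suggests, building the day and 12-hour clock from datetime attributes instead of the Unix-only %-d/%-I codes, might look like this (a sketch, not the committed fix):

```python
from datetime import datetime, timezone

def format_current_utc(dt=None):
    """Format a timestamp like 'November 3, 2024 2:05 PM UTC' without
    the Unix-only %-d/%-I strftime codes, so it also works on Windows."""
    dt = dt or datetime.now(timezone.utc)
    hour12 = dt.hour % 12 or 12              # 12-hour clock, no leading zero
    ampm = "AM" if dt.hour < 12 else "PM"
    # %B (full month name) is portable across platforms.
    return f"{dt.strftime('%B')} {dt.day}, {dt.year} {hour12}:{dt.minute:02d} {ampm} UTC"
```

Unlike the `.replace(' 0', ' ')` workaround, this avoids string surgery entirely and keeps minutes zero-padded.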
Important
Refactor Stage 3 analysis to prioritize evidence-based verification, update scoring framework, and enhance models and documentation for clarity and accuracy.
- Updates `Stage_3_analysis_prompt.md` (three items; descriptions truncated in the original summary)
- Adds `uncertain_claims_scored_low` to `ValidationChecklist` in `stage_3_models.py`
- Updates `Explanation` model to reflect analysis findings in `stage_3_models.py`
- Changes in `stage_3.py` (description truncated)
- Updates `Stage3Executor.run()` in `stage_3.py`
- Updates `Stage_3_system_instruction.md` and `Stage_3_output_schema.json`

This description was created by Ellipsis for bf0af1f. You can customize this summary. It will automatically update as commits are pushed.