VER-286: Add analyzed_by field (#42)
Conversation
Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request enhances the processing pipeline by introducing a new analyzed_by field to record which model performed each analysis.
Walkthrough

Refactored the Stage 3 analysis pipeline to standardize data structures across stage_3.py and supabase_utils.py.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant analyze_snippet as analyze_snippet()
    participant Stage3Executor as Stage3Executor.run()
    participant update_snippet as update_snippet_in_supabase()
    participant Supabase as SupabaseClient
    Caller->>analyze_snippet: Call with gemini_key, audio_file, metadata
    activate analyze_snippet
    Note over analyze_snippet: Try primary model,<br/>fallback to alternate on error
    analyze_snippet-->>Caller: Return {response, grounding_metadata, analyzed_by}
    deactivate analyze_snippet
    Caller->>Stage3Executor: Call run() with enriched response
    activate Stage3Executor
    Stage3Executor->>Stage3Executor: Process with ProcessingStatus enums<br/>(PROCESSING, PROCESSED, etc.)
    Stage3Executor-->>Caller: Return {response, grounding_metadata, analyzed_by}
    deactivate Stage3Executor
    Caller->>update_snippet: Call with enriched structure
    activate update_snippet
    update_snippet->>Supabase: update_snippet(id, ..., analyzed_by, status, ...)
    activate Supabase
    Supabase-->>update_snippet: Confirm update
    deactivate Supabase
    deactivate update_snippet
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Code Review
This pull request successfully adds the analyzed_by field to track which model was used for analysis and refactors status updates to use the ProcessingStatus enum for better code clarity and maintainability. The changes are well-implemented across stage_3.py and supabase_utils.py. I've included one suggestion to reduce code duplication in the analyze_snippet function, which will improve its readability and make it easier to maintain.
```diff
 try:
     print(f"Attempting analysis with {main_model}")
-    return Stage3Executor.run(
+    analyzing_response = Stage3Executor.run(
         gemini_key=gemini_key,
         model_name=main_model,
         audio_file=audio_file,
         metadata=metadata,
     )
+    return {
+        **analyzing_response,
+        "analyzed_by": main_model,
+    }
 except errors.ServerError as e:
     print(f"Server error with {main_model} (code {e.code}): {e.message}")
     print(f"Falling back to {fallback_model}")
-    return Stage3Executor.run(
+    analyzing_response = Stage3Executor.run(
         gemini_key=gemini_key,
         model_name=fallback_model,
         audio_file=audio_file,
         metadata=metadata,
     )
+    return {
+        **analyzing_response,
+        "analyzed_by": fallback_model,
+    }
 except errors.ClientError as e:
     if e.code in [HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN]:
         print(f"Auth error with {main_model} (code {e.code}): {e.message}")
         raise
     else:
         print(f"Client error with {main_model} (code {e.code}): {e.message}")
         print(f"Falling back to {fallback_model}")
-        return Stage3Executor.run(
+        analyzing_response = Stage3Executor.run(
             gemini_key=gemini_key,
             model_name=fallback_model,
             audio_file=audio_file,
             metadata=metadata,
         )
+        return {
+            **analyzing_response,
+            "analyzed_by": fallback_model,
+        }
```
There's some code duplication in the except blocks for ServerError and ClientError, as they both contain the same fallback logic. You can combine these two except blocks to reduce code repetition and improve maintainability.
```python
try:
    print(f"Attempting analysis with {main_model}")
    analyzing_response = Stage3Executor.run(
        gemini_key=gemini_key,
        model_name=main_model,
        audio_file=audio_file,
        metadata=metadata,
    )
    return {
        **analyzing_response,
        "analyzed_by": main_model,
    }
except (errors.ServerError, errors.ClientError) as e:
    if isinstance(e, errors.ClientError) and e.code in [
        HTTPStatus.UNAUTHORIZED,
        HTTPStatus.FORBIDDEN,
    ]:
        print(f"Auth error with {main_model} (code {e.code}): {e.message}")
        raise
    error_type = type(e).__name__
    print(f"{error_type} with {main_model} (code {e.code}): {e.message}")
    print(f"Falling back to {fallback_model}")
    analyzing_response = Stage3Executor.run(
        gemini_key=gemini_key,
        model_name=fallback_model,
        audio_file=audio_file,
        metadata=metadata,
    )
    return {
        **analyzing_response,
        "analyzed_by": fallback_model,
    }
```
Important
Looks good to me! 👍
Reviewed everything up to 3f6dec5 in 1 minute and 55 seconds.
- Reviewed 158 lines of code in 2 files.
- Skipped 0 files when reviewing.
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/processing_pipeline/stage_3.py:83
- Draft comment: New parameter 'analyzed_by' added to update_snippet_in_supabase. Ensure the DB schema and migrations include this field.
- Reason this comment was not posted: usefulness confidence = 15% vs. threshold = 50%. This is an "ensure that..." comment, which the rules say should be removed: it asks the author to double-check the schema rather than pointing to a specific problem in the diff, the schema is not visible from this file, and a genuine mismatch would surface at runtime or in testing.

2. src/processing_pipeline/stage_3.py:102
- Draft comment: Passed 'analyzed_by' to supabase_client.update_snippet; verify the underlying method and DB column are updated accordingly.
- Reason this comment was not posted: usefulness confidence = 0% <= threshold 50%. Asking the author to verify that the underlying method and DB column are updated falls under asking them to double-check things, which is against the rules.

3. src/processing_pipeline/stage_3.py:162
- Draft comment: In analyze_snippet, ensure all branches return a dictionary with keys 'response', 'grounding_metadata', and the new 'analyzed_by'.
- Reason this comment was not posted: Comment looked like it was already resolved.

4. src/processing_pipeline/stage_3.py:214
- Draft comment: process_snippet now destructures the analyzing_response to access 'response', 'grounding_metadata', and 'analyzed_by'. Confirm that analyze_snippet consistently supplies these keys.
- Reason this comment was not posted: usefulness confidence = 0% <= threshold 50%. The comment asks the author to confirm behavior and offers no specific suggestion, so it should be removed.

5. src/processing_pipeline/supabase_utils.py:221
- Draft comment: update_snippet in supabase_utils now receives and uses the 'analyzed_by' parameter. Ensure the target table schema supports this new field.
- Reason this comment was not posted: usefulness confidence = 40% <= threshold 50%. Borderline: it points at a specific change, but it is still a request to double-check the schema rather than a direct suggestion or observation.
Workflow ID: wflow_dUJ8Vz5woyqCebVw
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/processing_pipeline/supabase_utils.py (1)
185-204: Update test call sites to include the required analyzed_by parameter.

The new analyzed_by parameter lacks a default value and is required. Two test cases will raise TypeError at runtime:
- tests/test_supabase_utils.py:260 (missing analyzed_by)
- tests/test_supabase_utils.py:458 (missing analyzed_by)

The production code in src/processing_pipeline/stage_3.py:87 was correctly updated.
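The failure mode is the usual one for a newly required parameter. A simplified stand-in (the signature below is illustrative, not the real `update_snippet`) shows both the breakage and the fix:

```python
def update_snippet(snippet_id, status, analyzed_by):
    """Simplified stand-in: 'analyzed_by' has no default, so old call sites break."""
    return {"id": snippet_id, "status": status, "analyzed_by": analyzed_by}


# An old-style call that omits 'analyzed_by' raises TypeError,
# which is what the two outdated tests would hit at runtime.
try:
    update_snippet("snippet-1", "Processed")
except TypeError as exc:
    print(f"old call fails: {exc}")

# The tests should be updated to pass the new argument explicitly.
row = update_snippet("snippet-1", "Processed", analyzed_by="main-model")
print(row["analyzed_by"])  # → main-model
```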
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/processing_pipeline/stage_3.py (6 hunks)
- src/processing_pipeline/supabase_utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/processing_pipeline/stage_3.py (3)
- src/processing_pipeline/constants.py (1): ProcessingStatus (14-20)
- src/processing_pipeline/processing_utils.py (1): postprocess_snippet (53-59)
- src/processing_pipeline/supabase_utils.py (1): set_snippet_status (69-79)
🪛 Ruff (0.14.4)
src/processing_pipeline/stage_3.py
162-165: Consider moving this statement to an else block
(TRY300)
230-230: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (2)
src/processing_pipeline/stage_3.py (2)
154-178: Nice addition tracking analyzed_by. Bundling the model attribution alongside the response keeps provenance intact through the pipeline.

212-227: Good move standardizing on ProcessingStatus. Swapping string literals for the enum keeps downstream status handling consistent and less error-prone.
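For context, a status enum of this shape might look as follows (a sketch: only PROCESSING and PROCESSED are named in this review; the other members and the string values are assumptions):

```python
from enum import Enum


class ProcessingStatus(Enum):
    # PROCESSING and PROCESSED appear in this PR; NEW and ERROR are illustrative.
    NEW = "New"
    PROCESSING = "Processing"
    PROCESSED = "Processed"
    ERROR = "Error"


# Callers pass the enum member (or its .value) instead of a raw string,
# so a typo becomes an AttributeError at development time rather than
# a silently wrong status in the database.
print(ProcessingStatus.PROCESSED.value)  # → Processed
```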
```diff
+        return {
+            "response": validated_output,
+            "grounding_metadata": grounding_metadata,
+        }

         # Step 2: Structure with response_schema (if validation failed)
-        return cls.__structure_with_schema(client, analysis_text), grounding_metadata
+        return {
+            "response": cls.__structure_with_schema(client, analysis_text),
+            "grounding_metadata": grounding_metadata,
+        }
```
🧩 Analysis chain
Validate downstream callers for the new dict return.
Stage3Executor.run previously yielded positional values; after this change it returns a mapping. Any remaining callers still unpacking the old tuple will now break.
🏁 Script executed:

```shell
#!/bin/bash
# Locate every usage of Stage3Executor.run to ensure callers expect a dict.
rg -n "Stage3Executor\.run" --type=py
```

Length of output: 932
I need to examine how callers actually use the return value from Stage3Executor.run to verify if the review concern is valid.
---
🏁 Script executed:

```shell
#!/bin/bash
# Extract context around production calls to Stage3Executor.run in stage_3.py
echo "=== Line 156 context ==="
sed -n '150,165p' src/processing_pipeline/stage_3.py
echo -e "\n=== Line 169 context ==="
sed -n '163,178p' src/processing_pipeline/stage_3.py
echo -e "\n=== Line 186 context ==="
sed -n '180,195p' src/processing_pipeline/stage_3.py
```

Length of output: 1802
🏁 Script executed:

```shell
#!/bin/bash
# Extract context around test calls to Stage3Executor.run
echo "=== Test line 265 context ==="
sed -n '260,275p' tests/processing_pipeline/test_stage_3.py
echo -e "\n=== Test line 282 context ==="
sed -n '277,290p' tests/processing_pipeline/test_stage_3.py
```

Length of output: 1386

🏁 Script executed:

```shell
#!/bin/bash
# Search for any tuple unpacking patterns with Stage3Executor.run or analyzing_response
rg -n -A2 "analyzing_response\s*=" src/processing_pipeline/stage_3.py | head -40
```

Length of output: 549

🏁 Script executed:

```shell
#!/bin/bash
# Check for any tuple unpacking patterns in tests
rg -n -A2 "result\s*=" tests/processing_pipeline/test_stage_3.py | grep -A2 "Stage3Executor"
```

Length of output: 139

🏁 Script executed:

```shell
#!/bin/bash
# Look for tuple unpacking (a, b = ...) patterns with analyzing_response or Stage3Executor
rg -n "\w+\s*,\s*\w+\s*=.*Stage3Executor" src/ tests/
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Confirm the exact tuple unpacking in test at line 265-276
sed -n '265,280p' tests/processing_pipeline/test_stage_3.py
```

Length of output: 593
---
🏁 Script executed:

```shell
#!/bin/bash
# Identify the test function containing the tuple unpacking at line 265-276
sed -n '250,276p' tests/processing_pipeline/test_stage_3.py | grep -n "def test"
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Get full context of the test method
sed -n '245,280p' tests/processing_pipeline/test_stage_3.py
```

Length of output: 1472
---
🏁 Script executed:

```shell
#!/bin/bash
# Find the test function name by looking backwards from line 265
sed -n '230,265p' tests/processing_pipeline/test_stage_3.py | tail -40
```

Length of output: 1527

🏁 Script executed:

```shell
#!/bin/bash
# Search for def test functions near line 265
rg -n "def test_" tests/processing_pipeline/test_stage_3.py | grep -A1 -B1 "153\|183\|265\|338"
```

Length of output: 0

🏁 Script executed:

```shell
#!/bin/bash
# Get broader context to find function name
awk 'NR>=230 && NR<=280' tests/processing_pipeline/test_stage_3.py
```

Length of output: 2084
Update test assertions to expect dict instead of tuple.
The test test_stage_3_executor in tests/processing_pipeline/test_stage_3.py at lines 273-275 explicitly expects Stage3Executor.run to return a tuple and attempts to unpack it. With the change to return a dict, these assertions and the unpacking will break:
```python
assert isinstance(result, tuple)
assert len(result) == 2
response, grounding_metadata = result
```

Update the test to access dict keys instead: result["response"] and result["grounding_metadata"].
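Concretely, the rewritten assertions could look like this (a sketch with a hypothetical stand-in executor so the shape is runnable on its own):

```python
# Stand-in that mimics the new dict-returning Stage3Executor.run.
class Stage3Executor:
    @staticmethod
    def run(**kwargs):
        return {"response": {"summary": "ok"}, "grounding_metadata": None}


result = Stage3Executor.run(gemini_key="key", model_name="model", audio_file=None, metadata={})

# Dict-based checks replacing the old tuple type/length assertions:
assert isinstance(result, dict)
response = result["response"]
grounding_metadata = result["grounding_metadata"]
print(response["summary"])  # → ok
```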
🤖 Prompt for AI Agents
In src/processing_pipeline/stage_3.py around lines 373 to 382,
Stage3Executor.run now returns a dict with keys "response" and
"grounding_metadata" but the tests still expect and unpack a tuple; update the
test assertions in tests/processing_pipeline/test_stage_3.py (around lines
273-275) to stop asserting tuple semantics and instead access result["response"]
and result["grounding_metadata"], removing the tuple type/length checks and the
unpacking.
Important

Add analyzed_by field to track analysis model and update status handling in stage_3.py.
- Added an analyzed_by field to track the model used for analysis in analyze_snippet() in stage_3.py.
- Updated update_snippet_in_supabase() in stage_3.py and update_snippet() in supabase_utils.py to include analyzed_by.
- Used ProcessingStatus constants for status updates in process_snippet() and in_depth_analysis() in stage_3.py.

This description was created by Ellipsis for 3f6dec5. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit

New Features
- Each snippet now records which model performed its analysis via the new analyzed_by field.

Improvements
- Status updates now use the shared ProcessingStatus constants instead of string literals.