VER-303: Split prompt_versions stage column into stage + sub_stage#66
VER-303: Split prompt_versions stage column into stage + sub_stage#66quancao-ea merged 3 commits intomainfrom
Conversation
Modify get_active_prompt method to accept optional sub_stage parameter and update all stage 1 prompt retrievals to use the new Stage1SubStage enum values.
Update stage 4 flows and main.py to use the new sub-stage enum when fetching prompts, aligning with the refactored stage/sub-stage architecture introduced for better pipeline modularity.
WalkthroughThis PR refactors the prompt staging system by introducing a hierarchical two-level structure: main stages (STAGE_1, STAGE_3, STAGE_4) paired with substages. It removes granular enum constants, updates Changes
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (4.0.5)src/processing_pipeline/stage_4/constants.py************* Module .pylintrc src/processing_pipeline/stage_1/constants.py************* Module .pylintrc src/processing_pipeline/stage_1/flows.py************* Module .pylintrc ... [truncated 14672 characters] ... module": "src.processing_pipeline.stage_1.flows",
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to d69be51 in 9 seconds. Click for details.
- Reviewed
483lines of code in9files - Skipped
0files when reviewing. - Skipped posting
0draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_1Kb2cohnsv8D9zUW
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant architectural change to how prompt versions are managed and retrieved within the processing pipeline. By splitting the existing 'stage' column into 'stage' and 'sub_stage', the system gains enhanced flexibility and organization for defining and accessing specific prompts. This refactoring impacts several core components, including prompt definition enums, the Supabase client's prompt retrieval logic, the prompt import utility, and the underlying database schema, ensuring a more robust and scalable prompt management system. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/scripts/import_prompts_to_db.py (2)
251-253: Consider catching a more specific exception.Static analysis flagged the broad
Exceptioncatch. While this is common in CLI scripts for user-friendly error reporting, consider catching more specific exceptions (e.g.,APIErrorfrom Supabase) to avoid masking unexpected errors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/import_prompts_to_db.py` around lines 251 - 253, Replace the broad "except Exception as e" around the prompt-version creation with targeted exception handlers: catch the specific client/API errors (e.g., Supabase's APIError or the HTTP/client exception class your Supabase client raises) to increment error_count and print the user-friendly message for that known failure case, then optionally add a final generic except that logs and re-raises unexpected exceptions; reference the same block that prints "Error creating prompt version for {label}" and updates error_count so behavior and message remain consistent.
329-329: PotentialNonein stages list.
_parse_stage_labelreturnsNonefor unrecognized labels, but line 329 doesn't filter these out. If an invalid label passes through (which shouldn't happen due tochoicesvalidation on line 315), thestageslist would containNonevalues, causing errors downstream when unpackingstage, sub_stage = keyon line 189.While
choicesvalidation should prevent this, adding defensive handling would make the code more robust.🛡️ Optional defensive fix
- stages = [_parse_stage_label(s) for s in args.stages] if args.stages else None + stages = [_parse_stage_label(s) for s in args.stages] if args.stages else None + if stages and None in stages: + invalid = [s for s in args.stages if _parse_stage_label(s) is None] + raise ValueError(f"Invalid stage labels: {invalid}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/import_prompts_to_db.py` at line 329, The list comprehension that builds stages from args.stages may include None values because _parse_stage_label can return None; update the construction of stages (the variable stages) to filter out None results (e.g., use [s for s in (_parse_stage_label(s) for s in args.stages) if s is not None] or equivalent) and/or raise a clear error if any label is unrecognized so downstream unpacking (stage, sub_stage = key) in the code that consumes stages won't receive None; ensure you modify the code that assigns stages and keep references to _parse_stage_label and the stages variable consistent.src/processing_pipeline/stage_1/flows.py (1)
276-278: Improve early validation of Gemini client initialization.
_create_gemini_clientreturnsNoneifGOOGLE_GEMINI_KEYis not set, and the callers (initial_disinformation_detection,redo_main_detection,regenerate_timestamped_transcript) pass the result directly without checking. While downstream task functions do validate the client withif not gemini_client: raise ValueError(...), this allows unnecessary flow execution before failing.Consider either:
- Adding an early None check in the flow functions to fail immediately if the key is missing, or
- Documenting that this is expected behavior when Gemini is intentionally disabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/processing_pipeline/stage_1/flows.py` around lines 276 - 278, The flow functions initial_disinformation_detection, redo_main_detection, and regenerate_timestamped_transcript currently call _create_gemini_client() and pass its result downstream, allowing the flow to start before a missing GOOGLE_GEMINI_KEY is discovered; modify each of those flow entry points to perform an early check after calling _create_gemini_client(), and if the result is None immediately raise a ValueError (or a custom exception) with a clear message about the missing GOOGLE_GEMINI_KEY so the flow fails fast instead of executing further tasks; ensure you reference _create_gemini_client() in the check and keep the error text consistent across the three functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/processing_pipeline/stage_1/flows.py`:
- Around line 276-278: The flow functions initial_disinformation_detection,
redo_main_detection, and regenerate_timestamped_transcript currently call
_create_gemini_client() and pass its result downstream, allowing the flow to
start before a missing GOOGLE_GEMINI_KEY is discovered; modify each of those
flow entry points to perform an early check after calling
_create_gemini_client(), and if the result is None immediately raise a
ValueError (or a custom exception) with a clear message about the missing
GOOGLE_GEMINI_KEY so the flow fails fast instead of executing further tasks;
ensure you reference _create_gemini_client() in the check and keep the error
text consistent across the three functions.
In `@src/scripts/import_prompts_to_db.py`:
- Around line 251-253: Replace the broad "except Exception as e" around the
prompt-version creation with targeted exception handlers: catch the specific
client/API errors (e.g., Supabase's APIError or the HTTP/client exception class
your Supabase client raises) to increment error_count and print the
user-friendly message for that known failure case, then optionally add a final
generic except that logs and re-raises unexpected exceptions; reference the same
block that prints "Error creating prompt version for {label}" and updates
error_count so behavior and message remain consistent.
- Line 329: The list comprehension that builds stages from args.stages may
include None values because _parse_stage_label can return None; update the
construction of stages (the variable stages) to filter out None results (e.g.,
use [s for s in (_parse_stage_label(s) for s in args.stages) if s is not None]
or equivalent) and/or raise a clear error if any label is unrecognized so
downstream unpacking (stage, sub_stage = key) in the code that consumes stages
won't receive None; ensure you modify the code that assigns stages and keep
references to _parse_stage_label and the stages variable consistent.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
src/main.pysrc/processing_pipeline/constants.pysrc/processing_pipeline/stage_1/constants.pysrc/processing_pipeline/stage_1/flows.pysrc/processing_pipeline/stage_4/constants.pysrc/processing_pipeline/stage_4/flows.pysrc/processing_pipeline/supabase_utils.pysrc/scripts/import_prompts_to_db.pysupabase/database/sql/upsert_prompt_version.sql
💤 Files with no reviewable changes (1)
- src/processing_pipeline/constants.py
There was a problem hiding this comment.
Code Review
This pull request effectively refactors the prompt versioning system by splitting the stage concept into stage and an optional sub_stage. The changes are consistently applied across the Python codebase and the database, including modifications to enums, data access functions, and the database import script. The SQL update to upsert_prompt_version is well-handled, correctly using IS NOT DISTINCT FROM for NULL-safe comparisons. I have one suggestion in src/scripts/import_prompts_to_db.py to improve the efficiency of parsing stage labels.
| def _parse_stage_label(label: str): | ||
| """Parse a display string back to a (stage, sub_stage) tuple.""" | ||
| for key in PROMPT_MAPPING: | ||
| if _stage_label(key) == label: | ||
| return key | ||
| return None |
There was a problem hiding this comment.
For improved efficiency and readability, you can create a reverse mapping from labels to keys at the module level. This avoids iterating through PROMPT_MAPPING every time _parse_stage_label is called, making the lookup O(1). You can define the reverse mapping after _stage_label is defined.
def _parse_stage_label(label: str):
"""Parse a display string back to a (stage, sub_stage) tuple."""
reverse_mapping = {_stage_label(k): k for k in PROMPT_MAPPING}
return reverse_mapping.get(label)
Important
Refactor prompt versioning to include sub-stages, updating constants, functions, and SQL to support this change.
PromptStageintostageandsub_stageinget_active_prompt()insupabase_utils.py.upsert_prompt_version.sqlto handlesub_stage.import_prompts_to_db.pyto supportsub_stagein prompt import.PromptStageinconstants.py.Stage1SubStageinstage_1/constants.pyandStage4SubStageinstage_4/constants.py.get_active_prompt()insupabase_utils.pyto acceptsub_stage.initial_disinformation_detection()andredo_main_detection()instage_1/flows.pyto useStage1SubStage.analysis_review()instage_4/flows.pyto useStage4SubStage.PROMPT_MAPPINGinimport_prompts_to_db.pyto use(stage, sub_stage)tuples._stage_label()and_parse_stage_label()inimport_prompts_to_db.pyfor handling stage labels.This description was created by
for d69be51. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Refactor
Chores