
VER-303: Split prompt_versions stage column into stage + sub_stage #66

Merged
quancao-ea merged 3 commits into main from refactor/split-stage-sub-stage on Mar 2, 2026
Conversation

Collaborator

@quancao-ea quancao-ea commented Mar 1, 2026

Important

Refactor prompt versioning to include sub-stages, updating constants, functions, and SQL to support this change.

  • Behavior:
    • Split PromptStage into stage and sub_stage in get_active_prompt() in supabase_utils.py.
    • Update upsert_prompt_version.sql to handle sub_stage.
    • Modify import_prompts_to_db.py to support sub_stage in prompt import.
  • Constants:
    • Remove sub-stage constants from PromptStage in constants.py.
    • Add Stage1SubStage in stage_1/constants.py and Stage4SubStage in stage_4/constants.py.
  • Functions:
    • Update get_active_prompt() in supabase_utils.py to accept sub_stage.
    • Modify initial_disinformation_detection() and redo_main_detection() in stage_1/flows.py to use Stage1SubStage.
    • Update analysis_review() in stage_4/flows.py to use Stage4SubStage.
  • Misc:
    • Update PROMPT_MAPPING in import_prompts_to_db.py to use (stage, sub_stage) tuples.
    • Add _stage_label() and _parse_stage_label() in import_prompts_to_db.py for handling stage labels.

This description was created by Ellipsis for d69be51. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • Refactor

    • Enhanced prompt versioning system with sub-stage support, enabling more granular control over prompt variants within each processing stage.
    • Updated prompt import format from stage-only to stage/sub_stage notation for increased flexibility in prompt management.
  • Chores

    • Updated database schema to support sub_stage tracking in prompt version records.

Modify get_active_prompt method to accept optional sub_stage
parameter and update all stage 1 prompt retrievals to use the
new Stage1SubStage enum values.

Update stage 4 flows and main.py to use the new sub-stage
enum when fetching prompts, aligning with the refactored
stage/sub-stage architecture introduced for better pipeline
modularity.
@linear

linear Bot commented Mar 1, 2026

@coderabbitai

coderabbitai Bot commented Mar 1, 2026

Walkthrough

This PR refactors the prompt staging system by introducing a hierarchical two-level structure: main stages (STAGE_1, STAGE_3, STAGE_4) paired with substages. It removes granular enum constants, updates get_active_prompt to accept an optional substage parameter, and modifies all prompt retrieval calls throughout the codebase accordingly. Database schema and prompt import tooling are updated to support composite stage/substage keys.

Changes

Cohort / File(s) Summary
SubStage Enum Definitions
src/processing_pipeline/stage_1/constants.py, src/processing_pipeline/stage_4/constants.py
Introduces new StrEnum-based Stage1SubStage with INITIAL_TRANSCRIPTION, INITIAL_DETECTION, TIMESTAMPED_TRANSCRIPTION, DISINFORMATION_DETECTION, and Stage4SubStage with KB_RESEARCHER, WEB_RESEARCHER, REVIEWER, KB_UPDATER.
Core PromptStage & API Updates
src/processing_pipeline/constants.py, src/processing_pipeline/supabase_utils.py
Removes granular PromptStage enum members (STAGE_1_INITIAL_TRANSCRIPTION, STAGE_1_INITIAL_DETECTION, STAGE_4_KB_RESEARCHER, etc.). Updates the get_active_prompt signature to accept an optional sub_stage parameter typed as a StrEnum.
Flow Integration
src/processing_pipeline/stage_1/flows.py, src/processing_pipeline/stage_4/flows.py, src/main.py
Updates all get_active_prompt call sites to pass substage variants alongside main stage (e.g., get_active_prompt(PromptStage.STAGE_1, Stage1SubStage.INITIAL_DETECTION)).
Database & Tooling
supabase/database/sql/upsert_prompt_version.sql, src/scripts/import_prompts_to_db.py
Adds p_sub_stage parameter to SQL function with deactivation logic using IS NOT DISTINCT FROM. Refactors import script to work with composite (stage, substage) tuple keys; introduces _stage_label() and _parse_stage_label() helpers; updates CLI argument handling and JSON payload construction.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

Poem

🐰 A substage dance, so neat and clean,
Where once was one, now two convene,
Prompts organized, from root to leaf,
A hierarchy of blessed relief!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 26.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately describes the main architectural change: refactoring the prompt_versions system to use separate stage and sub_stage columns instead of a single stage column.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch refactor/split-stage-sub-stage

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.5)
src/processing_pipeline/stage_4/constants.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.stage_4.constants",
"obj": "",
"line": 1,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_4/constants.py",
"symbol": "missing-module-docstring",
"message": "Missing module docstring",
"message-id": "C0114"
},
{
"type": "convention",
"module": "src.processing_pipeline.stage_4.constants",
"obj": "Stage4SubStage",
"line": 4,
"column": 0,
"endLine": 4,
"endColumn": 20,
"path": "src/processing_pipeline/stage_4/constants.py",
"symbol": "missing-class-docstring",
"message": "Missing class docstring",
"message-id": "C0115"
}
]

src/processing_pipeline/stage_1/constants.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.constants",
"obj": "",
"line": 1,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/constants.py",
"symbol": "missing-module-docstring",
"message": "Missing module docstring",
"message-id": "C0114"
},
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.constants",
"obj": "Stage1SubStage",
"line": 4,
"column": 0,
"endLine": 4,
"endColumn": 20,
"path": "src/processing_pipeline/stage_1/constants.py",
"symbol": "missing-class-docstring",
"message": "Missing class docstring",
"message-id": "C0115"
}
]

src/processing_pipeline/stage_1/flows.py

************* Module .pylintrc
.pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: '.pylintrc', line: 1
'disable=C0116\n' (config-parse-error)
[
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 41,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "",
"line": 64,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "line-too-long",
"message": "Line too long (116/100)",
"message-id": "C0301"
},
{
"type": "convention",

... [truncated 14672 characters] ...

module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 194,
"column": 0,
"endLine": 194,
"endColumn": 37,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "too-many-locals",
"message": "Too many local variables (16/15)",
"message-id": "R0914"
},
{
"type": "warning",
"module": "src.processing_pipeline.stage_1.flows",
"obj": "regenerate_timestamped_transcript",
"line": 217,
"column": 8,
"endLine": 217,
"endColumn": 10,
"path": "src/processing_pipeline/stage_1/flows.py",
"symbol": "redefined-builtin",
"message": "Redefining built-in 'id'",
"message-id": "W0622"
}
]

  • 4 others

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to d69be51 in 9 seconds. Click for details.
  • Reviewed 483 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_1Kb2cohnsv8D9zUW

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to how prompt versions are managed and retrieved within the processing pipeline. By splitting the existing 'stage' column into 'stage' and 'sub_stage', the system gains enhanced flexibility and organization for defining and accessing specific prompts. This refactoring impacts several core components, including prompt definition enums, the Supabase client's prompt retrieval logic, the prompt import utility, and the underlying database schema, ensuring a more robust and scalable prompt management system.

Highlights

  • Prompt Versioning Refactor: The system for managing prompt versions has been refactored to use a 'stage' and 'sub_stage' combination instead of a single 'stage' identifier. This allows for more granular categorization and retrieval of prompts.
  • New Sub-Stage Enums: Dedicated Stage1SubStage and Stage4SubStage enums were introduced to define the specific sub-categories of prompts within Stage 1 and Stage 4, respectively.
  • Simplified PromptStage Enum: The main PromptStage enum was simplified by removing the previously detailed sub-stage entries, as these are now handled by the new sub_stage concept.
  • Updated Supabase Client: The get_active_prompt method in supabase_utils.py was updated to accept an optional sub_stage parameter, modifying its query logic to correctly retrieve prompts based on both stage and sub-stage.
  • Prompt Import Script Changes: The import_prompts_to_db.py script was updated to reflect the new (stage, sub_stage) mapping for prompts, including new helper functions for formatting and parsing stage labels, and adjusting the import and listing functionalities.
  • Database Function Update: The upsert_prompt_version SQL function was modified to include a p_sub_stage parameter, ensuring that prompt versions are correctly inserted and activated based on both their main stage and sub-stage.
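The NULL-safe matching that IS NOT DISTINCT FROM provides on the SQL side can be illustrated with a toy in-memory version of the lookup. This is not the real Supabase query; the row field names (sub_stage, is_active, content) are assumptions. In Python, == already treats None == None as true, mirroring the SQL comparison:

```python
def get_active_prompt(rows, stage, sub_stage=None):
    """Toy lookup: match stage exactly and sub_stage NULL-safely.

    Python's `==` treats None == None as True, which mirrors the SQL
    function's IS NOT DISTINCT FROM comparison on the sub_stage column.
    """
    for row in rows:
        if row["stage"] == stage and row["sub_stage"] == sub_stage and row["is_active"]:
            return row["content"]
    return None


# Example rows standing in for prompt_versions records.
rows = [
    {"stage": "stage_1", "sub_stage": "initial_detection", "is_active": True, "content": "detect v3"},
    {"stage": "stage_3", "sub_stage": None, "is_active": True, "content": "stage 3 v1"},
]
```

Note that a query for ("stage_1", None) does not match the ("stage_1", "initial_detection") row: omitting sub_stage selects only prompts stored without one, which is why the deactivation logic needs the NULL-safe comparison.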
Changelog
  • src/main.py
    • Updated prompt retrieval calls to include Stage4SubStage.
  • src/processing_pipeline/constants.py
    • Simplified PromptStage enum by removing specific sub-stage entries.
  • src/processing_pipeline/stage_1/constants.py
    • Added Stage1SubStage enum for granular prompt identification.
  • src/processing_pipeline/stage_1/flows.py
    • Updated prompt retrieval calls to include Stage1SubStage.
  • src/processing_pipeline/stage_4/constants.py
    • Added Stage4SubStage enum for granular prompt identification.
  • src/processing_pipeline/stage_4/flows.py
    • Updated prompt retrieval calls to include Stage4SubStage.
  • src/processing_pipeline/supabase_utils.py
    • Modified get_active_prompt to accept an optional sub_stage parameter and adjust query logic.
  • src/scripts/import_prompts_to_db.py
    • Updated PROMPT_MAPPING to use (stage, sub_stage) tuples.
    • Added helper functions _stage_label and _parse_stage_label for handling stage/sub-stage combinations.
    • Modified import and list logic to correctly process and display prompts with the new sub_stage structure.
  • supabase/database/sql/upsert_prompt_version.sql
    • Added DROP FUNCTION IF EXISTS upsert_prompt_version; to ensure function recreation.
    • Modified the upsert_prompt_version function to accept a p_sub_stage parameter.
    • Updated the INSERT statement to include the sub_stage column.
    • Adjusted the UPDATE logic for setting active prompts to consider both stage and sub_stage.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/scripts/import_prompts_to_db.py (2)

251-253: Consider catching a more specific exception.

Static analysis flagged the broad Exception catch. While this is common in CLI scripts for user-friendly error reporting, consider catching more specific exceptions (e.g., APIError from Supabase) to avoid masking unexpected errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/import_prompts_to_db.py` around lines 251 - 253, Replace the
broad "except Exception as e" around the prompt-version creation with targeted
exception handlers: catch the specific client/API errors (e.g., Supabase's
APIError or the HTTP/client exception class your Supabase client raises) to
increment error_count and print the user-friendly message for that known failure
case, then optionally add a final generic except that logs and re-raises
unexpected exceptions; reference the same block that prints "Error creating
prompt version for {label}" and updates error_count so behavior and message
remain consistent.

329-329: Potential None in stages list.

_parse_stage_label returns None for unrecognized labels, but line 329 doesn't filter these out. If an invalid label passes through (which shouldn't happen due to choices validation on line 315), the stages list would contain None values, causing errors downstream when unpacking stage, sub_stage = key on line 189.

While choices validation should prevent this, adding defensive handling would make the code more robust.

🛡️ Optional defensive fix
         stages = [_parse_stage_label(s) for s in args.stages] if args.stages else None
+        if stages and None in stages:
+            invalid = [s for s in args.stages if _parse_stage_label(s) is None]
+            raise ValueError(f"Invalid stage labels: {invalid}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/import_prompts_to_db.py` at line 329, The list comprehension that
builds stages from args.stages may include None values because
_parse_stage_label can return None; update the construction of stages (the
variable stages) to filter out None results (e.g., use [s for s in
(_parse_stage_label(s) for s in args.stages) if s is not None] or equivalent)
and/or raise a clear error if any label is unrecognized so downstream unpacking
(stage, sub_stage = key) in the code that consumes stages won't receive None;
ensure you modify the code that assigns stages and keep references to
_parse_stage_label and the stages variable consistent.
src/processing_pipeline/stage_1/flows.py (1)

276-278: Improve early validation of Gemini client initialization.

_create_gemini_client returns None if GOOGLE_GEMINI_KEY is not set, and the callers (initial_disinformation_detection, redo_main_detection, regenerate_timestamped_transcript) pass the result directly without checking. While downstream task functions do validate the client with if not gemini_client: raise ValueError(...), this allows unnecessary flow execution before failing.

Consider either:

  1. Adding an early None check in the flow functions to fail immediately if the key is missing, or
  2. Documenting that this is expected behavior when Gemini is intentionally disabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/processing_pipeline/stage_1/flows.py` around lines 276 - 278, The flow
functions initial_disinformation_detection, redo_main_detection, and
regenerate_timestamped_transcript currently call _create_gemini_client() and
pass its result downstream, allowing the flow to start before a missing
GOOGLE_GEMINI_KEY is discovered; modify each of those flow entry points to
perform an early check after calling _create_gemini_client(), and if the result
is None immediately raise a ValueError (or a custom exception) with a clear
message about the missing GOOGLE_GEMINI_KEY so the flow fails fast instead of
executing further tasks; ensure you reference _create_gemini_client() in the
check and keep the error text consistent across the three functions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bcf6378 and d69be51.

📒 Files selected for processing (9)
  • src/main.py
  • src/processing_pipeline/constants.py
  • src/processing_pipeline/stage_1/constants.py
  • src/processing_pipeline/stage_1/flows.py
  • src/processing_pipeline/stage_4/constants.py
  • src/processing_pipeline/stage_4/flows.py
  • src/processing_pipeline/supabase_utils.py
  • src/scripts/import_prompts_to_db.py
  • supabase/database/sql/upsert_prompt_version.sql
💤 Files with no reviewable changes (1)
  • src/processing_pipeline/constants.py

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request effectively refactors the prompt versioning system by splitting the stage concept into stage and an optional sub_stage. The changes are consistently applied across the Python codebase and the database, including modifications to enums, data access functions, and the database import script. The SQL update to upsert_prompt_version is well-handled, correctly using IS NOT DISTINCT FROM for NULL-safe comparisons. I have one suggestion in src/scripts/import_prompts_to_db.py to improve the efficiency of parsing stage labels.

Comment on lines +97 to +102
def _parse_stage_label(label: str):
    """Parse a display string back to a (stage, sub_stage) tuple."""
    for key in PROMPT_MAPPING:
        if _stage_label(key) == label:
            return key
    return None
Contributor


medium

For improved efficiency and readability, you can create a reverse mapping from labels to keys at the module level. This avoids iterating through PROMPT_MAPPING every time _parse_stage_label is called, making the lookup O(1). You can define the reverse mapping after _stage_label is defined.

_REVERSE_STAGE_LABELS = {_stage_label(k): k for k in PROMPT_MAPPING}


def _parse_stage_label(label: str):
    """Parse a display string back to a (stage, sub_stage) tuple."""
    return _REVERSE_STAGE_LABELS.get(label)

@quancao-ea quancao-ea merged commit e12f9e2 into main Mar 2, 2026
2 checks passed
@quancao-ea quancao-ea deleted the refactor/split-stage-sub-stage branch March 2, 2026 02:23