feat(wren-ai-service): enable filtering is fallback triggered#1748
Conversation
…s multiple pipelines and update .env file with specific version numbers.
|
Caution Review failedThe pull request is closed. """ WalkthroughThis update introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant PipelineComponent
participant Generator
Caller->>PipelineComponent: call_async_function(prompt, generator, ..., generator_name)
PipelineComponent->>Generator: generator(prompt, ..., generator_name)
Generator-->>PipelineComponent: result
PipelineComponent-->>Caller: (result, generator_name)
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 18
🔭 Outside diff range comments (1)
wren-ai-service/src/utils.py (1)
164-179: Tuple-unpacking makes the decorator brittle – add backward-compatibility guard
trace_cost()now assumes the wrapped coroutine returns a 2-tuple.
If a future/legacy pipeline returns only the result dict, Python will raiseValueError: not enough values to unpack.- result, generator_name = await func(*args, **kwargs) + outcome = await func(*args, **kwargs) + # Accept both the legacy `dict` and the new `(dict, str)` forms + if isinstance(outcome, tuple) and len(outcome) == 2: + result, generator_name = outcome + else: + result, generator_name = outcome, NoneThis keeps existing pipelines working while still supporting the new pair signature.
Also consider usinggenerator_name(currently commented) to ensure the correct model is logged.
♻️ Duplicate comments (6)
wren-ai-service/src/pipelines/generation/misleading_assistance.py (2)
74-80: Same coupling/tuple concern as raised inuser_guide_assistance.py.
97-98: Same component-naming nitpick as above.wren-ai-service/src/pipelines/generation/data_assistance.py (2)
74-80: Same coupling/tuple concern as raised inuser_guide_assistance.py.
97-98: Same component-naming nitpick as above.wren-ai-service/src/pipelines/generation/sql_answer.py (2)
69-74: Same coupling/tuple concern as raised inuser_guide_assistance.py.
94-95: Same component-naming nitpick as above.
🧹 Nitpick comments (7)
wren-ai-service/src/pipelines/generation/user_guide_assistance.py (1)
87-88: Component naming inconsistency
"generator_name"is stored alongside callables (generator,prompt_builder).
Consider prefixing with something likemodel_nameor grouping non-callable constants in
_configsfor clarity and to avoid accidental misinterpretation by Hamilton.wren-ai-service/src/pipelines/generation/question_recommendation.py (1)
235-241: Unused component increases payload size
"generator_name": llm_provider.get_model()is injected into_componentsbut never consumed by any node inside this pipeline (only thegeneratefunction takes it as an explicit arg).
Keeping redundant entries bloats every driver execution payload.Either:
- Feed
generator_nameexplicitly viainputs=instead of_components, or- Remove it from
_componentsand letgenerate()callllm_provider.get_model()directly.Small, but keeps the DAG tidy.
wren-ai-service/src/pipelines/generation/sql_regeneration.py (1)
156-162: Minor: remove dead dependencySame as other files – if nothing downstream references
"generator_name"from_components, prefer passing it directly or dropping it to keep the component map minimal.wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
146-163: Component bloatConsider whether storing
"generator_name"in_componentsis necessary for this DAG; otherwise eliminate it.wren-ai-service/src/pipelines/generation/sql_generation.py (1)
143-149: Question redundancyStoring
"generator_name"in_componentswithout later use adds overhead. Pass it explicitly or drop.wren-ai-service/src/pipelines/generation/sql_question.py (1)
100-106: Component not referenced
"generator_name"is injected yet unused by any downstream node. Consider removing to lighten execution context.wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
321-328: Minor: remove redundantelseafter early return- if prompt: - ... - return ..., generator_name - else: - return {}, generator_name + if not prompt: + return {} + return await table_columns_selection_generator(...), generator_nameSimplifies control-flow and satisfies pylint R1705.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
wren-ai-service/eval/metrics/spider/process_sql.py(2 hunks)wren-ai-service/src/pipelines/generation/chart_adjustment.py(2 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/data_assistance.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/intent_classification.py(2 hunks)wren-ai-service/src/pipelines/generation/misleading_assistance.py(2 hunks)wren-ai-service/src/pipelines/generation/question_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/relationship_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/semantics_description.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_answer.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_question.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_regeneration.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(2 hunks)wren-ai-service/src/pipelines/generation/user_guide_assistance.py(2 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(2 hunks)wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py(1 hunks)wren-ai-service/src/providers/__init__.py(3 hunks)wren-ai-service/src/utils.py(1 hunks)wren-ai-service/src/web/v1/routers/ask.py(1 hunks)wren-ai-service/tests/pytest/services/test_semantics_description.py(1 hunks)wren-ai-service/tools/dev/.env(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (5)
wren-ai-service/src/web/v1/routers/ask.py (2)
wren-ai-service/tests/pytest/services/test_ask.py (1)
ask_service(23-62)wren-ai-service/src/web/v1/services/ask.py (1)
AskFeedbackResultResponse(120-132)
wren-ai-service/src/pipelines/generation/sql_regeneration.py (1)
wren-ai-service/src/core/provider.py (2)
get_model(11-12)get_model(30-31)
wren-ai-service/src/pipelines/generation/data_assistance.py (1)
wren-ai-service/src/core/provider.py (2)
get_model(11-12)get_model(30-31)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
wren-ai-service/src/core/provider.py (2)
get_model(11-12)get_model(30-31)
wren-ai-service/src/pipelines/generation/relationship_recommendation.py (4)
wren-ai-service/src/pipelines/generation/question_recommendation.py (2)
generate(46-47)prompt(21-41)wren-ai-service/src/pipelines/generation/semantics_description.py (2)
generate(70-71)prompt(55-65)wren-ai-service/src/web/v1/routers/relationship_recommendation.py (1)
get(117-129)wren-ai-service/src/core/provider.py (2)
get_model(11-12)get_model(30-31)
🪛 dotenv-linter (3.3.0)
wren-ai-service/tools/dev/.env
[warning] 14-14: [UnorderedKey] The WREN_ENGINE_VERSION key should go before the WREN_PRODUCT_VERSION key
[warning] 15-15: [UnorderedKey] The WREN_AI_SERVICE_VERSION key should go before the WREN_ENGINE_VERSION key
[warning] 16-16: [UnorderedKey] The IBIS_SERVER_VERSION key should go before the WREN_AI_SERVICE_VERSION key
[warning] 18-18: [UnorderedKey] The WREN_BOOTSTRAP_VERSION key should go before the WREN_ENGINE_VERSION key
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py
[refactor] 323-328: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (15)
wren-ai-service/tools/dev/.env (1)
14-18: Pin explicit version tags to avoid drifting dependencies. The replacement of"latest"with fixed versions (0.16.1,0.23.2,0.16.1,0.28.0,0.1.5) ensures consistent, reproducible builds.wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py (1)
52-60: Guard against infinite loops in data reduction. Introducing theiterationcounter with a 1000-iteration break prevents endless token trimming when reducing SQL data.wren-ai-service/tests/pytest/services/test_semantics_description.py (1)
234-235: Formatting-only change for dictionary literal. Converting the nested dict to a single-line expression preserves behavior while improving brevity.wren-ai-service/src/web/v1/routers/ask.py (1)
155-159: Refactored feedback result initialization formatting. Switching to bracket notation for_ask_feedback_results[query_id]assignment is a non-functional style update that enhances clarity.wren-ai-service/src/providers/__init__.py (1)
78-86: Consistent multi-line unpacking inlitellm_params. The added commas and structured conditional unpacking forapi_versionandkwargsmaintain existing logic while improving readability.wren-ai-service/src/pipelines/generation/sql_correction.py (1)
103-108: Component name added correctly – LGTM
generator_nameis surfaced via_components; that keeps the Hamilton graph consistent with the new function signature.wren-ai-service/src/pipelines/generation/relationship_recommendation.py (2)
65-77: Potential misuse ofgenerateresult
normalized()still executesreply = generate.get("replies")[0].
Iftrace_costever returns the raw tuple (e.g., in error paths or when the decorator is disabled), this will raise. Consider guarding with:raw = generate[0] if isinstance(generate, tuple) else generate reply = raw.get("replies")[0]
199-200: Component injection looks goodThe extra
generator_nameentry is necessary for Hamilton to satisfy the new parameter; no issues spotted.wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py (1)
107-108: Component wiring verified
generator_nameadded; consistent with other pipelines.wren-ai-service/src/pipelines/generation/intent_classification.py (1)
352-353: Component addition OK
generator_namecorrectly supplied.wren-ai-service/src/pipelines/generation/semantics_description.py (1)
219-220: No issues with component injection
generator_nameadded – consistent and correct.wren-ai-service/src/pipelines/generation/chart_generation.py (1)
130-131: Component naming is fine, but remember to mark it as a@nodeonly if consumedAdding
"generator_name"to_componentsis OK; just verify that no other pipeline adds another key with the same name to avoid collision when merging inputs.wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
459-460: Component addition OK – just ensure unique keyNo objections; same caveat on potential key collision as noted earlier.
wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py (1)
122-123: Constructor change looks goodInjection of model name into components is consistent with the new pattern.
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (1)
153-154: Addition ofgenerator_namecomponent acceptedImplementation consistent across pipelines.
| prompt: dict, table_columns_selection_generator: Any, generator_name: str | ||
| ) -> dict: | ||
| if prompt: | ||
| return await table_columns_selection_generator( | ||
| prompt=prompt.get("prompt") | ||
| ) | ||
| ), generator_name | ||
| else: | ||
| return {} | ||
| return {}, generator_name | ||
|
|
There was a problem hiding this comment.
Logic now always enters truthy branch – breaks pruning path
Returning a tuple makes filter_columns_in_tables always truthy (({}, "gpt-4o")), so construct_retrieval_results() will incorrectly attempt to access ["replies"] on a tuple and bypass the “no-pruning” path.
Adopt one of:
- return await table_columns_selection_generator(...), generator_name
+ result: dict = await table_columns_selection_generator(...)
+ result["generator_name"] = generator_name
+ return resultand drop the tuple altogether; or adjust the downstream truthiness check & attribute access.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 323-328: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🤖 Prompt for AI Agents
In wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py around lines
321 to 329, the function returns a tuple which causes the caller to always
evaluate it as truthy, breaking the pruning logic. To fix this, modify the
function to return only the dictionary (or an empty dict) without wrapping it in
a tuple, or alternatively update the downstream code to correctly handle the
tuple by adjusting truthiness checks and attribute access accordingly.
…nsistency across pipelines, and update type hints to include Tuple in multiple files.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
321-328: Returning a tuple breaks downstream logic – revert to a single‐dict payload
filter_columns_in_tables()now returns(dict, str), yetconstruct_retrieval_results()still treats the node output as a dict (e.g.filter_columns_in_tables["replies"]).
At runtime this will raiseTypeError: tuple indices must be integers or slices, not str, and the mere truthiness of the tuple will wrongly force the pruning path.Restore the original dict return type and embed the generator name inside the dict to keep interfaces consistent and avoid truthiness pitfalls.
This also removes the superfluouselseflagged by pylint.-async def filter_columns_in_tables( - prompt: dict, table_columns_selection_generator: Any, generator_name: str -) -> tuple[dict, str]: - if prompt: - return await table_columns_selection_generator( - prompt=prompt.get("prompt") - ), generator_name - else: - return {}, generator_name +async def filter_columns_in_tables( + prompt: dict, + table_columns_selection_generator: Any, + generator_name: str, +) -> dict: + """Runs the column-selection LLM and attaches generator metadata.""" + + result: dict + if prompt: + result = await table_columns_selection_generator( + prompt=prompt.get("prompt") + ) + else: + result = {} + + # Propagate model identity without changing the outer type. + result["generator_name"] = generator_name + return resultAfter this change, no further modifications are needed in
construct_retrieval_results()and the pylint R1705 warning is resolved.
Please run the unit / integration tests to confirm the fix.
🧹 Nitpick comments (1)
wren-ai-service/eval/metrics/spider/process_sql.py (1)
560-565: Prefertry/exceptoverstr.isdigit()for robust LIMIT parsing
str.isdigit()rejects negative numbers and any representation containing a sign or whitespace, and it also requires the string to be non-empty.
A smalltry/exceptaroundint(limit_token)is both clearer and more permissive.- limit_token = toks[idx - 1] - if not str(limit_token).isdigit(): - # Preserve previous behaviour but make the intent explicit - return idx, 1 - - return idx, int(limit_token) + limit_token = toks[idx - 1] + try: + return idx, int(limit_token) + except ValueError: + # Preserve previous behaviour but make the intent explicit + return idx, 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
wren-ai-service/eval/metrics/spider/process_sql.py(2 hunks)wren-ai-service/src/pipelines/generation/chart_adjustment.py(2 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/intent_classification.py(2 hunks)wren-ai-service/src/pipelines/generation/question_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/relationship_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/semantics_description.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py(3 hunks)wren-ai-service/src/pipelines/generation/sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_question.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_regeneration.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(2 hunks)wren-ai-service/src/pipelines/generation/user_guide_assistance.py(2 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
- wren-ai-service/src/pipelines/generation/sql_correction.py
- wren-ai-service/src/pipelines/generation/relationship_recommendation.py
- wren-ai-service/src/pipelines/generation/sql_tables_extraction.py
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py
- wren-ai-service/src/pipelines/generation/sql_question.py
- wren-ai-service/src/pipelines/generation/user_guide_assistance.py
- wren-ai-service/src/pipelines/generation/sql_regeneration.py
- wren-ai-service/src/pipelines/generation/chart_generation.py
- wren-ai-service/src/pipelines/generation/followup_sql_generation.py
- wren-ai-service/src/pipelines/generation/chart_adjustment.py
- wren-ai-service/src/pipelines/generation/sql_generation.py
- wren-ai-service/src/pipelines/generation/semantics_description.py
- wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py
- wren-ai-service/src/pipelines/generation/intent_classification.py
- wren-ai-service/src/pipelines/generation/question_recommendation.py
🧰 Additional context used
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py
[refactor] 323-328: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
wren-ai-service/eval/metrics/spider/process_sql.py (1)
321-324: Exception handling correctly narrowed toValueError– good catchLimiting the
exceptclause toValueErrorprevents unrelated errors from being swallowed and improves debuggability.
No further action required.
Replaced tuple yields (e.g., `yield key, value`) with dicts to avoid issues related to unpacking, schema validation, or serialization in downstream code. This ensures more consistent and safer data handling across generator consumers.
Enhanced the trace_cost function to include fallback metadata, allowing for better tracking of model usage and fallback conditions. This update improves the context management for AI service operations.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
321-328: Tuple return still breaks downstream logic – same issue as flagged previously
filter_columns_in_tables()continues to return a tuple(<dict>, generator_name), butconstruct_retrieval_results()still treats the value as a mapping and immediately does
filter_columns_in_tables["replies"][0], which raisesTypeError: tuple indices must be integers.Either embed the model name inside the dict (recommended) or update every downstream consumer to unpack the tuple before treating it as a dict.
Suggested quick fix:
-async def filter_columns_in_tables( - prompt: dict, table_columns_selection_generator: Any, generator_name: str -) -> dict: - if prompt: - return await table_columns_selection_generator( - prompt=prompt.get("prompt") - ), generator_name - else: - return {}, generator_name +async def filter_columns_in_tables( + prompt: dict, table_columns_selection_generator: Any, generator_name: str +) -> dict: + if not prompt: + return {} + + result: dict = await table_columns_selection_generator( + prompt=prompt.get("prompt") + ) + # propagate the model name in-band + result["generator_name"] = generator_name + return resultThis keeps the return type a dict, preserves generator provenance, and avoids breaking callers.
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
323-328: Minor: UnnecessaryelseafterreturnAfter an early
return, theelseblock is redundant and triggers Pylint R1705. Removing it simplifies the flow.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
wren-ai-service/src/pipelines/generation/chart_adjustment.py(2 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/intent_classification.py(2 hunks)wren-ai-service/src/pipelines/generation/question_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/relationship_recommendation.py(2 hunks)wren-ai-service/src/pipelines/generation/semantics_description.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_question.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_regeneration.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(2 hunks)wren-ai-service/src/pipelines/generation/user_guide_assistance.py(2 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(2 hunks)wren-ai-service/src/utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (16)
- wren-ai-service/src/pipelines/generation/relationship_recommendation.py
- wren-ai-service/src/pipelines/generation/followup_sql_generation.py
- wren-ai-service/src/pipelines/generation/sql_regeneration.py
- wren-ai-service/src/pipelines/generation/chart_generation.py
- wren-ai-service/src/pipelines/generation/semantics_description.py
- wren-ai-service/src/pipelines/generation/sql_generation_reasoning.py
- wren-ai-service/src/pipelines/generation/sql_correction.py
- wren-ai-service/src/pipelines/generation/user_guide_assistance.py
- wren-ai-service/src/pipelines/generation/sql_tables_extraction.py
- wren-ai-service/src/pipelines/generation/question_recommendation.py
- wren-ai-service/src/utils.py
- wren-ai-service/src/pipelines/generation/chart_adjustment.py
- wren-ai-service/src/pipelines/generation/sql_question.py
- wren-ai-service/src/pipelines/generation/sql_generation.py
- wren-ai-service/src/pipelines/generation/intent_classification.py
- wren-ai-service/src/pipelines/generation/followup_sql_generation_reasoning.py
🧰 Additional context used
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py
[refactor] 323-328: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (go)
Summary by CodeRabbit
New Features
Chores