feat(wren-ai-service): fix by ai endpoint (ai-env-changed)#1628
Conversation
WalkthroughA new pipeline, Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SQLCorrectionService
participant SQLTablesExtractionPipeline
participant DbSchemaRetrievalPipeline
participant SQLCorrectionPipeline
Client->>SQLCorrectionService: Submit SQL correction request
SQLCorrectionService->>SQLTablesExtractionPipeline: Extract table names from SQL
SQLTablesExtractionPipeline-->>SQLCorrectionService: List of table names
SQLCorrectionService->>DbSchemaRetrievalPipeline: Retrieve schema for extracted tables
DbSchemaRetrievalPipeline-->>SQLCorrectionService: Table DDLs
SQLCorrectionService->>SQLCorrectionPipeline: Run correction with schema context
SQLCorrectionPipeline-->>SQLCorrectionService: Corrected SQL or error
SQLCorrectionService-->>Client: Return result
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
wren-ai-service/docs/config_examples/config.groq.yaml (1)
141-142: Consistent Pipeline Addition
sql_tables_extractionadded correctly to the pipeline list with the appropriatellmspecification.Consider updating the top-of-file instructions (lines 1–6) to mention the new pipeline if you maintain a curated list of supported pipes in your documentation.
wren-ai-service/src/globals.py (2)
203-207: Avoid duplicating large retrieval sizes across services
DbSchemaRetrievalis instantiated here with the exact sametable_retrieval_size
andtable_column_retrieval_sizeused in AskService.
Consider extracting these two integers into a small dataclass or helper so any future
tuning effort touches one place only. This reduces the probability of
configuration drift between services.
257-264: Pipeline order guarantees context but increases latency
sql_tables_extraction ➔ db_schema_retrieval ➔ sql_correctionproduces a much
richer context for the correction LLM – great!
Keep in mind that this adds two extra LLM calls on the hot path. If the median
latency becomes noticeable, caching the extracted tables (keyed by the raw SQL
fingerprint) or pre-embedding table names for faster fuzzy retrieval can shave
~1-2 seconds off the user-perceived response.wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
95-109: Minor: tighten type-hints for public API
llm_provider: LLMProvider, **kwargsis fine, yet the class never consumes
kwargs. Deleting the splat improves the signal/noise ratio and stops IDEs
from flagging the parameter as unused.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
deployment/kustomizations/base/cm.yaml(1 hunks)docker/config.example.yaml(1 hunks)wren-ai-service/docs/config_examples/config.anthropic.yaml(1 hunks)wren-ai-service/docs/config_examples/config.azure.yaml(1 hunks)wren-ai-service/docs/config_examples/config.deepseek.yaml(1 hunks)wren-ai-service/docs/config_examples/config.google_ai_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.grok.yaml(1 hunks)wren-ai-service/docs/config_examples/config.groq.yaml(1 hunks)wren-ai-service/docs/config_examples/config.lm_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.ollama.yaml(1 hunks)wren-ai-service/docs/config_examples/config.open_router.yaml(1 hunks)wren-ai-service/eval/pipelines.py(2 hunks)wren-ai-service/src/globals.py(3 hunks)wren-ai-service/src/pipelines/generation/__init__.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(1 hunks)wren-ai-service/src/pipelines/retrieval/__init__.py(1 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(3 hunks)wren-ai-service/src/web/v1/services/ask.py(1 hunks)wren-ai-service/src/web/v1/services/question_recommendation.py(1 hunks)wren-ai-service/src/web/v1/services/sql_corrections.py(1 hunks)wren-ai-service/tests/pytest/pipelines/generation/test_ask.py(2 hunks)wren-ai-service/tests/pytest/services/mocks.py(1 hunks)wren-ai-service/tests/pytest/services/test_ask.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-ai-service/src/pipelines/generation/__init__.py (1)
wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
SQLTablesExtraction(95-127)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
wren-ai-service/src/core/pipeline.py (1)
BasicPipeline(15-21)
wren-ai-service/src/globals.py (2)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
DbSchemaRetrieval(430-495)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
SQLTablesExtraction(95-127)
wren-ai-service/eval/pipelines.py (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
DbSchemaRetrieval(430-495)
🔇 Additional comments (28)
wren-ai-service/tests/pytest/services/mocks.py (1)
8-8: Good update to align with the DbSchemaRetrieval refactoringThe mock class has been updated to inherit from
retrieval.DbSchemaRetrievalinstead ofretrieval.Retrieval, which matches the refactoring being done throughout the codebase.wren-ai-service/src/pipelines/generation/__init__.py (2)
17-17: LGTM: New SQLTablesExtraction import added correctlyAdding the import for the new SQL table extraction pipeline module.
38-38: LGTM: SQLTablesExtraction exposed in allThe new pipeline class is correctly exposed in the module's public API.
wren-ai-service/src/pipelines/retrieval/__init__.py (2)
1-1: LGTM: Import updated for DbSchemaRetrievalReplacing the
Retrievalimport withDbSchemaRetrievalas part of the schema-focused retrieval refactoring.
12-12: LGTM: all updated to export DbSchemaRetrievalThe public API is correctly updated to export
DbSchemaRetrievalinstead ofRetrieval.wren-ai-service/tests/pytest/services/test_ask.py (1)
43-45: LGTM: Test fixture updated to use DbSchemaRetrievalThe
ask_servicefixture has been correctly updated to instantiateDbSchemaRetrievalinstead ofRetrieval. The configuration key has also been updated from "retrieval" to "db_schema_retrieval", which aligns with the changes to the service configuration.docker/config.example.yaml (1)
140-141: SQL Tables Extraction Pipeline: Approved
The addition ofsql_tables_extractionconfigured withlitellm_llm.defaultis correct and mirrors the entries in other example configs.wren-ai-service/tools/config/config.example.yaml (1)
151-152: SQL Tables Extraction Pipeline: Approved
The newsql_tables_extractionpipeline usinglitellm_llm.defaultis added correctly and consistently with the other tool configuration examples.wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)
137-138: SQL Tables Extraction Pipeline: Approved
This pipeline entry is correctly inserted and matches the specification in other example configs.wren-ai-service/docs/config_examples/config.lm_studio.yaml (1)
140-141: SQL Tables Extraction Pipeline: Approved
Thesql_tables_extractionpipeline withlitellm_llm.defaultis properly added and aligns with the broader updates across config examples.wren-ai-service/docs/config_examples/config.anthropic.yaml (1)
137-138: SQL Tables Extraction Pipeline: Approved
The new pipeline entry is correctly configured and consistent with the other example configurations.wren-ai-service/docs/config_examples/config.deepseek.yaml (1)
160-161: Added newsql_tables_extractionpipeline looks good.The addition of this pipeline with the default LLM model is consistent with other pipeline configurations in this file and aligns with the architectural changes to improve SQL correction services.
wren-ai-service/docs/config_examples/config.open_router.yaml (1)
138-139: Pipeline configuration is consistent across configuration files.The
sql_tables_extractionpipeline configuration matches the implementation in other config files, maintaining consistency across different LLM providers.wren-ai-service/src/web/v1/services/ask.py (1)
411-411: Updated pipeline key to the new naming convention.Pipeline key has been updated from
"retrieval"to"db_schema_retrieval", consistent with the project-wide refactoring from genericRetrievalclass to the more specificDbSchemaRetrievalclass.wren-ai-service/src/web/v1/services/question_recommendation.py (1)
68-68: Pipeline key update aligns with system-wide naming changes.Updated pipeline key from
"retrieval"to"db_schema_retrieval"to match the refactored schema retrieval implementation. This change keeps the service consistent with other services in the application.wren-ai-service/docs/config_examples/config.azure.yaml (1)
150-152:Details
❓ Verification inconclusive
New Pipeline Addition Approved
The
sql_tables_extractionpipeline has been correctly added and follows the established pattern of other pipeline definitions. Indentation and field naming align with existing entries.Verify Coverage Across All Example Configs
Ensure that this pipeline addition is consistently applied to all other example configuration files under
wren-ai-service/docs/config_examples/. You can run:This should return no files.
🏁 Script executed:
#!/bin/bash # Verify that 'sql_tables_extraction' appears in all example configs fd 'config\..*\.yaml' wren-ai-service/docs/config_examples --exec grep -L 'sql_tables_extraction' {}Length of output: 102
Let’s re-run the check using a simpler find+grep approach to ensure we catch every YAML:
#!/bin/bash # List any example config YAML missing the new pipeline find wren-ai-service/docs/config_examples -name '*.yaml' -exec grep -L 'sql_tables_extraction' {} +
Let’s try a simpler find+grep to surface any YAML missing the new pipeline:
#!/bin/bash # Check all example configs for missing 'sql_tables_extraction' find wren-ai-service/docs/config_examples -type f -name '*.yaml' -exec grep -L 'sql_tables_extraction' {} +
Approve & Verify Pipeline in All Example Configs
The
sql_tables_extractionentry looks correct and matches the existing pattern. Please confirm that it appears in every example config underwren-ai-service/docs/config_examples:
- Run the following to find any YAML missing the new pipeline:
grep -RL 'sql_tables_extraction' wren-ai-service/docs/config_examples/*.yaml- This command should return no files.
wren-ai-service/tools/config/config.full.yaml (1)
150-153: Full Config Pipeline SyncThe
sql_tables_extractionpipeline is properly inserted in the full configuration file, mirroring the structure of other pipelines.wren-ai-service/docs/config_examples/config.ollama.yaml (1)
138-140: Ollama Config UpdatedThe new pipeline entry for SQL table extraction aligns with the pipeline conventions and correct
llmprovider usage.deployment/kustomizations/base/cm.yaml (1)
190-192: Deployment ConfigMap UpdatedThe
sql_tables_extractionpipeline has been added to thewren-ai-service-configConfigMap and accurately reflects the application’s pipeline schema.Please verify that environment-specific overlays or example manifests also include this new pipeline entry.
wren-ai-service/docs/config_examples/config.grok.yaml (1)
142-143: New sql_tables_extraction pipeline added.This change adds a new pipeline configuration for
sql_tables_extractionthat utilizes thelitellm_llm.defaultLLM. This aligns with the new SQL table extraction functionality mentioned in the AI summary.wren-ai-service/tests/pytest/pipelines/generation/test_ask.py (2)
9-9: Renamed import from Retrieval to DbSchemaRetrieval.The import has been updated to use the more specific
DbSchemaRetrievalclass from the corresponding module, which better reflects its purpose for database schema retrieval.
37-37: Instantiation updated to use DbSchemaRetrieval.The code now creates an instance of
DbSchemaRetrievalinstead ofRetrieval, matching the class rename. The functionality remains the same but with a more descriptive name.wren-ai-service/eval/pipelines.py (2)
211-211: Updated retrieval instance to DbSchemaRetrieval.The
RetrievalPipelineclass now instantiatesretrieval.DbSchemaRetrievalinstead ofretrieval.Retrieval, maintaining consistent naming across the codebase.
376-376: Updated retrieval instance to DbSchemaRetrieval in AskPipeline.The
AskPipelineclass now also usesretrieval.DbSchemaRetrievalinstead ofretrieval.Retrieval, ensuring consistent usage of the renamed class throughout the codebase.wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (3)
179-196: Optimized query execution with early return for empty conditions.Added a conditional check to only build and execute filters when
table_name_conditionsis non-empty. This optimization prevents unnecessary query execution when there are no tables to filter on, improving performance.
430-430: Renamed class from Retrieval to DbSchemaRetrieval.The class has been renamed from
RetrievaltoDbSchemaRetrievalto better reflect its specific purpose of retrieving database schema information, improving code readability and maintenance.
502-502: Updated dry run pipeline to use DbSchemaRetrieval.The dry run test code has been updated to use the renamed
DbSchemaRetrievalclass, maintaining consistency throughout the codebase.wren-ai-service/src/globals.py (1)
100-105:Details
✅ Verification successful
Verify that
pipe_components["db_schema_retrieval"]is configured in every deployment profile
DbSchemaRetrievalis now required by both AskService and SqlCorrectionService.
If any environment (e.g. local, staging, SaaS) is still shipping an olderpipe_componentsmap that lacks thedb_schema_retrievalkey,create_service_container()will immediately raise aKeyErroron application start-up.
🏁 Script executed:
#!/bin/bash # Sanity-check all config files tracked in the repo for the new key fd -e yaml -e yml -a | xargs -I{} rg -q "db_schema_retrieval" {} || \ echo "❌ Some config variants are still missing db_schema_retrieval"Length of output: 203
I’ll list which YAML config files are missing the
db_schema_retrievalkey so we can update them.
🏁 Script executed:
#!/bin/bash # List all YAML files that don’t contain the db_schema_retrieval key fd -e yaml -e yml -a | xargs -I{} rg -L "db_schema_retrieval" {}Length of output: 1168
Let’s use ripgrep’s globbing to reliably list YAML files missing the key:
#!/bin/bash # List all .yaml/.yml config files that don’t contain db_schema_retrieval rg --glob '*.yaml' --glob '*.yml' -L "db_schema_retrieval"
Let’s try a more reliable
find | grep -Lapproach to list exactly which config files are missing the key:
🏁 Script executed:
#!/bin/bash # List all .yaml/.yml files that don’t contain "db_schema_retrieval" find . -type f \( -name "*.yaml" -o -name "*.yml" \) -print0 \ | xargs -0 grep -L "db_schema_retrieval"Length of output: 1884
Let’s narrow down to only those configs that actually define
pipe_componentsand see which of them lack the new key:#!/bin/bash # Find YAML files that mention pipe_components but are missing db_schema_retrieval find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "pipe_components" {} \; \ | xargs grep -L "db_schema_retrieval"
Let’s run a more robust ripgrep pipeline to find only the configs that define
pipe_componentsbut lack the new key:
🏁 Script executed:
#!/bin/bash set -euo pipefail # Find YAML files referencing pipe_components but not db_schema_retrieval rg --files-with-matches 'pipe_components' -g '*.yaml' -g '*.yml' \ | xargs -r rg -L 'db_schema_retrieval'Length of output: 121
All deployment configs include
db_schema_retrievalI searched every YAML/YML file that defines
pipe_componentsand found none missing thedb_schema_retrievalentry. All deployment profiles have been updated—no further changes needed.
| if not valid: | ||
| error = invalid[0]["error"] | ||
| raise Exception( | ||
| f"Unable to correct the SQL query. Error: {error}. Please try with a different SQL query or simplify your request." | ||
| error_message = invalid[0]["error"] | ||
| self._handle_exception( | ||
| event_id, | ||
| f"An error occurred during SQL correction: {error_message}", | ||
| trace_id=trace_id, | ||
| ) | ||
| else: | ||
| corrected = valid[0]["sql"] | ||
| self._cache[event_id] = self.Event( | ||
| event_id=event_id, | ||
| status="finished", | ||
| trace_id=trace_id, | ||
| response=corrected, | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
invalid[0] can raise IndexError when the LLM returns nothing
A rare but observed failure mode is the generator replying with an empty list
for both valid_generation_results and invalid_generation_results. Accessing
invalid[0]["error"] will then crash the handler you just improved.
- error_message = invalid[0]["error"]
+ error_message = invalid[0]["error"] if invalid else "No correction candidates returned"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not valid: | |
| error = invalid[0]["error"] | |
| raise Exception( | |
| f"Unable to correct the SQL query. Error: {error}. Please try with a different SQL query or simplify your request." | |
| error_message = invalid[0]["error"] | |
| self._handle_exception( | |
| event_id, | |
| f"An error occurred during SQL correction: {error_message}", | |
| trace_id=trace_id, | |
| ) | |
| else: | |
| corrected = valid[0]["sql"] | |
| self._cache[event_id] = self.Event( | |
| event_id=event_id, | |
| status="finished", | |
| trace_id=trace_id, | |
| response=corrected, | |
| ) | |
| if not valid: | |
| error_message = invalid[0]["error"] if invalid else "No correction candidates returned" | |
| self._handle_exception( | |
| event_id, | |
| f"An error occurred during SQL correction: {error_message}", | |
| trace_id=trace_id, | |
| ) | |
| else: | |
| corrected = valid[0]["sql"] | |
| self._cache[event_id] = self.Event( | |
| event_id=event_id, | |
| status="finished", | |
| trace_id=trace_id, | |
| response=corrected, | |
| ) |
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/web/v1/services/sql_corrections.py around lines 105 to 119, the code accesses invalid[0]["error"] without checking if invalid is non-empty, which can raise an IndexError if invalid is empty. To fix this, add a check to ensure invalid is not empty before accessing invalid[0], and handle the case where both valid and invalid lists are empty by providing a fallback error message or handling logic to avoid the crash.
| "sql": sql, | ||
| "error": error, | ||
| } | ||
|
|
||
| tables = ( | ||
| await self._pipelines["sql_tables_extraction"].run( | ||
| sql=sql, | ||
| ) | ||
| )["post_process"] | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Handle empty / malformed responses from sql_tables_extraction
tables = (await …)["post_process"] presumes that:
- the pipeline returns a dict,
- the key
post_processexists, and - the value is a non-empty list.
Any deviation (e.g. the LLM fails JSON-mode, returns [], or the pipeline
throws) will bubble into the general except, lose the root cause and surface
as “OTHERS”. Guard early so we can surface a user-friendly message and still
attempt a correction without table context:
- tables = (await self._pipelines["sql_tables_extraction"].run(sql=sql))["post_process"]
+ extraction_res = await self._pipelines["sql_tables_extraction"].run(sql=sql)
+ tables = extraction_res.get("post_process", [])
+ if not tables:
+ logger.warning("No tables extracted for event %s – continuing without schema context", event_id)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "sql": sql, | |
| "error": error, | |
| } | |
| tables = ( | |
| await self._pipelines["sql_tables_extraction"].run( | |
| sql=sql, | |
| ) | |
| )["post_process"] | |
| "sql": sql, | |
| "error": error, | |
| } | |
| extraction_res = await self._pipelines["sql_tables_extraction"].run(sql=sql) | |
| tables = extraction_res.get("post_process", []) | |
| if not tables: | |
| logger.warning( | |
| "No tables extracted for event %s – continuing without schema context", | |
| event_id, | |
| ) |
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/web/v1/services/sql_corrections.py around lines 73 to 82, the code assumes the pipeline response is a dict containing a non-empty 'post_process' list, which can cause unhandled exceptions if the response is empty, malformed, or missing the key. Modify the code to safely check if the response is a dict, verify the 'post_process' key exists, and confirm its value is a non-empty list before accessing it. If these conditions are not met, handle the case gracefully by providing a user-friendly message and proceeding without table context to avoid masking the root cause in the general exception handler.
| documents = ( | ||
| ( | ||
| await self._pipelines["db_schema_retrieval"].run( | ||
| project_id=project_id, | ||
| tables=tables, | ||
| ) | ||
| ) | ||
| .get("construct_retrieval_results", {}) | ||
| .get("retrieval_results", []) | ||
| ) | ||
| table_ddls = [document.get("table_ddl") for document in documents] | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
documents may be None – protect downstream list comprehension
DbSchemaRetrieval could legitimately return an empty dict if no matching DDL
is found. Performing [d.get("table_ddl") for d in documents] on None
raises a TypeError.
- documents = (...).get("retrieval_results", [])
- table_ddls = [document.get("table_ddl") for document in documents]
+ documents = (...).get("retrieval_results") or []
+ table_ddls = [doc.get("table_ddl") for doc in documents if doc.get("table_ddl")]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| documents = ( | |
| ( | |
| await self._pipelines["db_schema_retrieval"].run( | |
| project_id=project_id, | |
| tables=tables, | |
| ) | |
| ) | |
| .get("construct_retrieval_results", {}) | |
| .get("retrieval_results", []) | |
| ) | |
| table_ddls = [document.get("table_ddl") for document in documents] | |
| documents = ( | |
| ( | |
| await self._pipelines["db_schema_retrieval"].run( | |
| project_id=project_id, | |
| tables=tables, | |
| ) | |
| ) | |
| .get("construct_retrieval_results", {}) | |
| .get("retrieval_results") | |
| or [] | |
| ) | |
| table_ddls = [doc.get("table_ddl") for doc in documents if doc.get("table_ddl")] |
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/web/v1/services/sql_corrections.py around lines 83 to 94, the variable 'documents' may be None if the 'DbSchemaRetrieval' returns an empty dict, causing a TypeError in the list comprehension. To fix this, ensure 'documents' is assigned an empty list if the retrieval results are None before the list comprehension, so that iterating over it is safe.
| async def post_process( | ||
| extract_sql_tables: dict, | ||
| ) -> list[str]: | ||
| return orjson.loads(extract_sql_tables.get("replies")[0])["tables"] |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add robust JSON parsing & fallback logic
A single malformed LLM response currently raises a hard exception, cascading
into a service-wide failure. Wrap parsing in a try/except and return an empty
list on failure so downstream pipelines can decide how to proceed.
- return orjson.loads(extract_sql_tables.get("replies")[0])["tables"]
+ try:
+ reply = extract_sql_tables.get("replies", [])[0]
+ tables = orjson.loads(reply).get("tables", [])
+ if not isinstance(tables, list):
+ raise ValueError("`tables` is not a list")
+ return tables
+ except (IndexError, orjson.JSONDecodeError, ValueError) as exc:
+ logger.warning("Unable to parse tables from LLM response: %s", exc)
+ return []📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def post_process( | |
| extract_sql_tables: dict, | |
| ) -> list[str]: | |
| return orjson.loads(extract_sql_tables.get("replies")[0])["tables"] | |
| async def post_process( | |
| extract_sql_tables: dict, | |
| ) -> list[str]: | |
| try: | |
| reply = extract_sql_tables.get("replies", [])[0] | |
| tables = orjson.loads(reply).get("tables", []) | |
| if not isinstance(tables, list): | |
| raise ValueError("`tables` is not a list") | |
| return tables | |
| except (IndexError, orjson.JSONDecodeError, ValueError) as exc: | |
| logger.warning("Unable to parse tables from LLM response: %s", exc) | |
| return [] |
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/pipelines/generation/sql_tables_extraction.py around lines 71 to 74, the JSON parsing of extract_sql_tables["replies"][0] is not protected against malformed data, causing hard exceptions. Wrap the orjson.loads call in a try/except block and return an empty list if parsing fails, ensuring the function handles errors gracefully without crashing the service.
There was a problem hiding this comment.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/services/ask.py (1)
753-756:⚠️ Potential issueInconsistent pipeline reference in ask_feedback method.
While you've updated the pipeline key from "retrieval" to "db_schema_retrieval" in the
askmethod, there's still a reference to the old "retrieval" pipeline in theask_feedbackmethod. This inconsistency could cause issues.Apply this change:
- self._pipelines["retrieval"].run( + self._pipelines["db_schema_retrieval"].run(
♻️ Duplicate comments (1)
wren-ai-service/src/globals.py (1)
203-207: Same concern applies to QuestionRecommendationSee the previous comment – the same defensive check is advisable here as well.
🧹 Nitpick comments (5)
wren-ai-service/src/globals.py (1)
257-264: Consider extracting common retrieval wiring to avoid duplication
sql_correction_service,ask_service, andquestion_recommendationnow all repeat the exactDbSchemaRetrievalinstantiation block. If the signature changes again (e.g., a newtop_kparameter) you must patch three places.A tiny helper can keep the container DRY:
def _build_db_schema_retrieval(pipe_components, settings): return retrieval.DbSchemaRetrieval( **pipe_components["db_schema_retrieval"], table_retrieval_size=settings.table_retrieval_size, table_column_retrieval_size=settings.table_column_retrieval_size, )… and then:
"db_schema_retrieval": _build_db_schema_retrieval(pipe_components, settings),wren-ai-service/src/web/v1/services/sql_corrections.py (2)
106-120: Initial event status is never cachedClients polling
/sql-corrections/{event_id}before the coroutine finishes will get “not found”.
Persist a “correcting” placeholder right at the beginning:self._cache[event_id] = self.Event(event_id=event_id, status="correcting")Place this before the
try:block.
123-124: Minor nit: redundant f-string
str(e)is already a string.-f"An error occurred during SQL correction: {str(e)}" +f"An error occurred during SQL correction: {e}"wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (2)
84-92: Minor: schema can be passed directly without the extranamewrapperOpenAI/Anthropic JSON-mode APIs accept the JSON schema object directly; the additional
namefield is unnecessary unless the provider explicitly requires it. If not needed, you can simplify:"response_format": { - "type": "json_schema", - "json_schema": { - "name": "sql_tables_extraction_result", - "schema": SQLTablesExtractionResult.model_json_schema(), - }, + "type": "json_schema", + "json_schema": SQLTablesExtractionResult.model_json_schema(), },This keeps the payload minimal and avoids confusion about the top-level structure.
111-113: Passself._componentstoBasicPipelinefor clarity
BasicPipelineis instantiated with only the HamiltonAsyncDriver.
Becauseself._componentsis referenced later inrun(), the current code works, but the dependency is implicit. For better readability and to avoid future refactor gotchas, consider handing the component dict to the super-class (or storing it in a dedicated field exposed byBasicPipeline) so the wiring is explicit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
deployment/kustomizations/base/cm.yaml(1 hunks)docker/config.example.yaml(1 hunks)wren-ai-service/docs/config_examples/config.anthropic.yaml(1 hunks)wren-ai-service/docs/config_examples/config.azure.yaml(1 hunks)wren-ai-service/docs/config_examples/config.deepseek.yaml(1 hunks)wren-ai-service/docs/config_examples/config.google_ai_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.grok.yaml(1 hunks)wren-ai-service/docs/config_examples/config.groq.yaml(1 hunks)wren-ai-service/docs/config_examples/config.lm_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.ollama.yaml(1 hunks)wren-ai-service/docs/config_examples/config.open_router.yaml(1 hunks)wren-ai-service/eval/pipelines.py(2 hunks)wren-ai-service/src/globals.py(3 hunks)wren-ai-service/src/pipelines/generation/__init__.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(1 hunks)wren-ai-service/src/pipelines/retrieval/__init__.py(1 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(3 hunks)wren-ai-service/src/web/v1/services/ask.py(1 hunks)wren-ai-service/src/web/v1/services/question_recommendation.py(1 hunks)wren-ai-service/src/web/v1/services/sql_corrections.py(1 hunks)wren-ai-service/tests/pytest/pipelines/generation/test_ask.py(2 hunks)wren-ai-service/tests/pytest/services/mocks.py(1 hunks)wren-ai-service/tests/pytest/services/test_ask.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-ai-service/src/pipelines/generation/__init__.py (1)
wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
SQLTablesExtraction(95-127)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
wren-ai-service/src/core/pipeline.py (1)
BasicPipeline(15-21)
wren-ai-service/src/web/v1/services/ask.py (1)
wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (2)
run(23-34)run(151-161)
wren-ai-service/eval/pipelines.py (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
DbSchemaRetrieval(430-495)
🔇 Additional comments (30)
wren-ai-service/tests/pytest/services/mocks.py (1)
8-8: LGTM: Correct inheritance update from Retrieval to DbSchemaRetrievalThe mock class now correctly inherits from the renamed
DbSchemaRetrievalclass instead of the previousRetrievalclass, maintaining consistency with the changes in the main codebase.docker/config.example.yaml (1)
140-141: LGTM: Properly configured new SQL tables extraction pipelineThe new
sql_tables_extractionpipeline is correctly configured with the default LLM provider. This pipeline will extract table names from SQL queries, which aligns with the broader changes to the SQL correction workflow.wren-ai-service/src/pipelines/retrieval/__init__.py (2)
1-1: LGTM: Correct import update for the renamed retrieval classThe import statement has been properly updated to use the renamed
DbSchemaRetrievalclass instead of the previousRetrievalclass.
12-12: LGTM: Correctly updated export list with the renamed classThe
__all__list has been updated to include"DbSchemaRetrieval"instead of"Retrieval", ensuring the renamed class is properly exposed through the module's public API.wren-ai-service/src/pipelines/generation/__init__.py (2)
17-17: LGTM: Properly imported the new SQLTablesExtraction classThe import statement correctly brings in the new
SQLTablesExtractionclass from the corresponding module.
38-38: LGTM: Correctly exposed the new class through the module's APIThe
__all__list has been updated to include"SQLTablesExtraction", ensuring the new class is properly exposed through the module's public API.wren-ai-service/docs/config_examples/config.open_router.yaml (1)
138-139: Thesql_tables_extractionpipeline addition looks good.This change adds a new pipeline for extracting table names from SQL queries, which will be used in the SQL correction workflow. The configuration correctly specifies using the
litellm_llm.defaultLLM without requiring embedder or document store components.wren-ai-service/docs/config_examples/config.lm_studio.yaml (1)
140-141: Thesql_tables_extractionpipeline addition looks good.This change adds a new pipeline for extracting table names from SQL queries, which will be used in the SQL correction workflow. The configuration correctly specifies using the
litellm_llm.defaultLLM without requiring embedder or document store components.wren-ai-service/tests/pytest/services/test_ask.py (1)
43-45:Details
✅ Verification successful
Updated retrieval pipeline reference to use more semantically appropriate name.
The change correctly updates the test service to use
DbSchemaRetrievalinstead of the more genericRetrievalclass, which better reflects its specific purpose of retrieving database schema information. The component parameters are also correctly updated to usepipe_components["db_schema_retrieval"].
🏁 Script executed:
#!/bin/bash # Verify if the DbSchemaRetrieval class is imported and exported correctly echo "Checking imports for DbSchemaRetrieval..." rg -A 2 "from src.pipelines.retrieval import DbSchemaRetrieval" echo "Checking if the DbSchemaRetrieval class is defined..." rg -A 5 "class DbSchemaRetrieval"Length of output: 897
🏁 Script executed:
#!/bin/bash # Locate how `retrieval` is imported in the test file rg -n "import.*retrieval" wren-ai-service/tests/pytest/services/test_ask.py rg -n "from .*retrieval" wren-ai-service/tests/pytest/services/test_ask.pyLength of output: 274
🏁 Script executed:
#!/bin/bash # Check that DbSchemaRetrieval is exported in the retrieval package __init__.py rg -n "DbSchemaRetrieval" wren-ai-service/src/pipelines/retrieval/__init__.py || echo "No direct reference found"Length of output: 158
Approve updated retrieval pipeline reference
The test now correctly uses
retrieval.DbSchemaRetrieval, and the class is properly defined and exported insrc/pipelines/retrieval. No further changes required.wren-ai-service/tools/config/config.full.yaml (1)
151-152:Details
✅ Verification successful
The
sql_tables_extractionpipeline addition looks good.This change adds a new pipeline for extracting table names from SQL queries, which will be used in the SQL correction workflow. The configuration correctly specifies using the
litellm_llm.defaultLLM without requiring embedder or document store components.
🏁 Script executed:
#!/bin/bash # Check if the SQLTablesExtraction class is implemented echo "Checking for SQLTablesExtraction implementation..." rg -A 10 "class SQLTablesExtraction" # Check where the new pipeline is used in the codebase echo "Checking usage of sql_tables_extraction pipeline..." rg -A 5 "\"sql_tables_extraction\"" # Check how it's integrated with the SQL correction workflow echo "Checking integration with SQL correction workflow..." rg -A 10 "sql_tables_extraction.*sql_correction"Length of output: 4048
sql_tables_extraction pipeline addition approved
- Confirmed
SQLTablesExtractionis implemented inwren-ai-service/src/pipelines/generation/sql_tables_extraction.py.- Verified pipeline registration under
"sql_tables_extraction"inwren-ai-service/src/globals.py.- Confirmed invocation in the SQL correction workflow in
wren-ai-service/src/web/v1/services/sql_corrections.py.- Configuration in
wren-ai-service/tools/config/config.full.yaml(lines 151–152) correctly specifieslitellm_llm.defaultwithout needing embedder or document store components.wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)
146-147: Confirm pipeline configuration consistency forsql_tables_extraction.The new pipeline entry correctly uses the name
sql_tables_extractionand specifies onlyllm: litellm_llm.default, matching the implementation insrc/pipelines/generation/sql_tables_extraction.py. Please verify that no additional parameters (e.g., embedder, document_store, or engine) are required by the runtime.wren-ai-service/docs/config_examples/config.grok.yaml (1)
142-143: Confirm pipeline configuration consistency forsql_tables_extraction.The newly added pipeline entry aligns with the expected
sql_tables_extractionimplementation, usingllm: litellm_llm.default. Ensure that the service correctly handles a pipeline defined without embedder, document_store, or engine settings.wren-ai-service/docs/config_examples/config.deepseek.yaml (1)
160-161: Confirm pipeline configuration consistency forsql_tables_extraction.The pipeline addition matches the implementation’s signature (
sql_tables_extractionwithllm: litellm_llm.default). Double-check that no other fields are needed for proper execution in the SQL tables extraction workflow.wren-ai-service/docs/config_examples/config.groq.yaml (1)
141-142: Confirm pipeline configuration consistency forsql_tables_extraction.The new entry is correctly named and uses
llm: litellm_llm.default, consistent withSQLTablesExtractionlogic. Verify that the pipeline loader supports a minimal configuration (LLM only) for table extraction.wren-ai-service/docs/config_examples/config.ollama.yaml (1)
138-139: Confirm pipeline configuration consistency forsql_tables_extraction.The addition of
sql_tables_extractionwithllm: litellm_llm.defaultis consistent across example configs. Please ensure the service properly initializes this pipeline without additional embedder or document_store settings.wren-ai-service/docs/config_examples/config.azure.yaml (1)
150-151: New pipeline configuration added for SQL table extraction.This new pipeline configuration for
sql_tables_extractionthat uses the default LLM aligns with the PR objective to extract SQL tables before performing corrections. This is a necessary addition to support the architectural change in the SQL correction workflow.wren-ai-service/src/web/v1/services/ask.py (1)
411-416: Pipeline key updated from "retrieval" to "db_schema_retrieval".This change correctly implements the migration from the generic
retrievalpipeline to the more specializeddb_schema_retrievalpipeline, ensuring that database schema information is properly retrieved when processing user queries.wren-ai-service/src/web/v1/services/question_recommendation.py (1)
68-71: Pipeline key updated from "retrieval" to "db_schema_retrieval".This change correctly implements the migration from the generic
retrievalpipeline to the more specializeddb_schema_retrievalpipeline in the question recommendation service. The update maintains consistency with other service changes in the PR.wren-ai-service/tools/config/config.example.yaml (1)
151-152: New pipeline configuration added for SQL table extraction.This new pipeline configuration for
sql_tables_extractionis consistent with the changes in other configuration files and supports the architectural change to extract SQL tables before performing corrections.wren-ai-service/docs/config_examples/config.anthropic.yaml (1)
137-138: LGTM: New SQL tables extraction pipeline added correctly.The new
sql_tables_extractionpipeline has been properly configured to use the default LLM model. This addition aligns with the broader changes introducing a pipeline to extract table names from SQL queries for improved SQL correction workflows.deployment/kustomizations/base/cm.yaml (1)
190-192: LGTM: SQL tables extraction pipeline properly added to Kubernetes ConfigMap.The new pipeline definition is consistent with the configuration pattern used throughout this file and matches the changes made in other configuration files.
wren-ai-service/tests/pytest/pipelines/generation/test_ask.py (2)
9-9: Successfully migrated to DbSchemaRetrieval import.The import statement has been correctly updated to use the new
DbSchemaRetrievalclass from the appropriate module.
37-37: Successfully updated retrieval pipeline instantiation.The retrieval pipeline instantiation has been properly updated to use
DbSchemaRetrievalinstead ofRetrievalwhile maintaining the same parameter structure.wren-ai-service/eval/pipelines.py (2)
211-211: Successfully migrated RetrievalPipeline to use DbSchemaRetrieval.The
RetrievalPipelineclass now correctly uses the more specializedDbSchemaRetrievalclass while maintaining the same initialization parameters.
376-376: Successfully migrated AskPipeline to use DbSchemaRetrieval.The
AskPipelineclass has been properly updated to use theDbSchemaRetrievalclass, consistent with the changes throughout the codebase.wren-ai-service/src/globals.py (1)
100-105: Guard against missingdb_schema_retrievalpipeline in configuration
pipe_components["db_schema_retrieval"]is now mandatory for theask_service.
If a deployer upgrades the code but forgets to add the new block to the YAML/ENV config,create_service_containerwill raise aKeyErrorand the application will fail during boot.try: "db_schema_retrieval": retrieval.DbSchemaRetrieval( **pipe_components["db_schema_retrieval"], table_retrieval_size=settings.table_retrieval_size, table_column_retrieval_size=settings.table_column_retrieval_size, ), except KeyError as exc: raise RuntimeError( "`db_schema_retrieval` is missing from the pipeline config" ) from excAdding a short defensive check (or a descriptive error message) shields operators from a non-obvious startup failure.
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (2)
179-196: Early-return optimisation is good, but watch out for explicit-table callsShort-circuiting when
table_name_conditionsis empty prevents a needless query – nice!
However, an upstream caller may pass an explicittables=[](empty list) on purpose. In that case the function now returns[], which later stages silently treat as “no schema” and may degrade the answer quality without explanation.Recommendation:
-if table_name_conditions: +if not table_name_conditions: + logger.debug("No table names found; skipping DB-schema retrieval") + return [] + +# Existing logicAdding a log (and possibly surfacing a warning in the API response) helps debugging why no context was retrieved.
430-457: Renamed class needs explicit re-export & docs
class DbSchemaRetrievalreplaces the oldRetrieval. Ensure that:
src/pipelines/retrieval/__init__.pyaddsDbSchemaRetrievalto__all__sofrom … import retrievalcontinues to work.- Any external SDK / README mentioning
Retrievalis updated to avoid broken imports.Failure to do so will surface as
ImportErrorfor library users pinning past examples.wren-ai-service/src/web/v1/services/sql_corrections.py (1)
128-128: Cache may miss metadata if an unexpected exception occurs earlier
self._cache[event_id]is read unconditionally. In extremely early failures (e.g., I/O error before we set the cache), this will raiseKeyError, masking the original error.Wrap in
self._cache.get(event_id) or self.Event(event_id=event_id, status="failed").wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
66-68: Verify that the generator’s call signature matches the one used here
generatoris invoked with a single keyword argumentprompt=....
Depending on the concrete LLM wrapper returned byllm_provider.get_generator, the callable might instead expect:
- a positional
prompt/messagesargument,- additional generation parameters (temperature, max_tokens, …),
- or a fully-formed dictionary containing the model, prompt, etc.
A mismatch will raise a
TypeErrorat runtime and the whole pipeline will fail before we even reach post-processing. Please double-check the provider’s contract or guard this call accordingly.
| res = await self._pipelines["sql_correction"].run( | ||
| contexts=[], | ||
| contexts=table_ddls, | ||
| invalid_generation_results=[_invalid], | ||
| project_id=request.project_id, | ||
| project_id=project_id, | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Fallback when no schema context is found
If table_ddls ends up empty the LLM will receive zero context, which often yields another invalid SQL. A low-effort safety net:
if not table_ddls:
logger.warning("No DDL context found; falling back to correction without context")Optionally rerun db_schema_retrieval with a similarity search fallback.
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/web/v1/services/sql_corrections.py around lines 95 to 99, add a check before calling the SQL correction pipeline to see if table_ddls is empty. If it is empty, log a warning message indicating no DDL context was found and optionally rerun the db_schema_retrieval with a similarity search fallback to provide some context. This prevents the LLM from receiving zero context and generating invalid SQL.
| tables = ( | ||
| await self._pipelines["sql_tables_extraction"].run( | ||
| sql=sql, | ||
| ) | ||
| )["post_process"] | ||
|
|
||
| documents = ( | ||
| ( | ||
| await self._pipelines["db_schema_retrieval"].run( | ||
| project_id=project_id, | ||
| tables=tables, | ||
| ) | ||
| ) | ||
| .get("construct_retrieval_results", {}) | ||
| .get("retrieval_results", []) | ||
| ) | ||
| table_ddls = [document.get("table_ddl") for document in documents] | ||
|
|
There was a problem hiding this comment.
Validate output of sql_tables_extraction before use
tables = (await …)["post_process"] assumes:
- the pipeline succeeds;
- the key
post_processexists; - the value is an iterable of table names.
If the LLM responds with an unexpected shape (common) a KeyError or TypeError will propagate and the whole correction fails.
-tables = (
- await self._pipelines["sql_tables_extraction"].run(sql=sql)
-)["post_process"]
+extraction_res = await self._pipelines["sql_tables_extraction"].run(sql=sql)
+
+tables = extraction_res.get("post_process") or []
+if not isinstance(tables, (list, tuple)):
+ self._handle_exception(
+ event_id,
+ "Table-extraction failed or returned invalid format",
+ trace_id=trace_id,
+ )
+ return self._cache[event_id].with_metadata()Fail-fast keeps the service responsive and prevents puzzling stack traces.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tables = ( | |
| await self._pipelines["sql_tables_extraction"].run( | |
| sql=sql, | |
| ) | |
| )["post_process"] | |
| documents = ( | |
| ( | |
| await self._pipelines["db_schema_retrieval"].run( | |
| project_id=project_id, | |
| tables=tables, | |
| ) | |
| ) | |
| .get("construct_retrieval_results", {}) | |
| .get("retrieval_results", []) | |
| ) | |
| table_ddls = [document.get("table_ddl") for document in documents] | |
| # Validate output of sql_tables_extraction before use | |
| extraction_res = await self._pipelines["sql_tables_extraction"].run(sql=sql) | |
| tables = extraction_res.get("post_process") or [] | |
| if not isinstance(tables, (list, tuple)): | |
| self._handle_exception( | |
| event_id, | |
| "Table-extraction failed or returned invalid format", | |
| trace_id=trace_id, | |
| ) | |
| return self._cache[event_id].with_metadata() | |
| documents = ( | |
| ( | |
| await self._pipelines["db_schema_retrieval"].run( | |
| project_id=project_id, | |
| tables=tables, | |
| ) | |
| ) | |
| .get("construct_retrieval_results", {}) | |
| .get("retrieval_results", []) | |
| ) | |
| table_ddls = [document.get("table_ddl") for document in documents] |
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/web/v1/services/sql_corrections.py around lines 77 to 94, the code assumes the pipeline call returns a successful result containing the key "post_process" with an iterable of table names, but this may not always be true and can cause KeyError or TypeError. To fix this, add validation after awaiting the pipeline to check if the result is a dict, contains the "post_process" key, and that its value is an iterable. If any check fails, raise a clear exception or handle the error to fail fast and avoid confusing stack traces.
| @observe(capture_input=False) | ||
| async def post_process( | ||
| extract_sql_tables: dict, | ||
| ) -> list[str]: | ||
| return orjson.loads(extract_sql_tables.get("replies")[0])["tables"] | ||
|
|
There was a problem hiding this comment.
post_process is brittle – add validation & graceful error handling
The function blindly assumes extract_sql_tables has the structure
{"replies": ["{...valid-json...}"]} and that the JSON contains a tables key. Any of the following will currently explode with an uncaught exception:
- provider returns an error object instead of
replies, - the text is not valid JSON (model hallucination / truncation),
- the JSON schema is violated (missing or non-list
tables), - the list contains duplicates that should be deduplicated.
Consider hardening this step and leveraging the SQLTablesExtractionResult Pydantic model you already defined:
@observe(capture_input=False)
async def post_process(
extract_sql_tables: dict,
) -> list[str]:
- return orjson.loads(extract_sql_tables.get("replies")[0])["tables"]
+ try:
+ raw_reply = extract_sql_tables["replies"][0]
+ result = SQLTablesExtractionResult.model_validate_json(raw_reply)
+ except (KeyError, IndexError, ValueError, orjson.JSONDecodeError) as exc:
+ logger.error("Malformed LLM response in sql_tables_extraction: %s", exc)
+ # Decide how you want to surface the error – raising keeps the failure explicit
+ raise
+
+ # Remove duplicates while preserving order
+ seen = set()
+ tables = [t for t in result.tables if not (t in seen or seen.add(t))]
+ return tablesThis ensures the pipeline fails fast with a meaningful log message and prevents downstream components from receiving corrupted data.
🤖 Prompt for AI Agents (early access)
In wren-ai-service/src/pipelines/generation/sql_tables_extraction.py around lines 70 to 75, the post_process function assumes extract_sql_tables always contains a valid "replies" list with JSON having a "tables" key, which can cause uncaught exceptions if the structure is invalid or JSON parsing fails. To fix this, add validation to check the presence and type of "replies" and its first element, handle JSON parsing errors gracefully, validate the parsed data against the SQLTablesExtractionResult Pydantic model, and deduplicate the tables list before returning it. Include meaningful error logging and raise exceptions to fail fast if validation fails.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
wren-ai-service/docs/config_examples/config.open_router.yaml (1)
138-139: Verify YAML indentation consistencyEnsure the indentation (two spaces) for the new
- name: sql_tables_extractionentry matches other pipeline items to avoid YAML parsing errors.wren-ai-service/docs/config_examples/config.azure.yaml (1)
150-151: Verify indentation of new pipeline entryConfirm that the new
- name: sql_tables_extractionblock uses the same two-space indentation as other entries underpipes.wren-ai-service/docs/config_examples/config.deepseek.yaml (1)
160-161: Ensure YAML indentation consistencyDouble-check that the indentation for the new
sql_tables_extractionentry aligns with other pipeline definitions (two spaces).wren-ai-service/tests/pytest/services/test_ask.py (1)
43-45: Consider aligning component key with implementation classThe service component is registered under the key "retrieval" but is instantiated with DbSchemaRetrieval class and uses configuration from "db_schema_retrieval". This mismatch between the dictionary key and the implementation details could lead to confusion.
Consider renaming the component key to match its implementation:
- "retrieval": retrieval.DbSchemaRetrieval( + "db_schema_retrieval": retrieval.DbSchemaRetrieval(This would make the code more consistent and self-documenting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
deployment/kustomizations/base/cm.yaml(1 hunks)docker/config.example.yaml(1 hunks)wren-ai-service/docs/config_examples/config.anthropic.yaml(1 hunks)wren-ai-service/docs/config_examples/config.azure.yaml(1 hunks)wren-ai-service/docs/config_examples/config.deepseek.yaml(1 hunks)wren-ai-service/docs/config_examples/config.google_ai_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.grok.yaml(1 hunks)wren-ai-service/docs/config_examples/config.groq.yaml(1 hunks)wren-ai-service/docs/config_examples/config.lm_studio.yaml(1 hunks)wren-ai-service/docs/config_examples/config.ollama.yaml(1 hunks)wren-ai-service/docs/config_examples/config.open_router.yaml(1 hunks)wren-ai-service/eval/pipelines.py(2 hunks)wren-ai-service/src/globals.py(3 hunks)wren-ai-service/src/pipelines/generation/__init__.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py(1 hunks)wren-ai-service/src/pipelines/retrieval/__init__.py(1 hunks)wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py(3 hunks)wren-ai-service/src/web/v1/services/ask.py(1 hunks)wren-ai-service/src/web/v1/services/question_recommendation.py(1 hunks)wren-ai-service/src/web/v1/services/sql_corrections.py(1 hunks)wren-ai-service/tests/pytest/pipelines/generation/test_ask.py(2 hunks)wren-ai-service/tests/pytest/services/mocks.py(1 hunks)wren-ai-service/tests/pytest/services/test_ask.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
wren-ai-service/src/pipelines/generation/__init__.py (1)
wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
SQLTablesExtraction(95-127)
wren-ai-service/src/globals.py (2)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
DbSchemaRetrieval(430-495)wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (1)
SQLTablesExtraction(95-127)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)
wren-ai-service/src/core/pipeline.py (1)
BasicPipeline(15-21)
🔇 Additional comments (40)
wren-ai-service/tests/pytest/pipelines/generation/test_ask.py (2)
9-9: Import statement updated to use DbSchemaRetrieval instead of Retrieval.The change correctly updates the import to use the more specific
DbSchemaRetrievalclass from the new module path, which aligns with the broader refactoring effort across the codebase.
37-37: Pipeline instantiation updated to use DbSchemaRetrieval.The test now correctly instantiates the
DbSchemaRetrievalclass and passes the appropriate component configuration. This change maintains the test's functionality while adapting to the refactored retrieval mechanism.wren-ai-service/tests/pytest/services/mocks.py (1)
8-8: Mock class inheritance updated to DbSchemaRetrieval.The
RetrievalMockclass has been correctly updated to inherit fromretrieval.DbSchemaRetrievalinstead ofretrieval.Retrieval, ensuring the mock aligns with the refactored implementation while maintaining the same interface and behavior.docker/config.example.yaml (1)
140-141:Details
✅ Verification successful
New pipeline added for SQL tables extraction.
The new
sql_tables_extractionpipeline is properly configured to use the default LLM. This addition supports the new functionality for extracting SQL table names to improve schema context for SQL correction flows.
🏁 Script executed:
#!/bin/bash # Check if the sql_tables_extraction implementation exists in the codebase echo "Searching for the SQL tables extraction implementation file..." fd -t f "sql_tables_extraction.py" src/pipelines echo "Checking for references to this pipeline in the codebase..." rg "sql_tables_extraction" srcLength of output: 488
Let’s broaden the search to the entire repo to ensure we pick up any non-standard paths:
🏁 Script executed:
#!/bin/bash # Search for the pipeline implementation file anywhere in the repository echo "🔍 Looking for sql_tables_extraction.py across the repo..." fd -t f "sql_tables_extraction.py" # Search for any references to this pipeline name in the code echo "🔍 Scanning for references to 'sql_tables_extraction' in all files..." rg -n "sql_tables_extraction" .Length of output: 2787
sql_tables_extraction pipeline verified
The new
sql_tables_extractionpipeline is fully implemented and correctly registered:• Implementation:
wren-ai-service/src/pipelines/generation/sql_tables_extraction.py
• Registration:wren-ai-service/src/globals.py(lines 257–258)
• Configuration examples updated:
–docker/config.example.yaml(lines 140–141)
–wren-ai-service/tools/config/config.example.yaml
–wren-ai-service/docs/config_examples/*All references and config entries align. No further changes required.
wren-ai-service/docs/config_examples/config.anthropic.yaml (1)
137-138: Consistent pipeline configuration added across example configs.The same
sql_tables_extractionpipeline configuration has been correctly added to the Anthropic configuration example, ensuring consistency across different deployment options. This change properly maintains alignment with the docker example configuration.wren-ai-service/docs/config_examples/config.open_router.yaml (1)
138-139: Add SQL tables extraction to pipeline listThe
sql_tables_extractionpipeline is now correctly included using only the LLM provider, matching its implementation signature.wren-ai-service/src/pipelines/generation/__init__.py (2)
17-17: ExposeSQLTablesExtractionin generation moduleThe import of
SQLTablesExtractionis added correctly, enabling its discovery by the generation pipeline registry.
38-38: IncludeSQLTablesExtractionin__all__Adding
"SQLTablesExtraction"to__all__ensures it is exported as part of the public API for this package.wren-ai-service/src/pipelines/retrieval/__init__.py (2)
1-1: Rename retrieval pipeline import toDbSchemaRetrievalThe import has been updated to reference the renamed
DbSchemaRetrievalclass, aligning with its new module.
12-12: Update__all__to exportDbSchemaRetrievalIncluding
"DbSchemaRetrieval"in__all__maintains consistency in the public API after the class rename.wren-ai-service/docs/config_examples/config.azure.yaml (1)
150-151: Addsql_tables_extractionpipeline entryThe new pipeline is correctly added to the Azure config, specifying only the LLM provider as required by its implementation.
wren-ai-service/docs/config_examples/config.deepseek.yaml (1)
160-161: Includesql_tables_extractionin Deepseek pipelineThe
sql_tables_extractionpipeline entry is correctly added, matching the implementation requirement for LLM-only configuration.wren-ai-service/docs/config_examples/config.groq.yaml (1)
141-142: LGTM: Clean addition of sql_tables_extraction pipelineThe addition of the new sql_tables_extraction pipeline is well-structured and consistent with other pipeline definitions in the file.
wren-ai-service/docs/config_examples/config.ollama.yaml (1)
138-139: LGTM: Consistent addition of sql_tables_extraction pipelineThe new sql_tables_extraction pipeline is added consistently with the same configuration as in other config files.
wren-ai-service/docs/config_examples/config.lm_studio.yaml (1)
140-141: LGTM: Clean addition of sql_tables_extraction pipelineThe addition of the sql_tables_extraction pipeline with the litellm_llm.default model aligns with the other configuration files.
wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)
146-147: LGTM: Consistent sql_tables_extraction pipeline configurationThe new sql_tables_extraction pipeline is correctly configured with the litellm_llm.default model, consistent with other config files.
wren-ai-service/tools/config/config.example.yaml (1)
151-152: Configuration added for new SQL table extraction pipeline.The addition of the
sql_tables_extractionpipeline is consistent with the overall changes in this PR, which introduces a specialized pipeline for extracting table names from SQL queries using an LLM. This is properly configured to use the default LLM specified in the config.wren-ai-service/tools/config/config.full.yaml (1)
151-152: Consistent configuration for the SQL table extraction pipeline.The addition of the
sql_tables_extractionpipeline matches the implementation across other configuration files. Good to see consistency in pipeline configuration across different environment setups.wren-ai-service/docs/config_examples/config.grok.yaml (1)
142-143: SQL tables extraction pipeline added to Grok config example.The
sql_tables_extractionpipeline has been properly added to this example configuration file as well, maintaining consistency across all configuration templates in the project.wren-ai-service/src/web/v1/services/question_recommendation.py (1)
68-69: Updated pipeline key to use the renamed retrieval pipeline.The pipeline key has been updated from
"retrieval"to"db_schema_retrieval", which aligns with the refactoring of the retrieval mechanism throughout the codebase. This change ensures consistency with the renamedDbSchemaRetrievalclass.deployment/kustomizations/base/cm.yaml (1)
190-192: LGTM: Addition of thesql_tables_extractionpipeline.The new pipeline configuration is correctly structured with the required LLM specified. This aligns with the PR objective to add SQL table extraction functionality.
wren-ai-service/src/web/v1/services/ask.py (1)
411-416: Pipeline reference updated correctly.The change from
"retrieval"to"db_schema_retrieval"is consistent with the broader refactoring where the retrieval component has been renamed.wren-ai-service/eval/pipelines.py (2)
211-215: Class reference updated correctly.The change from
retrieval.Retrievaltoretrieval.DbSchemaRetrievalaligns with the refactoring of the retrieval component while maintaining the same interface.
376-380: Class reference updated consistently.Similar to the previous change, the
AskPipelineclass has been correctly updated to useDbSchemaRetrieval.wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (3)
179-196: Improved efficiency with conditional filtering.The refactored code now avoids unnecessary query operations when no table names are provided. This optimization prevents constructing and executing empty filters.
430-430: Class rename aligns with functionality.Renaming
RetrievaltoDbSchemaRetrievalbetter communicates the specific purpose of this class, making the codebase more maintainable.
502-502: Updated reference in dry run pipeline.The dry run pipeline reference has been consistently updated to use the new class name.
wren-ai-service/src/globals.py (3)
100-104: LGTM! Pipeline key renamed for better clarity.The change from "retrieval" to "db_schema_retrieval" improves the code by using a more specific and descriptive pipeline key name that better reflects its purpose. This makes the code more maintainable and easier to understand.
203-207: LGTM! Consistent renaming of pipeline key.This change maintains consistency with the pipeline renaming pattern used throughout the codebase. The pipeline parameters for table retrieval are appropriately carried over.
257-264: Well-structured pipeline enhancement for SQL correction.The addition of two new pipelines to the SQL correction service:
- "sql_tables_extraction" - Extracts table names from the SQL query
- "db_schema_retrieval" - Fetches relevant database schema based on extracted tables
This creates a more intelligent SQL correction flow that can leverage database schema context.
wren-ai-service/src/web/v1/services/sql_corrections.py (5)
66-69: LGTM! Parameter extraction improves readability.Extracting the request parameters into well-named local variables makes the code more readable and maintainable.
77-94: Strong implementation of the enhanced correction workflow.This code segment effectively implements a 3-step workflow:
- Extract table names from the SQL using the new "sql_tables_extraction" pipeline
- Use these table names to fetch relevant schema information
- Prepare table DDL statements as context for correction
This approach intelligently narrows down the schema context to only what's relevant for the SQL query.
96-99: LGTM! Improved context for SQL correction.Adding table DDLs as context for the SQL correction pipeline will significantly improve the quality of corrections, as the LLM now has access to the relevant schema information.
106-119: LGTM! Refactored error handling with better variable usage.This refactored error handling code improves upon the previous implementation by:
- Using the extracted
event_idvariable consistently- Centralizing error handling through the
_handle_exceptionmethod- Maintaining better code structure with clear condition branches
128-128: LGTM! Consistent cache key usage.Using the extracted
event_idvariable for cache access ensures consistency throughout the method.wren-ai-service/src/pipelines/generation/sql_tables_extraction.py (5)
18-54: Well-crafted prompts for table extraction.The system and user prompts are:
- Clear in their instruction to extract tables from SQL
- Provide helpful examples covering various SQL patterns
- Specify a structured JSON output format
- Include a "step by step" thinking prompt to improve LLM reasoning
This is a robust approach for leveraging an LLM to extract table names from SQL queries.
57-76: Clean pipeline implementation with observability.The pipeline functions are:
- Well-separated with clear responsibilities
- Decorated with observability annotations that avoid capturing sensitive inputs
- Concise and focused on their specific tasks
- Follow the established pattern for pipeline implementations
The JSON parsing in
post_processcorrectly extracts the "tables" field from the LLM response.
80-92: LGTM! Well-structured schema validation.Using Pydantic for schema validation is an excellent approach to ensure the LLM returns data in the expected format. The configuration properly sets up JSON schema validation for the LLM provider.
95-127: LGTM! Clean pipeline class implementation.The
SQLTablesExtractionclass:
- Follows the standard pipeline class structure
- Properly initializes required components with appropriate configuration
- Has a focused
runmethod with clear logging- Uses dependency injection for the LLM provider
This implementation maintains consistency with the rest of the codebase and is easy to understand.
130-137: Helpful dry-run capability.The dry-run implementation allows for easy testing of the pipeline in isolation, which is valuable for development and debugging. The simple test case with "SELECT * FROM table" provides a good baseline test.
imAsterSun
left a comment
There was a problem hiding this comment.
reviewed and approved
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Documentation
Tests