feat(wren-ai-service): sql pairs#961
Conversation
0d5aa5d to
bf2515b
Compare
1f952c1 to
3142348
Compare
3142348 to
63999d4
Compare
WalkthroughThis pull request introduces significant enhancements to the Wren AI service, focusing on SQL pairs management, pipeline restructuring, and removal of timer-related functionality. The changes include new pipelines for SQL pairs preparation, deletion, and retrieval, a consolidation of module imports, and the removal of the Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (41)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)
9-9: Confirmed import replacement fromasync_timertotrace_metadata.Previously, the
async_timerimport was used for timing. Now you’ve replaced it withtrace_metadata, which focuses on tracing rather than capturing performance metrics. This is okay if timing is no longer needed. If performance measurement or timing is still required, consider adding an alternative.wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (6)
1-16: Ensure consistent logging strategy for debugging and production.
While the imports and logging setup appear fine, consider whetherlogger.setLevel(...)or additional contextual logs are needed to help debug issues in production, especially given the asynchronous nature of this pipeline.
18-34: Enhance documentation for OutputFormatter.
This class is straightforward, but consider adding a docstring or clarifying comment explaining how the output structure is intended to be consumed by downstream processes.
61-82: Ensure filter conditions align with Qdrant indexing.
Ifidis empty or None, thefiltersobject is null. Confirm that an unfiltered retrieval is appropriate. This could cause performance/memory issues when the index is large.
84-90: Parameterize score thresholds in filtered_documents.
The filter usesscore=0.7as a hard-coded threshold. Consider making it configurable, possibly from environment variables or pipeline settings, to allow more flexible tuning.
105-127: Add docstring for the SqlPairsRetrieval class.
This class orchestrates the retrieval pipeline. A brief docstring explaining the main usage, inputs, outputs, and expected environment would help future maintainers.
141-148: Remove or secure the main block in production.
This main block is useful for local testing but may not be intended for production. Consider guarding it with anif DEBUG:check or removing it in deployment contexts.wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (6)
1-12: Consistently handle exceptions in service modules.
While logging is present, consider a centralized error-handling mechanism or standardized exception classes for consistent error responses across all service modules.
34-37: Validate the necessity of a dedicated response model.
SqlPairsPreparationResponseonly has a single field. Consider if it could be combined with the request or a more general wrap.
39-55: Service layering for deletion.
DeleteSqlPairsRequestandDeleteSqlPairsResponsemirror the prepare request/response. Validate usage to avoid duplication. If differences are minimal, a shared model might be more maintainable.
62-69: Assess error details in SqlPairsPreparationStatusResponse.
SqlPairsPreparationErroruses a very general codeOTHERS. If more error categories exist or are needed, consider enumerating them for clearer telemetry and troubleshooting.
71-81: Ensure TTLCache suits concurrency and memory constraints.
The cache is set to a potentially largemaxsize=1_000_000. Confirm that this usage pattern doesn’t cause memory issues under high load.
169-188: Distinct logging for “id not found” in get_prepare_sql_pairs_status.
Currently logs as anexception. Consider logging at a warning or info level if it’s expected that some requests might query missing IDs.wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (6)
1-21: Add docstrings or references for domain usage.
The imports and usage are not fully explained in the code. For maintainability, clarify in docstrings how these providers and classes fit into the broader SQL pairs preparation architecture.
80-87: Document usage of prompts function.
This function transforms raw SQL pairs into prompt dicts. A short docstring explaining the rationale for theprompt_builderwould help future maintainers.
89-99: Parallel tasks with gather: handle partial failures.
If one generation task fails,asyncio.gather()raises, discarding all results. Consider usingreturn_exceptions=Trueor a fallback mechanism if partial progress is acceptable.
126-134: Check embedding context.
If the documents are large, consider chunking or partial embedding strategies. Large documents can cause memory or performance bottlenecks.
157-160: Expand on field usage in SqlIntentionGenerationResult.
Currently, onlyintentionis stored. If more fields are needed later (e.g., potential error code, data lineage), consider extending this model now for future-proofing.
173-217: Centralize pipeline component initialization.
SqlPairsPreparationincludes numerous components in_components. If more logic is added, consider splitting into separate factory methods or classes to maintain clarity.wren-ai-service/src/globals.py (13)
9-9: Optimize import style.
While wildcard imports can be convenient, consider using explicit imports for clarity and to avoid naming conflicts, especially in large codebases.
20-21: Consistent naming for new services.
SqlExplanationServiceandSqlPairsPreparationServicefollow a different naming style from older classes. Confirm naming guidelines to keep the codebase consistent.
39-41: Documentation for newly introduced class members.
sql_explanation_service,sql_regeneration_service, andsql_pairs_preparation_serviceare newly added. Add docstrings describing their purpose.
61-61: Track semantic version changes for generation pipeline.
Modules likegeneration.SemanticsDescriptionare newly integrated. If new features are introduced, reflect that in service versioning or changelogs.
84-87: Standardize specialized pipeline references.
intent_classificationanddata_assistanceare grouped here. Confirm these belong in generation pipelines. Possibly rename them to reflect domain usage.
100-102: Assess performance implications of sql_pairs_retrieval.
It’s newly introduced, and retrieving large volumes of pairs can be memory-intensive. Evaluate the need for pagination or partial retrieval.
119-122: Guard or configure chart generation.
Chart generation can be resource-hungry. If runtime constraints exist, consider gating or caching repeated chart generation requests.
130-133: Enforce consistent naming between chart_generation and chart_adjustment.
One pipeline ischart_generation, the other ischart_adjustment. Validate that this difference is intentional and documented.
152-155: Docstring improvement for ask_details_service.
Explain howsql_breakdownandsql_summarypipelines collectively handle the request.
174-180: Consolidate naming for sql_summary.
sql_summaryis used in multiple services. If they differ in usage or logic, clarify via better naming or shared references.
182-188: Potential to unify explanation logic.
sql_explanation_servicemight overlap partially withsql_summary. If so, consider unifying or layering them to reduce duplication.
190-198: Document advanced usage for sql_regeneration_service.
Regeneration can be complex. Provide docstrings or references regarding the logic behind regenerating queries.
221-231: Integration tests for SQL pairs prep.
Before finalizing, ensure integration tests are available that callsql_pairs_preparation_serviceend to end, verifying typical and edge use cases (e.g., empty requests, large-scale requests).wren-ai-service/src/pipelines/retrieval/__init__.py (1)
1-5: Import clarity for new retrieval classes.
You have introduced five new imports here. Ensure each class is truly needed by external modules and that no cyclical dependencies occur.wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (1)
59-67: Dry run invocation
Providing a main block for quick debugging or demonstration is helpful. Any advanced dry-run logic (e.g., fetching test data) could be added if needed.wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)
89-89: [Parameterization suggestion] The threshold value 0.9 might be preferable as a configurable setting.
Hardcodingscore=0.9may hamper flexibility if different retrieval contexts demand varying thresholds. Consider passing this as a parameter or configuring it externally for better adaptability.wren-ai-service/src/pipelines/generation/sql_generation.py (1)
58-62: Include sanitization or validation for sample data.These lines introduce a template section for SQL samples. Consider validating or sanitizing the
sample.summaryandsample.sqlfields to avoid potential injection or formatting issues in downstream usage.wren-ai-service/src/web/v1/services/ask.py (1)
270-276: Newsql_pairs_retrievalpipeline usage.
The introduction ofsql_pairs_retrievalis coherent with the PR’s objective to manage SQL pairs. Ensure you handle empty results or errors consistently in the pipeline.docker/config.example.yaml (1)
113-113: Fix trailing whitespace.Remove the trailing space after
openai_embedder.text-embedding-3-large.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 113-113: trailing spaces
(trailing-spaces)
wren-ai-service/src/pipelines/indexing/__init__.py (1)
90-110: Consider handling the return value from delete_documents.The implementation looks good, but the return value from
delete_documentsmight contain useful information about the number of documents deleted. Consider either:
- Removing the explicit return statement since it's not needed, or
- Returning the result to the caller for potential logging or verification.
- return await self._sql_pairs_store.delete_documents(filters) + result = await self._sql_pairs_store.delete_documents(filters) + logger.info(f"Deleted {result} SQL pair documents") + return resultwren-ai-service/tools/config/config.example.yaml (1)
131-131: Fix trailing whitespace.Remove the trailing space after
openai_embedder.text-embedding-3-large.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.full.yaml (1)
150-150: Remove trailing whitespaceThere is a trailing space after
openai_embedder.text-embedding-3-large.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 150-150: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (65)
deployment/kustomizations/base/cm.yaml(2 hunks)docker/config.example.yaml(1 hunks)wren-ai-service/README.md(0 hunks)wren-ai-service/docs/configuration.md(0 hunks)wren-ai-service/eval/evaluation.py(1 hunks)wren-ai-service/eval/pipelines.py(2 hunks)wren-ai-service/src/config.py(0 hunks)wren-ai-service/src/globals.py(8 hunks)wren-ai-service/src/pipelines/generation/__init__.py(1 hunks)wren-ai-service/src/pipelines/generation/chart_adjustment.py(1 hunks)wren-ai-service/src/pipelines/generation/chart_generation.py(0 hunks)wren-ai-service/src/pipelines/generation/data_assistance.py(0 hunks)wren-ai-service/src/pipelines/generation/followup_sql_generation.py(6 hunks)wren-ai-service/src/pipelines/generation/intent_classification.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_answer.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_breakdown.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_expansion.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_explanation.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_generation.py(4 hunks)wren-ai-service/src/pipelines/generation/sql_regeneration.py(0 hunks)wren-ai-service/src/pipelines/generation/sql_summary.py(0 hunks)wren-ai-service/src/pipelines/indexing/__init__.py(1 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py(1 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py(1 hunks)wren-ai-service/src/pipelines/retrieval/__init__.py(1 hunks)wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py(5 hunks)wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py(0 hunks)wren-ai-service/src/pipelines/retrieval/retrieval.py(0 hunks)wren-ai-service/src/pipelines/retrieval/sql_executor.py(0 hunks)wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py(1 hunks)wren-ai-service/src/utils.py(0 hunks)wren-ai-service/src/web/development.py(0 hunks)wren-ai-service/src/web/v1/routers/__init__.py(2 hunks)wren-ai-service/src/web/v1/routers/ask.py(0 hunks)wren-ai-service/src/web/v1/routers/ask_details.py(0 hunks)wren-ai-service/src/web/v1/routers/semantics_preparation.py(0 hunks)wren-ai-service/src/web/v1/routers/sql_answers.py(0 hunks)wren-ai-service/src/web/v1/routers/sql_expansions.py(0 hunks)wren-ai-service/src/web/v1/routers/sql_explanations.py(0 hunks)wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py(1 hunks)wren-ai-service/src/web/v1/routers/sql_regenerations.py(0 hunks)wren-ai-service/src/web/v1/services/ask.py(5 hunks)wren-ai-service/src/web/v1/services/ask_details.py(1 hunks)wren-ai-service/src/web/v1/services/chart.py(1 hunks)wren-ai-service/src/web/v1/services/chart_adjustment.py(1 hunks)wren-ai-service/src/web/v1/services/semantics_preparation.py(0 hunks)wren-ai-service/src/web/v1/services/sql_answer.py(1 hunks)wren-ai-service/src/web/v1/services/sql_expansion.py(1 hunks)wren-ai-service/src/web/v1/services/sql_explanation.py(1 hunks)wren-ai-service/src/web/v1/services/sql_pairs_preparation.py(1 hunks)wren-ai-service/src/web/v1/services/sql_regeneration.py(1 hunks)wren-ai-service/tests/data/config.test.yaml(1 hunks)wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py(1 hunks)wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py(1 hunks)wren-ai-service/tests/pytest/services/mocks.py(4 hunks)wren-ai-service/tests/pytest/services/test_ask.py(3 hunks)wren-ai-service/tests/pytest/services/test_ask_details.py(2 hunks)wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py(1 hunks)wren-ai-service/tests/pytest/test_config.py(0 hunks)wren-ai-service/tests/pytest/test_main.py(1 hunks)wren-ai-service/tests/pytest/test_utils.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)wren-ui/README.md(0 hunks)
💤 Files with no reviewable changes (28)
- wren-ai-service/src/config.py
- wren-ui/README.md
- wren-ai-service/tests/pytest/test_config.py
- wren-ai-service/src/web/v1/routers/sql_explanations.py
- wren-ai-service/src/web/v1/routers/ask_details.py
- wren-ai-service/src/web/v1/services/semantics_preparation.py
- wren-ai-service/README.md
- wren-ai-service/docs/configuration.md
- wren-ai-service/src/web/v1/routers/sql_answers.py
- wren-ai-service/src/web/development.py
- wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py
- wren-ai-service/src/web/v1/routers/sql_regenerations.py
- wren-ai-service/src/pipelines/generation/data_assistance.py
- wren-ai-service/src/pipelines/retrieval/sql_executor.py
- wren-ai-service/src/pipelines/generation/sql_breakdown.py
- wren-ai-service/src/pipelines/generation/sql_answer.py
- wren-ai-service/src/web/v1/routers/sql_expansions.py
- wren-ai-service/src/utils.py
- wren-ai-service/src/web/v1/routers/ask.py
- wren-ai-service/src/pipelines/generation/sql_expansion.py
- wren-ai-service/src/web/v1/routers/semantics_preparation.py
- wren-ai-service/src/pipelines/generation/sql_correction.py
- wren-ai-service/src/pipelines/generation/intent_classification.py
- wren-ai-service/src/pipelines/generation/sql_summary.py
- wren-ai-service/src/pipelines/generation/sql_explanation.py
- wren-ai-service/src/pipelines/generation/sql_regeneration.py
- wren-ai-service/src/pipelines/generation/chart_generation.py
- wren-ai-service/src/pipelines/retrieval/retrieval.py
🧰 Additional context used
🪛 yamllint (1.35.1)
deployment/kustomizations/base/cm.yaml
[error] 132-132: syntax error: could not find expected ':'
(syntax)
wren-ai-service/tools/config/config.full.yaml
[error] 150-150: trailing spaces
(trailing-spaces)
docker/config.example.yaml
[error] 113-113: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.example.yaml
[error] 131-131: trailing spaces
(trailing-spaces)
🔇 Additional comments (90)
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6)
3-3: Import statement looks good.
The newly imported Dict is now utilized for the sql_samples parameter. This aligns well with the code's usage of typed dictionaries in the subsequent changes.
107-116: Check for missing or invalid data in template rendering.
You’re conditionally injecting {{sample.summary}} and {{sample.sql}} within the prompt. It might be worth adding validation or at least a fallback if summary or sql fields are missing or None. This ensures the rendered prompt doesn't break if partial data is passed.
133-133: Optional parameter addition is consistent with Python 3.10+.
Declaring sql_samples: List[Dict] | None = None requires Python 3.10 or above (for the union operator “|”). Ensure the runtime environment is updated accordingly if it isn’t already.
145-145: Parameter forwarding to prompt_builder.run() is valid.
Forwarding sql_samples here is consistent with the prompt function’s updated signature and ensures that sql_samples are utilized in the Jinja template.
220-220: Extended method signature for run().
Adding the sql_samples parameter broadens the function’s scope to incorporate SQL examples. Verify that all upstream invocations provide or handle this parameter properly.
232-232: Ensure consistent usage of sql_samples in downstream pipelines.
The dictionary key sql_samples is correctly added. Verify that any follow-up or third-party pipeline steps expecting additional data still function as intended.
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)
Line range hint 42-65: Double-check for potential gaps left by removing the @async_timer decorator.
The removal of the @async_timer decorator might impact any monitoring, performance analysis, or logging processes that previously relied on it. If there's still a need for execution time tracking, ensure that you have another suitable mechanism in place or confirm that timing is no longer a project requirement.
wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (4)
36-51: Validate filters logic in count_documents.
When id is None, filters are not applied. Ensure that this is intended and that it won’t return documents from unintended projects. If project isolation is crucial, you may want to handle the None case differently.
53-59: Check for empty embeddings usage.
If count_documents is zero, the function returns {}. Validate that an empty embedding dict does not lead to unexpected behavior in later steps, e.g., silent failures or bypassed logic.
92-100: Handle potential empty outputs in formatted_output.
When filtered_documents is empty or doesn’t contain the key "documents", the function returns an empty list by default. Verify the calling code properly handles the scenario of zero documents.
129-139: Confirm resilience in the run method.
The pipeline depends on external components (_components). If any are improperly initialized or fail during execution, confirm that the pipeline either retries or raises a clear exception.
wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (3)
20-34: Ensure each SqlPairsPreparationRequest has a unique query_id.
The _query_id is optional but is used in status tracking. An accidental reuse of query_id may overwrite a previous request’s status.
57-60: Consider input validation for sql_pairs_preparation_id.
No validation is performed on sql_pairs_preparation_id. Add checks for format or allowed characters, especially if used as a key in caches or logs.
83-125: Add concurrency or partial-completion checks in prepare_sql_pairs.
The method updates the cache status on success/failure. If multiple calls with the same query_id occur concurrently, the status might fluctuate unexpectedly. Consider locking or unique IDs.
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (5)
51-78: Consider adding validation or a fallback for incomplete fields in the run method (SqlPairsDescriptionConverter).
If sql_pair data is missing keys, .get() returns None. Confirm that id or sql is always present to avoid empty documents or indexing errors.
117-124: Input validation for convert_sql_pairs_to_documents.
Similar to the SqlPairsDescriptionConverter, ensure fields are present and typed correctly.
136-147: Confirm the rationale for calling delete_sql_pairs within the same pipeline.
delete_sql_pairs references sql_pairs_cleaner. If these pairs are also needed for embedding or further processing, ensure the cleaning logic is indeed desired here.
162-171: Monitor JSON schema validity.
Ensure the defined JSON schema in SQL_INTENTION_GENERATION_MODEL_KWARGS remains in sync if additional fields are added to SqlIntentionGenerationResult.
219-226: Validate the final pipeline run for empty input.
If sql_pairs=[], the pipeline might skip certain steps. Confirm that this path is tested and returns a meaningful response to the caller.
wren-ai-service/src/globals.py (5)
96-99: Check historical_question_retrieval usage.
Renaming historical_question to historical_question_retrieval might require changes in tests or data references. Verify consistent usage across the codebase.
105-108: Check completeness of sql_generation and sql_correction logic.
These pipelines can be interdependent. If one fails, confirm the fallback or error route is properly handled.
141-144: Potential concurrency checks in preprocess_sql_data.
If preprocess_sql_data fetches or transforms large data sets, confirm concurrency usage is safe.
168-171: Remove or confirm necessity of sql_correction in expansion.
sql_expansion sometimes duplicates logic found in sql_correction. Confirm there’s no overlap or repeated overhead.
Line range hint 206-215: Verify that retrieval pipelines are correctly reused.
question_recommendation reuses database retrieval logic. Confirm that the pipeline handles different search contexts without collisions.
wren-ai-service/src/pipelines/retrieval/__init__.py (1)
7-13: Use the all list consistently.
Explicitly listing public exports in __all__ is a good practice. Periodically check if new classes need to be added or removed as the module evolves.
wren-ai-service/src/web/v1/routers/__init__.py (4)
11-11: Use of singular naming improves clarity
Renaming semantics_preparations to semantics_preparation provides a clearer singular concept. This is consistent with best practices for naming modules and routes.
15-15: New router inclusion for SQL Pairs Preparation
Great addition of sql_pairs_preparation. This aligns with the PR objective of expanding SQL pairs functionality.
25-25: Confirm references to updated router name
Ensure references to the old module or plural naming (semantics_preparations) are removed throughout the codebase to avoid confusion or import errors.
32-32: SQL pairs preparation route
This newly added route effectively wires up the SQL Pairs Preparation service. Looks good for handling the new functionality.
wren-ai-service/src/pipelines/generation/__init__.py (2)
1-16: Comprehensive imports
All newly imported classes are succinctly grouped. This enhances discoverability and modular usage in downstream code. Ensure these imports do not introduce circular dependencies.
18-35: Explicit public API
Defining __all__ clarifies the intended exports and prevents accidental usage of internal classes. Good practice!
wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py (4)
10-11: Asynchronous test function
Using @pytest.mark.asyncio is correct for testing async code. This allows for better concurrency handling while verifying pipeline behaviors.
21-24: Multiple SQL pairs insertion
The SqlPair instantiation ensures each pair has a unique identifier. This tests that deletion operations can handle multiple entries, which is a solid coverage scenario.
37-38: Validation after first deletion
The assertion that the document count is still 2 provides a partial deletion scenario verification. This is good for ensuring partial or selective deletions operate as expected.
39-42: Full deletion scenario
Asserting 0 documents remain ensures complete deletion functionality is verified. This thoroughly validates the end-to-end flow of the pipeline.
wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (3)
16-23: Leverage SqlPairsCleaner effectively
The delete_sql_pairs function simplifies the pipeline approach by delegating deletion to sql_pairs_cleaner. This encapsulation is clean and maintainable.
29-40: Pipeline constructor is coherent
Passing in a DocumentStoreProvider and building a dictionary of components is a structured approach. Ensure that the dataset_name="sql_pairs" remains consistent with any future expansions (e.g. multi-dataset support).
45-58: Final pipeline execution
Using self._pipe.execute within run is flexible, allowing additional steps to be added if the pipeline grows. The observe decorator aligns well with logging and monitoring.
wren-ai-service/tests/pytest/services/mocks.py (5)
3-3: Consolidated imports look good.
The consolidated import from src.pipelines helps keep the imports cleaner. No issues here.
16-16: Ensure consistent naming usage.
HistoricalQuestionMock now inherits from HistoricalQuestionRetrieval. Verify all references to HistoricalQuestionMock and historical_question.HistoricalQuestion across the codebase are updated or removed as needed.
24-24: Inheritance updated to IntentClassification.
Confirm that the prior import references to intent_classification.IntentClassification have been removed or updated.
34-34: Refactored to use generation.SQLGeneration.
Check all references to sql_generation.SQLGeneration to ensure they are replaced with the new generation.SQLGeneration.
55-55: No issues with the renamed SQLSummary.
Inheriting from generation.SQLSummary is consistent with the new import patterns.
wren-ai-service/tests/pytest/test_utils.py (1)
77-77: user_id parameter is set to None.
This ensures consistency with the broader removal of user_id from various request classes. Verify that any logic relying on user_id has been updated accordingly.
wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py (7)
1-6: Imports are modular and clear.
Imports for settings, DocumentStoreProvider, and SqlPairsPreparation are well arranged and explicit.
9-19: Test function naming and structure are descriptive.
The test name test_sql_pairs_preparation_saving_to_document_store accurately conveys the test’s purpose. Setup steps (e.g., store creation) are straightforward.
20-29: SqlPairsPreparation instantiation is well-focused.
Parameters are clearly passed in. Validate that the pipeline components defined in settings.components["sql_pairs_preparation"] exist and are correct.
31-37: Document validations look correct.
All essential assertions (content, meta, sql_pair_id, sql) are present. This ensures robust coverage.
40-50: Second test function effectively covers multiple IDs scenario.
This test replicates the pipeline run with different project IDs, ensuring coverage for varied contexts.
51-69: Repeated run logic is consistent.
Running the pipeline multiple times is a valid approach to confirm isolation by id.
70-79: Filter-based validation is sound.
Filtering documents by project_id checks data partitioning logic and ensures correct categorization per ID.
wren-ai-service/tests/pytest/services/test_ask_details.py (2)
6-6: Broad import of generation is consistent with ongoing refactoring.
No concerns here; it aligns with other updated references to the generation module.
20-23: Creation of SQLBreakdown and SQLSummary is aligned with the new prefix.
Ensure references to the older module or class names (sql_breakdown / sql_summary) are removed or replaced throughout.
wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (4)
13-13: [Import usage confirmation] ScoreFilter appears to be correctly imported and used.
It’s good to see ScoreFilter continue to be used here. Ensure no references to any removed or renamed classes remain.
107-107: [Rename alignment] The new class name aligns well with its retrieval purpose.
Renaming HistoricalQuestion → HistoricalQuestionRetrieval clarifies the pipeline’s intent. Confirm that all references to the old name have been removed throughout the codebase.
132-132: [Log statement consistency] The logger statement is appropriate to the renamed class.
Good job updating the logger message to match the new HistoricalQuestionRetrieval pipeline name.
147-147: [Consistency check] The pipeline’s dry run reference is properly updated to the new class name.
This usage in dry_run_pipeline looks correct.
wren-ai-service/src/web/v1/services/sql_explanation.py (1)
55-55: [Naming practice] Changing the class name from SQLExplanationService → SqlExplanationService improves consistency.
Ensure references to the previous name are fully removed, and confirm adherence to project naming conventions.
wren-ai-service/src/web/v1/services/sql_answer.py (1)
10-10: [Trace metadata addition] The trace_metadata decorator could provide valuable debugging insight.
No apparent issues with the addition. Confirm that the relevant logs or tracing dashboards can consume this metadata effectively.
wren-ai-service/src/web/v1/services/sql_regeneration.py (1)
78-78: [Service rename] Transitioning from SQLRegenerationService to SqlRegenerationService is consistent with the style used elsewhere.
Verify that all imports or references reflect this updated name.
wren-ai-service/src/pipelines/generation/sql_generation.py (4)
85-85: Parameter rename looks consistent.
Changing samples to sql_samples clarifies usage. Ensure any external references to this parameter have also been updated to avoid runtime errors.
93-93: Good usage of the newly introduced parameter.
Forwarding sql_samples to the prompt builder is correctly aligned with the updated parameter name.
168-168: Addition of the sql_samples parameter aligns with the prompt signature.
The run() method signature now correctly matches the updated function parameters, improving readability.
178-178: Properly passing sql_samples.
Adding sql_samples to the inputs ensures consistent data flow through the pipeline.
wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py (3)
68-88: POST /sql-pairs endpoint creation looks solid.
The endpoint’s asynchronous flow using BackgroundTasks is a neat solution for offloading long-running tasks. Make sure to handle potential exceptions within the background task to keep the status consistent.
91-111: DELETE /sql-pairs endpoint is well-structured.
The logic for generating a unique query ID and tracking deletion status is clearly implemented. Confirm that any exceptions in the deletion process also appropriately update the status.
114-123: GET /sql-pairs/{sql_pairs_preparation_id} endpoint.
Retrieving the current status based on the unique ID completes a full asynchronous workflow. Looks good for monitoring and handling user queries about the process.
wren-ai-service/src/web/v1/services/chart.py (1)
9-9: New import for tracing metadata.
The introduction of trace_metadata aligns with the simplified removal of @async_timer. Ensuring each request is traceable without extra timers makes the code cleaner.
wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py (3)
18-32: Fixture for sql_pairs_preparation_service.
The fixture sets up the environment and document store for SQL pairs. This is a clean approach to ensure each test starts with the correct, isolated context.
49-93: Test for SQL pairs preparation.
The loop that checks status until "finished" or "failed" ensures asynchronous correctness. Great job verifying the final state of the document store after preparation.
95-164: Test for SQL pairs deletion.
Reusing the same query_id for deletion is a nice approach that demonstrates a real-world flow (prepare → delete). Checking the empty document store confirms the logic is complete.
wren-ai-service/src/web/v1/services/ask_details.py (2)
10-10: Ensure import consistency and usage clarity.
The added import of trace_metadata appears consistent with the usage in the ask_details method. No issues found.
Line range hint 25-36: Confirm removal of user_id from AskDetailsRequest is fully handled.
We see that user_id is no longer part of the class. Please verify that all references to user_id have been removed or refactored accordingly elsewhere, so that downstream functionality is not impacted.
wren-ai-service/eval/evaluation.py (1)
121-122: Validate fallback path when keys are absent.
Switching to meta.get("session_id") and meta.get("user_id") prevents potential KeyError. Ensure that if these values are absent and become None, downstream logic can safely handle it or provide a suitable default.
wren-ai-service/tests/pytest/services/test_ask.py (2)
37-52: Refactored pipeline references look good.
Using generation and retrieval modules consistently improves readability and aligns naming across components. Good job!
163-163: Renamed mock aligns with real service naming.
Updating historical_question to historical_question_retrieval maintains consistency. Looks good!
wren-ai-service/tests/pytest/test_main.py (1)
26-26: Test name correction is fine—ensure references are updated if needed.
Renaming from test_semantics_preparations to test_semantics_preparation clarifies the intent. Verify that other references, if any, have been updated consistently.
wren-ai-service/src/web/v1/services/sql_expansion.py (1)
9-9: Removal of async_timer looks consistent.
No issues spotted with removing the @async_timer. The remaining @observe and @trace_metadata decorators ensure the method still has observability and trace functionality. This change also aligns with other timer removals throughout the PR.
wren-ai-service/eval/pipelines.py (2)
145-146: Graceful fallback on missing session and user data.
Switching to .get("session_id") and .get("user_id") mitigates potential KeyError exceptions when these fields are missing. This improves resilience.
162-163: Consistent session and user ID handling.
Similar improvement here—replacing direct dictionary key access with .get(...) ensures safer metadata retrieval.
wren-ai-service/src/web/v1/services/ask.py (4)
10-10: Trace metadata import usage is correct.
No further action needed; consistent with how other services incorporate tracing.
245-247: Renamed pipeline key from historical_question to historical_question_retrieval.
This aligns with the updated naming convention in configuration files. No functional issues apparent.
286-286: sql_samples parameter for followup_sql_generation.
Accepting sql_samples adds flexibility, but confirm that both the retrieval and generation pipeline handle the relevant data shape and potential empty arrays.
297-297: sql_samples parameter for sql_generation.
The same suggestion applies here—validate or test thoroughly to ensure the new parameter is consistently consumed and does not introduce unexpected behaviors.
wren-ai-service/tests/data/config.test.yaml (1)
45-45: Consistent pipeline rename in config.
Renaming historical_question to historical_question_retrieval keeps the configuration aligned with code changes. This helps avoid runtime mismatches.
docker/config.example.yaml (1)
107-117: LGTM: SQL pairs pipeline configuration.
The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).
🧰 Tools
🪛 yamllint (1.35.1)
[error] 113-113: trailing spaces
(trailing-spaces)
wren-ai-service/src/pipelines/indexing/__init__.py (1)
119-125: LGTM: Export list updated correctly.
The __all__ list is properly updated to include the new SQL pairs components.
wren-ai-service/tools/config/config.example.yaml (1)
125-135: LGTM: SQL pairs pipeline configuration.
The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).
🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.full.yaml (1)
144-147: Consider renaming for consistency with existing patterns
The pipeline name sql_pairs_preparation could be renamed to sql_pairs_indexing to maintain consistency with other indexing operations like db_schema_indexing and historical_question_indexing.
wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)
175-175: LGTM: Clean simplification of post-processor invocation
The change to use direct class invocation instead of the run method simplifies the code while maintaining the same functionality.
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
wren-ai-service/src/web/v1/services/ask.py (1)
Pipeline name inconsistency detected - both old and new names are present
The codebase currently contains references to both pipeline names, which needs to be addressed:
Old name "historical_question" is still used in:
src/globals.pysrc/web/v1/services/ask.pysrc/web/v1/services/semantics_preparation.pysrc/pipelines/indexing/historical_question.py- Related test files
New name "historical_question_retrieval" is used in:
src/globals.py- Configuration files (config.example.yaml, config.full.yaml, etc.)
src/pipelines/retrieval/historical_question_retrieval.py- Related test files
This indicates an incomplete rename that could cause runtime issues. The pipeline name should be consistent across all files.
🔗 Analysis chain
Line range hint
133-141: Verify pipeline name change across configurations.The pipeline name change from "historical_question" to "historical_question_retrieval" improves clarity. Let's verify this change is consistent across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining references to the old pipeline name # and verify the new pipeline name is properly configured. echo "Checking for old pipeline name references..." rg "historical_question[^_]" --type yaml --type python echo "Verifying new pipeline name configuration..." rg "historical_question_retrieval" --type yaml --type pythonLength of output: 377
Script:
#!/bin/bash # Description: Check for any remaining references to the old pipeline name # and verify the new pipeline name is properly configured. echo "Checking for old pipeline name references..." rg "historical_question[^_]" echo "Verifying new pipeline name configuration..." rg "historical_question_retrieval"Length of output: 2532
🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/ask.py (1)
Line range hint
272-299: Consider validating SQL samples before usage.The SQL samples retrieved from
sql_pairs_retrievalare passed directly to the generation pipelines. Consider adding validation to ensure the samples are well-formed and relevant before usage.Example validation:
sql_samples = ( await self._pipelines["sql_pairs_retrieval"].run( query=ask_request.query, id=ask_request.project_id, ) )["formatted_output"].get("documents", []) + +# Validate SQL samples before usage +valid_samples = [ + sample for sample in sql_samples + if sample and isinstance(sample, dict) and "sql" in sample +] +if not valid_samples: + logger.warning("No valid SQL samples retrieved for query: %s", ask_request.query)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
wren-ai-service/src/pipelines/generation/intent_classification.py(0 hunks)wren-ai-service/src/web/v1/services/ask.py(4 hunks)wren-ai-service/tests/pytest/services/mocks.py(4 hunks)
💤 Files with no reviewable changes (1)
- wren-ai-service/src/pipelines/generation/intent_classification.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/tests/pytest/services/mocks.py
🔇 Additional comments (2)
wren-ai-service/src/web/v1/services/ask.py (2)
Line range hint 1-67: LGTM! Clean removal of unused components.
The removal of the async_timer import and user_id field aligns with the PR objectives. The class structure remains clean and well-typed.
272-277: LGTM! Clean integration of SQL pairs retrieval.
The new SQL pairs retrieval pipeline is well-integrated into the existing flow.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1)
149-155: 🛠️ Refactor suggestion**** Disable partial writes if a portion fails
This is similar to a past review comment on partial writes. Consider a transaction or rollback mechanism withinwrite_sql_pairsto ensure atomicity.
🧹 Nitpick comments (5)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (2)
51-78: Ensure JSON parse resilience when constructing Documents
Although the current logic appears straightforward, consider robust error handling in casesql_pairlacks the"sql"or"intention"keys, or if future changes alter their existence. Returning partial results or raising descriptive exceptions can help diagnose data issues.
136-147: Consider transaction-like approach to prevent partial deletions
If a portion of thesql_pairs_cleaner.runoperation fails, you currently still proceed without rollback or follow-up. Consider implementing an all-or-nothing approach or logging partial failures for clarity.wren-ai-service/tools/config/config.example.yaml (1)
125-135: Remove trailing spaces and ensure pipeline naming consistency
There's trailing whitespace on line 131 flagged by static analysis. Also, confirm that the pipeline names (sql_pairs_preparation, etc.) are consistent with the rest of your service to prevent confusion.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.full.yaml (1)
144-158: Tidy up trailing whitespace and ensure naming alignment
Line 150 shows trailing spaces. Remove them to keep the config unambiguous. Also verify that all newly introduced pipeline names match any references in code or docs.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 150-150: trailing spaces
(trailing-spaces)
deployment/kustomizations/base/cm.yaml (1)
155-165: Clean up trailing spaces to avoid config parse issues
Static analysis flags trailing spaces on line 161. While minor, such whitespace can cause inconsistent YAML parsing in certain tooling. Remove them to maintain a tidy config.- embedder: openai_embedder.text-embedding-3-large + embedder: openai_embedder.text-embedding-3-large🧰 Tools
🪛 yamllint (1.35.1)
[error] 161-161: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
deployment/kustomizations/base/cm.yaml(1 hunks)wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
deployment/kustomizations/base/cm.yaml
[error] 161-161: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.example.yaml
[error] 131-131: trailing spaces
(trailing-spaces)
wren-ai-service/tools/config/config.full.yaml
[error] 150-150: trailing spaces
(trailing-spaces)
🔇 Additional comments (1)
wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1)
173-217: Pipeline design looks solid; consider error handling for each step
The pipeline orchestrates multiple async tasks. For better reliability, consider adding per-step error handling or a strategy pattern for handling exceptions. This way, subsequent steps can either gracefully skip or fail early with descriptive logs.
| @observe() | ||
| def post_process( | ||
| generate_sql_intention: List[dict], | ||
| sql_pairs: List[SqlPair], | ||
| ) -> List[Dict[str, Any]]: | ||
| intentions = [ | ||
| orjson.loads(result["replies"][0])["intention"] | ||
| for result in generate_sql_intention | ||
| ] | ||
|
|
||
| return [ | ||
| {"id": sql_pair.id, "sql": sql_pair.sql, "intention": intention} | ||
| for sql_pair, intention in zip(sql_pairs, intentions) | ||
| ] | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Validate orjson decoding to handle malformed or missing data
Decoding replies[0] without checks can raise errors if replies is empty or the structure isn't as expected. Safeguard the code by verifying replies length and using try-except around orjson.loads to improve resilience and clarify any data mismatch issues.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-ai-service/src/pipelines/common.py (2)
1-1: Consider maximizing forward-compatible type hints.
Since Python 3.9 supports built-in type hints likelistanddict, gradually migrating away fromtyping.Listandtyping.Dictcan simplify annotations in the future. However, this is a matter of preference and codebase consistency, so it's merely a suggestion.
49-50: Minor formatting observation.
Multiple consecutive blank lines may be unnecessary. Consider reducing them to one blank line for a cleaner code flow. PEP8 typically suggests a single blank line to separate logical sections in a file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
wren-ai-service/src/pipelines/common.py(2 hunks)wren-ai-service/src/pipelines/retrieval/historical_question.py(0 hunks)
💤 Files with no reviewable changes (1)
- wren-ai-service/src/pipelines/retrieval/historical_question.py
🔇 Additional comments (1)
wren-ai-service/src/pipelines/common.py (1)
51-63: Filter and sorting logic looks good, but consider edge cases.
The run method effectively filters out documents below the score threshold and sorts them in descending order by score. This is correct for typical usage. However, consider the following:
- If any documents do not have a
.scoreattribute, the code may raise anAttributeError. - If the user wants to include some margin beyond equality (e.g.,
document.score > score), a parameter or clarification might help.
Otherwise, the functionality is well organized and easily testable.
Would you like me to write a quick test snippet verifying that documents are filtered and sorted as expected?
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
wren-ai-service/src/globals.py (1)
Line range hint
1-231: Consider grouping related services into sub-containers.As the number of services grows, consider organizing related services (e.g., SQL-related services) into logical sub-containers to improve maintainability and reduce the size of the ServiceContainer class. This could help with:
- Better organization of related functionality
- Easier dependency management
- Improved testability
- Clearer service boundaries
Example structure:
@dataclass class SQLServices: explanation_service: SqlExplanationService regeneration_service: SqlRegenerationService pairs_preparation_service: SqlPairsPreparationService # ... other SQL-related services @dataclass class ServiceContainer: sql_services: SQLServices # ... other service groups
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/src/globals.py(8 hunks)
🔇 Additional comments (4)
wren-ai-service/src/globals.py (4)
9-9: LGTM! Clean import organization.
The consolidation of pipeline imports and addition of SQL-related service imports improve code organization and maintainability.
Also applies to: 20-22
39-41: LGTM! Well-structured service container updates.
The addition of SQL-related services maintains consistent naming conventions and properly integrates the new SQL pairs functionality described in the PR objectives.
96-101: LGTM! Proper integration of SQL pairs retrieval.
The SQL pairs retrieval pipeline is well-integrated and properly configured with the necessary components.
221-231: Verify SQL pairs preparation configuration.
The SQL pairs preparation service is properly structured with both preparation and deletion pipelines. However, we should verify that the corresponding pipeline components are properly configured in the settings.
✅ Verification successful
SQL pairs preparation configuration is properly configured
The verification shows that the SQL pairs preparation service is correctly configured with all necessary components:
- Configuration files contain proper settings for both
sql_pairs_preparationandsql_pairs_deletionpipelines with required components:- Document store (qdrant)
- Embedder (openai_embedder.text-embedding-3-large)
- LLM (gpt-4o-mini-2024-07-18)
- Implementation classes (
SqlPairsPreparationandSqlPairsDeletion) are properly defined with necessary dependencies and components - Service structure matches the configuration with proper initialization of document stores and required providers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify SQL pairs pipeline configuration in settings
# Check for required pipeline components in configuration files
rg -l "sql_pairs_preparation|sql_pairs_deletion" --type yaml
# Check for any TODO or FIXME comments related to SQL pairs configuration
rg -i "todo|fixme|xxx|hack|bug" -C 3 "sql_pairs"
Length of output: 322
Script:
#!/bin/bash
# Let's examine the configuration files to verify the SQL pairs pipeline components
# Check the content of example configuration files
echo "=== Content from docker/config.example.yaml ==="
cat docker/config.example.yaml | grep -A 10 "sql_pairs"
echo -e "\n=== Content from wren-ai-service/tools/config/config.example.yaml ==="
cat wren-ai-service/tools/config/config.example.yaml | grep -A 10 "sql_pairs"
# Let's also check the actual implementation of these pipeline components
echo -e "\n=== SQL Pairs Implementation ==="
rg -A 10 "class SqlPairsPreparation|class SqlPairsDeletion"
Length of output: 7894
async_timer,timerhistorical_questiontohistorical_question_retrievaluser_id, since we don't use it in apisSummary by CodeRabbit
New Features
sql_pairs_preparation,sql_pairs_deletion, andsql_pairs_retrieval.Bug Fixes
Documentation
Chores