chore(wren-ai-service): add sql2question#1112
Conversation
WalkthroughThis pull request introduces a new SQL question generation feature within the Wren AI service. It adds a pipeline step for generating SQL questions using the Changes
Possibly related PRs
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
⏰ Context from checks skipped due to timeout of 90000ms (4)
🔇 Additional comments (3)
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (3)
wren-ai-service/src/pipelines/generation/sql_question.py (1)
84-86: Add docstring to theSQLQuestionResultclass for clarity.Adding a docstring to the
SQLQuestionResultclass will improve code readability and provide clear documentation of its purpose and attributes.wren-ai-service/src/web/v1/services/sql_question.py (1)
126-128: Uselogger.errorinstead oflogger.exceptionwhen no exception is raised.In the
get_sql_question_resultmethod,logger.exceptionis used without an active exception, which may lead to misleading logs. Uselogger.errorto log the error message without including traceback information.Apply this diff to adjust the logging:
-logger.exception( +logger.error( f"sql question pipeline - OTHERS: {sql_question_result_request.query_id} is not found" )deployment/kustomizations/base/cm.yaml (1)
Line range hint
125-127: Consider documenting the sql_question_generation pipeline.The new pipeline has been consistently added across all configuration files. However, to improve maintainability and onboarding:
- Consider adding documentation about:
- The purpose and functionality of the sql_question_generation pipeline
- Its relationship with other SQL-related pipelines
- Any specific requirements or limitations
- Update the architecture documentation to reflect this new capability
This will help future maintainers understand the system's enhanced capabilities and integration points.
Would you like me to help create a documentation template for this new pipeline?
Also applies to: 139-140, 158-159, 173-174
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
deployment/kustomizations/base/cm.yaml(1 hunks)docker/config.example.yaml(1 hunks)wren-ai-service/src/globals.py(3 hunks)wren-ai-service/src/pipelines/generation/__init__.py(2 hunks)wren-ai-service/src/pipelines/generation/sql_expansion.py(1 hunks)wren-ai-service/src/pipelines/generation/sql_question.py(1 hunks)wren-ai-service/src/web/v1/routers/__init__.py(2 hunks)wren-ai-service/src/web/v1/routers/sql_question.py(1 hunks)wren-ai-service/src/web/v1/services/sql_question.py(1 hunks)wren-ai-service/tools/config/config.example.yaml(1 hunks)wren-ai-service/tools/config/config.full.yaml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (13)
wren-ai-service/src/pipelines/generation/sql_question.py (4)
49-61: Functionpromptsis well-implemented.The
promptsfunction correctly builds prompts for each SQL query using the providedPromptBuilder. It effectively prepares the inputs for the generation step.
64-68: Efficient asynchronous generation ingenerate_sql_questions.The use of
asyncio.gatheringenerate_sql_questionsallows concurrent processing of prompts, enhancing performance and scalability.
100-116:SQLQuestionclass initialization is appropriately set up.The
__init__method properly initializes the generator and prompt builder components, ensuring that the pipeline is correctly configured with the necessary templates and model arguments.
118-131:runmethod correctly orchestrates pipeline execution.The
runmethod effectively executes the pipeline steps and passes the necessary inputs, making use of the default language configuration when none is provided.wren-ai-service/src/web/v1/routers/__init__.py (1)
16-16: Integration ofsql_questionrouter is correctly implemented.The
sql_questionrouter is properly imported and included in the main API router configuration, enabling the new SQL question endpoints.Also applies to: 34-34
wren-ai-service/src/pipelines/generation/__init__.py (1)
15-15: LGTM! Clean module organization.The addition of
SQLQuestionto both imports and__all__follows good Python module organization practices.Also applies to: 36-36
wren-ai-service/src/web/v1/routers/sql_question.py (1)
21-33: Great documentation!The router's documentation clearly describes the endpoints and their functionality.
wren-ai-service/src/pipelines/generation/sql_expansion.py (1)
161-161: LGTM! Consistent configuration management.The change from timezone parameter to Configuration object aligns with the service's configuration management approach.
wren-ai-service/src/globals.py (1)
22-22: LGTM! Consistent service integration.The SqlQuestionService integration follows the established patterns in the codebase:
- Consistent service declaration
- Proper initialization with pipelines and cache configuration
- Follows the same structure as other services
Also applies to: 43-43, 238-245
docker/config.example.yaml (1)
125-127: Verify engine dependency requirement for sql_question_generation pipeline.Unlike other SQL-related pipelines (e.g., sql_generation, sql_correction), this pipeline doesn't specify an engine dependency. Please verify if this is intentional or if it requires access to the SQL engine for question generation.
✅ Verification successful
Missing engine dependency is consistent with the codebase pattern
The configuration is correct as is. Looking at other SQL-related pipelines in the codebase, only those that execute or manipulate SQL queries (like sql_generation, sql_correction) require the engine dependency. Pipelines focused on natural language processing (like sql_summary, sql_explanation) don't specify an engine, which is consistent with this new sql_question_generation pipeline's purpose.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if other SQL question-related pipelines in the codebase use engine dependencies rg -A 2 "name: .*question.*" | grep -B 2 "engine:"Length of output: 50
Script:
#!/bin/bash # Check all SQL-related pipeline configurations echo "=== SQL Pipeline Configurations ===" rg -A 3 "name: .*sql.*" docker/config.example.yaml # Look for sql_question_generation implementation echo -e "\n=== SQL Question Generation Implementation ===" ast-grep --pattern 'class $_ { $$$ sql_question_generation($$$) { $$$ } $$$ }'Length of output: 1781
wren-ai-service/tools/config/config.example.yaml (1)
139-140: LGTM! Configuration is consistent with docker environment.The pipeline configuration matches the docker environment setup, maintaining consistency across different deployment scenarios.
wren-ai-service/tools/config/config.full.yaml (1)
158-159: LGTM! Production configuration is properly aligned.The pipeline configuration is consistent with other environments and correctly configured for production use (development: false).
deployment/kustomizations/base/cm.yaml (1)
173-174: Verify service version update requirement.The pipeline configuration is consistent with other environments. However, please verify if the addition of this new feature requires an update to the
WREN_AI_SERVICE_VERSION(currently at "0.12.1").Consider updating the service version to reflect this new feature addition, following semantic versioning principles:
- Minor version bump if this adds functionality in a backward-compatible manner
- Patch version bump if this is just an internal enhancement
Docstrings generation was requested by @cyyeh. * #1112 (comment) The following files were modified: * `wren-ai-service/src/globals.py` * `wren-ai-service/src/pipelines/generation/sql_question.py` * `wren-ai-service/src/web/v1/routers/sql_question.py` * `wren-ai-service/src/web/v1/services/sql_question.py`
|
Note We have generated docstrings for this pull request, at #1113 |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)
16-29: 🛠️ Refactor suggestionSimplify SqlQuestionRequest model using direct field definition.
The current implementation using property decorators for
query_idadds unnecessary complexity. Leverage Pydantic's built-in field validation instead.class SqlQuestionRequest(BaseModel): - _query_id: str | None = None + query_id: Optional[str] = None sqls: list[str] project_id: Optional[str] = None configurations: Optional[Configuration] = Configuration() - - @property - def query_id(self) -> str: - return self._query_id - - @query_id.setter - def query_id(self, query_id: str): - self._query_id = query_id
🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)
99-99: Optimize exception logging.Use logger's string interpolation instead of f-strings for better performance in production.
-logger.exception(f"sql question pipeline - OTHERS: {e}") +logger.exception("sql question pipeline - OTHERS: %s", e) -logger.exception( - f"sql question pipeline - OTHERS: {sql_question_result_request.query_id} is not found" -) +logger.exception( + "sql question pipeline - OTHERS: %s is not found", + sql_question_result_request.query_id +)Also applies to: 124-126
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/src/web/v1/services/sql_question.py(1 hunks)
🧰 Additional context used
📓 Learnings (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1112
File: wren-ai-service/src/web/v1/services/sql_question.py:101-109
Timestamp: 2025-01-14T07:45:56.117Z
Learning: Warning about accessing `query_id` during exception handling in SQL question service can be skipped as per team's decision. This applies to the pattern where `sql_question_request.query_id` is accessed within the exception handling block.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
wren-ai-service/src/web/v1/services/sql_question.py (5)
1-12: LGTM! Well-organized imports.The imports are properly organized and all imported modules are utilized in the implementation.
31-48: LGTM! Well-structured response models.The models are well-designed with:
- Proper type constraints using Literal types
- Clear error handling structure
- Appropriate use of Optional fields
69-75: 🛠️ Refactor suggestionAlign results dictionary structure with error response model.
The results dictionary structure should match the error model structure for consistency.
results = { "sql_question_result": {}, "metadata": { - "error_type": "", - "error_message": "", + "error": { + "type": "", + "message": "", + } }, }Likely invalid or redundant comment.
78-82:⚠️ Potential issueValidate query_id before using as cache key.
The
query_idshould be validated before using it as a cache key to prevent potentialNonevalues.query_id = sql_question_request.query_id +if not query_id: + raise ValueError("query_id is required") self._sql_question_results[query_id] = SqlQuestionResultResponse( status="generating", )Likely invalid or redundant comment.
84-90: Add error handling for missing post_process key.The code assumes the "post_process" key exists in the pipeline result. Add proper error handling for this case.
sql_questions_result = ( await self._pipelines["sql_question_generation"].run( sqls=sql_question_request.sqls, configuration=sql_question_request.configurations, - )["post_process"] + ) +if "post_process" not in sql_questions_result: + raise KeyError("Pipeline result missing 'post_process' key") +sql_questions_result = sql_questions_result["post_process"]
Summary by CodeRabbit
New Features
Improvements