
chore(wren-ai-service): add sql2question #1112

Merged: paopa merged 3 commits into main from chore/ai-service/sql-to-question on Jan 15, 2025

Conversation

cyyeh (Member) commented Jan 14, 2025

Summary by CodeRabbit

  • New Features

    • Added SQL question generation functionality to the AI service.
    • Introduced new endpoints for generating and retrieving SQL questions.
    • Implemented a new pipeline step using the GPT-4o mini model for SQL question generation.
  • Improvements

    • Enhanced service architecture to support asynchronous SQL question processing.
    • Added caching mechanism for tracking SQL question generation status.
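
The asynchronous processing and status caching summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `results` dictionary, `fake_pipeline`, `submit`, and the `succeeded`/`failed` status names are hypothetical stand-ins, not the service's actual code. Only the `generating` status appears in this thread.

```python
import asyncio
import uuid

# Hypothetical in-memory status store; the service's real cache is not shown here.
results: dict[str, dict] = {}


async def fake_pipeline(sqls: list[str]) -> list[str]:
    # Stand-in for the LLM-backed pipeline step (assumption, not the real pipeline).
    await asyncio.sleep(0)
    return [f"What does this query return: {sql}?" for sql in sqls]


async def generate(query_id: str, sqls: list[str]) -> None:
    # Record "generating" first so that a poller sees progress immediately.
    results[query_id] = {"status": "generating", "questions": None}
    try:
        questions = await fake_pipeline(sqls)
        results[query_id] = {"status": "succeeded", "questions": questions}
    except Exception as exc:
        results[query_id] = {"status": "failed", "error": str(exc)}


async def submit(sqls: list[str]) -> tuple[str, asyncio.Task]:
    # Hand back an ID right away and let the work continue in the background.
    query_id = str(uuid.uuid4())
    task = asyncio.create_task(generate(query_id, sqls))
    return query_id, task


async def demo() -> dict:
    query_id, task = await submit(["SELECT 1"])
    await task  # a real client would poll a result endpoint instead
    return results[query_id]
```

Calling `asyncio.run(demo())` returns the cached entry for the submitted request once the background task has finished.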

cyyeh added the module/ai-service and ci/ai-service labels on Jan 14, 2025
cyyeh requested a review from paopa on January 14, 2025 07:33
coderabbitai Bot (Contributor) commented Jan 14, 2025

Walkthrough

This pull request introduces a SQL question generation feature in the Wren AI service. It adds a pipeline step for generating SQL questions with the litellm_llm.gpt-4o-mini-2024-07-18 model, and the change spans service definitions, configuration files, and web routing. The implementation includes a new service for processing SQL questions, endpoints for handling requests, and updates to existing configurations.

Changes

File | Change Summary
deployment/kustomizations/base/cm.yaml | Added new sql_question_generation pipeline step to ConfigMap
docker/config.example.yaml | Added new sql_question_generation pipeline configuration
wren-ai-service/src/globals.py | Added SqlQuestionService to ServiceContainer
wren-ai-service/src/pipelines/generation/__init__.py | Imported and exposed SQLQuestion
wren-ai-service/src/pipelines/generation/sql_question.py | Created new module with SQL question generation pipeline
wren-ai-service/src/web/v1/routers/__init__.py | Integrated SQL question router
wren-ai-service/src/web/v1/routers/sql_question.py | Added endpoints for SQL question generation and result retrieval
wren-ai-service/src/web/v1/services/sql_question.py | Implemented SqlQuestionService with request/response models
wren-ai-service/tools/config/config.example.yaml | Added SQL question generation pipeline to example configs
wren-ai-service/tools/config/config.full.yaml | Added SQL question generation pipeline to full configs

🐇 In a world of queries and SQL delight,
A new step has hopped into the light.
With questions to generate, oh what a sight,
Our service now shines, so clever and bright!
From prompts to results, we’ll take flight,
Hopping through data, our future is bright! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49f51a0 and a829311.

📒 Files selected for processing (5)
  • deployment/kustomizations/base/cm.yaml (1 hunks)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/src/globals.py (3 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • wren-ai-service/tools/config/config.full.yaml
  • wren-ai-service/tools/config/config.example.yaml
  • docker/config.example.yaml
  • deployment/kustomizations/base/cm.yaml
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pytest
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-ai-service/src/globals.py (3)

22-22: LGTM! Import follows project conventions

The SqlQuestionService import is correctly placed with other service imports and follows the established pattern.


43-43: LGTM! ServiceContainer attribute properly declared

The sql_question_service attribute is correctly declared with proper type annotation, following the established pattern.


238-245: Verify sql_question_generation pipeline component configuration

While the service initialization follows the established pattern, ensure that the required "sql_question_generation" component is properly configured in the deployment configuration files.

Run this script to verify the configuration (the scripts are listed under "Scripts executed" below):

Consider adding validation for the required pipeline component to prevent runtime errors:

if "sql_question_generation" not in pipe_components:
    raise ValueError("Missing required sql_question_generation pipeline component")
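
A slightly more general form of the check above might look like the following sketch. `REQUIRED_COMPONENTS` and `validate_pipe_components` are hypothetical names introduced for illustration, and `pipe_components` is assumed to be a plain mapping keyed by pipeline name.

```python
# Hypothetical list of pipeline names the service cannot start without.
REQUIRED_COMPONENTS = ("sql_question_generation",)


def validate_pipe_components(pipe_components: dict) -> None:
    # Collect every missing name so that one error reports them all at once.
    missing = [name for name in REQUIRED_COMPONENTS if name not in pipe_components]
    if missing:
        raise ValueError(
            "Missing required pipeline components: " + ", ".join(missing)
        )
```

Running such a check at startup would surface a misconfigured deployment before the first request reaches the service.
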
✅ Verification successful

SQL Question Generation pipeline is properly configured

The component is correctly configured in all deployment configurations and the implementation matches the service initialization requirements.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify sql_question_generation configuration exists in config files

# Check deployment config files for sql_question_generation configuration
echo "Checking deployment configurations..."
rg -l "sql_question_generation" deployment/
rg -l "sql_question_generation" docker/
rg -l "sql_question_generation" wren-ai-service/tools/config/

# Check if the component is properly imported and available
echo "Checking component availability..."
rg -l "class SQLQuestion" wren-ai-service/src/pipelines/generation/

Length of output: 597


Script:

#!/bin/bash
# Check the configuration structure in deployment and example files
echo "Checking deployment configuration structure..."
cat deployment/kustomizations/base/cm.yaml | grep -A 10 "sql_question_generation"

echo -e "\nChecking example configuration structure..."
cat docker/config.example.yaml | grep -A 10 "sql_question_generation"

echo -e "\nChecking SQLQuestion class requirements..."
cat wren-ai-service/src/pipelines/generation/sql_question.py | grep -A 10 "class SQLQuestion"

Length of output: 1844


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
wren-ai-service/src/pipelines/generation/sql_question.py (1)

84-86: Add docstring to the SQLQuestionResult class for clarity.

Adding a docstring to the SQLQuestionResult class will improve code readability and provide clear documentation of its purpose and attributes.

wren-ai-service/src/web/v1/services/sql_question.py (1)

126-128: Use logger.error instead of logger.exception when no exception is raised.

In the get_sql_question_result method, logger.exception is used without an active exception, which may lead to misleading logs. Use logger.error to log the error message without including traceback information.

Apply this diff to adjust the logging:

-logger.exception(
+logger.error(
     f"sql question pipeline - OTHERS: {sql_question_result_request.query_id} is not found"
 )
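
The difference between the two calls can be seen with a small standalone sketch using only the standard `logging` module. The logger names and the `capture` helper are hypothetical. Outside an `except` block, `logger.exception` still attaches exception info, which typically shows up as an extra line such as `NoneType: None`.

```python
import io
import logging


def capture(log_call_name: str) -> str:
    # Send output to an in-memory buffer so the two calls can be compared.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger(f"demo.{log_call_name}")
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    logger.addHandler(handler)
    # Call logger.error(...) or logger.exception(...) with no active exception.
    getattr(logger, log_call_name)("query_id %s is not found", "abc123")
    logger.removeHandler(handler)
    return stream.getvalue()


error_output = capture("error")          # just the message
exception_output = capture("exception")  # message plus empty exception info
```
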
deployment/kustomizations/base/cm.yaml (1)

Line range hint 125-127: Consider documenting the sql_question_generation pipeline.

The new pipeline has been consistently added across all configuration files. However, to improve maintainability and onboarding:

  1. Consider adding documentation about:
    • The purpose and functionality of the sql_question_generation pipeline
    • Its relationship with other SQL-related pipelines
    • Any specific requirements or limitations
  2. Update the architecture documentation to reflect this new capability

This will help future maintainers understand the system's enhanced capabilities and integration points.

Would you like me to help create a documentation template for this new pipeline?

Also applies to: 139-140, 158-159, 173-174

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93d3e00 and 799f134.

📒 Files selected for processing (11)
  • deployment/kustomizations/base/cm.yaml (1 hunks)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/src/globals.py (3 hunks)
  • wren-ai-service/src/pipelines/generation/__init__.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/sql_expansion.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/sql_question.py (1 hunks)
  • wren-ai-service/src/web/v1/routers/__init__.py (2 hunks)
  • wren-ai-service/src/web/v1/routers/sql_question.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_question.py (1 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
wren-ai-service/src/pipelines/generation/sql_question.py (4)

49-61: Function prompts is well-implemented.

The prompts function correctly builds prompts for each SQL query using the provided PromptBuilder. It effectively prepares the inputs for the generation step.


64-68: Efficient asynchronous generation in generate_sql_questions.

The use of asyncio.gather in generate_sql_questions allows concurrent processing of prompts, enhancing performance and scalability.
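
As a rough illustration of the pattern described here, the sketch below fans out one coroutine per prompt and collects the results in input order. `generate_question` is a hypothetical stand-in for the LLM call and is not the pipeline's actual generator.

```python
import asyncio


async def generate_question(prompt: str) -> str:
    # Stand-in for an LLM request; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"question for: {prompt}"


async def generate_all(prompts: list[str]) -> list[str]:
    # gather schedules all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(generate_question(p) for p in prompts))
```

Because the calls overlap, the total wait is close to the slowest single call rather than the sum of all calls.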


100-116: SQLQuestion class initialization is appropriately set up.

The __init__ method properly initializes the generator and prompt builder components, ensuring that the pipeline is correctly configured with the necessary templates and model arguments.


118-131: run method correctly orchestrates pipeline execution.

The run method effectively executes the pipeline steps and passes the necessary inputs, making use of the default language configuration when none is provided.

wren-ai-service/src/web/v1/routers/__init__.py (1)

16-16: Integration of sql_question router is correctly implemented.

The sql_question router is properly imported and included in the main API router configuration, enabling the new SQL question endpoints.

Also applies to: 34-34

wren-ai-service/src/pipelines/generation/__init__.py (1)

15-15: LGTM! Clean module organization.

The addition of SQLQuestion to both imports and __all__ follows good Python module organization practices.

Also applies to: 36-36

wren-ai-service/src/web/v1/routers/sql_question.py (1)

21-33: Great documentation!

The router's documentation clearly describes the endpoints and their functionality.

wren-ai-service/src/pipelines/generation/sql_expansion.py (1)

161-161: LGTM! Consistent configuration management.

The change from timezone parameter to Configuration object aligns with the service's configuration management approach.

wren-ai-service/src/globals.py (1)

22-22: LGTM! Consistent service integration.

The SqlQuestionService integration follows the established patterns in the codebase:

  • Consistent service declaration
  • Proper initialization with pipelines and cache configuration
  • Follows the same structure as other services

Also applies to: 43-43, 238-245

docker/config.example.yaml (1)

125-127: Verify engine dependency requirement for sql_question_generation pipeline.

Unlike other SQL-related pipelines (e.g., sql_generation, sql_correction), this pipeline doesn't specify an engine dependency. Please verify if this is intentional or if it requires access to the SQL engine for question generation.

✅ Verification successful

Missing engine dependency is consistent with the codebase pattern

The configuration is correct as is. Looking at other SQL-related pipelines in the codebase, only those that execute or manipulate SQL queries (like sql_generation, sql_correction) require the engine dependency. Pipelines focused on natural language processing (like sql_summary, sql_explanation) don't specify an engine, which is consistent with this new sql_question_generation pipeline's purpose.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other SQL question-related pipelines in the codebase use engine dependencies
rg -A 2 "name: .*question.*" | grep -B 2 "engine:"

Length of output: 50


Script:

#!/bin/bash
# Check all SQL-related pipeline configurations
echo "=== SQL Pipeline Configurations ==="
rg -A 3 "name: .*sql.*" docker/config.example.yaml

# Look for sql_question_generation implementation
echo -e "\n=== SQL Question Generation Implementation ==="
ast-grep --pattern 'class $_ {
  $$$
  sql_question_generation($$$) {
    $$$
  }
  $$$
}'

Length of output: 1781

wren-ai-service/tools/config/config.example.yaml (1)

139-140: LGTM! Configuration is consistent with docker environment.

The pipeline configuration matches the docker environment setup, maintaining consistency across different deployment scenarios.

wren-ai-service/tools/config/config.full.yaml (1)

158-159: LGTM! Production configuration is properly aligned.

The pipeline configuration is consistent with other environments and correctly configured for production use (development: false).

deployment/kustomizations/base/cm.yaml (1)

173-174: Verify service version update requirement.

The pipeline configuration is consistent with other environments. However, please verify if the addition of this new feature requires an update to the WREN_AI_SERVICE_VERSION (currently at "0.12.1").

Consider updating the service version to reflect this new feature addition, following semantic versioning principles:

  • Minor version bump if this adds functionality in a backward-compatible manner
  • Patch version bump if this is just an internal enhancement

Comment thread wren-ai-service/src/pipelines/generation/sql_question.py
Comment thread wren-ai-service/src/web/v1/services/sql_question.py
Comment thread wren-ai-service/src/web/v1/services/sql_question.py
Comment thread wren-ai-service/src/web/v1/services/sql_question.py
Comment thread wren-ai-service/src/web/v1/services/sql_question.py
Comment thread wren-ai-service/src/web/v1/routers/sql_question.py
Comment thread wren-ai-service/src/web/v1/routers/sql_question.py
coderabbitai Bot added a commit that referenced this pull request Jan 14, 2025
Docstrings generation was requested by @cyyeh.

* #1112 (comment)

The following files were modified:

* `wren-ai-service/src/globals.py`
* `wren-ai-service/src/pipelines/generation/sql_question.py`
* `wren-ai-service/src/web/v1/routers/sql_question.py`
* `wren-ai-service/src/web/v1/services/sql_question.py`

coderabbitai Bot (Contributor) commented Jan 14, 2025

Note

We have generated docstrings for this pull request, at #1113

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)

16-29: 🛠️ Refactor suggestion

Simplify SqlQuestionRequest model using direct field definition.

The current implementation using property decorators for query_id adds unnecessary complexity. Leverage Pydantic's built-in field validation instead.

 class SqlQuestionRequest(BaseModel):
-    _query_id: str | None = None
+    query_id: Optional[str] = None
     sqls: list[str]
     project_id: Optional[str] = None
     configurations: Optional[Configuration] = Configuration()
-
-    @property
-    def query_id(self) -> str:
-        return self._query_id
-
-    @query_id.setter
-    def query_id(self, query_id: str):
-        self._query_id = query_id
🧹 Nitpick comments (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)

99-99: Optimize exception logging.

Pass the values as logger arguments (%-style placeholders) instead of using f-strings, so the message is only formatted when the record is actually emitted.

-logger.exception(f"sql question pipeline - OTHERS: {e}")
+logger.exception("sql question pipeline - OTHERS: %s", e)

-logger.exception(
-    f"sql question pipeline - OTHERS: {sql_question_result_request.query_id} is not found"
-)
+logger.exception(
+    "sql question pipeline - OTHERS: %s is not found",
+    sql_question_result_request.query_id
+)

Also applies to: 124-126
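
The deferred formatting behind this suggestion can be demonstrated with a small sketch. The `Counted` class and the logger name are hypothetical. Note that the saving only applies when the record is filtered out, so for ERROR-level calls such as `logger.exception` the practical gain is probably small.

```python
import logging


class Counted:
    """Counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return "payload"


logger = logging.getLogger("demo.lazy")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.ERROR)  # DEBUG records are discarded

eager = Counted()
lazy = Counted()

logger.debug(f"value: {eager}")  # the f-string is built before the call
logger.debug("value: %s", lazy)  # never formatted, since DEBUG is disabled
```

After these calls, `eager.str_calls` is 1 while `lazy.str_calls` stays at 0.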

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 799f134 and 49f51a0.

📒 Files selected for processing (1)
  • wren-ai-service/src/web/v1/services/sql_question.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
wren-ai-service/src/web/v1/services/sql_question.py (1)
Learnt from: cyyeh
PR: Canner/WrenAI#1112
File: wren-ai-service/src/web/v1/services/sql_question.py:101-109
Timestamp: 2025-01-14T07:45:56.117Z
Learning: Warning about accessing `query_id` during exception handling in SQL question service can be skipped as per team's decision. This applies to the pattern where `sql_question_request.query_id` is accessed within the exception handling block.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
wren-ai-service/src/web/v1/services/sql_question.py (5)

1-12: LGTM! Well-organized imports.

The imports are properly organized and all imported modules are utilized in the implementation.


31-48: LGTM! Well-structured response models.

The models are well-designed with:

  • Proper type constraints using Literal types
  • Clear error handling structure
  • Appropriate use of Optional fields

69-75: 🛠️ Refactor suggestion

Align results dictionary structure with error response model.

The results dictionary structure should match the error model structure for consistency.

 results = {
     "sql_question_result": {},
     "metadata": {
-        "error_type": "",
-        "error_message": "",
+        "error": {
+            "type": "",
+            "message": "",
+        }
     },
 }

Likely invalid or redundant comment.


78-82: ⚠️ Potential issue

Validate query_id before using as cache key.

The query_id should be validated before using it as a cache key to prevent potential None values.

 query_id = sql_question_request.query_id
+if not query_id:
+    raise ValueError("query_id is required")
 
 self._sql_question_results[query_id] = SqlQuestionResultResponse(
     status="generating",
 )

Likely invalid or redundant comment.


84-90: Add error handling for missing post_process key.

The code assumes the "post_process" key exists in the pipeline result. Add proper error handling for this case.

 sql_questions_result = (
     await self._pipelines["sql_question_generation"].run(
         sqls=sql_question_request.sqls,
         configuration=sql_question_request.configurations,
-    )["post_process"]
+    )
+if "post_process" not in sql_questions_result:
+    raise KeyError("Pipeline result missing 'post_process' key")
+sql_questions_result = sql_questions_result["post_process"]
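
One way to write the suggested check as complete code is sketched below. `run_pipeline` is a hypothetical stand-in for the pipeline's `run` method. The coroutine is awaited in its own statement before the key lookup, because a subscript placed directly after the call would be applied to the coroutine object rather than to its result.

```python
import asyncio


async def run_pipeline(include_key: bool) -> dict:
    # Stand-in for the pipeline call; optionally omits the expected key.
    await asyncio.sleep(0)
    return {"post_process": ["q1"]} if include_key else {}


async def get_questions(include_key: bool) -> list[str]:
    # Await first, then inspect the resulting dictionary.
    result = await run_pipeline(include_key)
    if "post_process" not in result:
        raise KeyError("Pipeline result missing 'post_process' key")
    return result["post_process"]
```
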

paopa (Contributor) left a comment

lgtm!
