chore(wren-ai-service): Add Bird dataset support and enhance configuration documentation by paopa · Pull Request #1427 · Canner/WrenAI

paopa · 2025-03-19T09:59:17Z

This PR updates the evaluation documentation with the following enhancements:

Dataset Support

Added support for the Bird dataset alongside Spider 1.0
Updated dataset preparation instructions to support multiple datasets
Added clear naming conventions for evaluation dataset files

Configuration Documentation

Added detailed instructions for configuring datasources
Included configuration examples for both Spider/Bird datasets and BigQuery
Added documentation for credential handling and environment setup

Evaluation Metrics

Added three new evaluation metrics:
- QuestionToReasoningJudge
- ReasoningToSqlJudge
- SqlSemanticsJudge

Summary by CodeRabbit

Documentation
- Restructured requirements and dataset preparation instructions for clarity.
- Clarified command usage for evaluation dataset setup, now requiring a dataset name as an argument.
- Added guidance for configuring data sources, including built-in support for Spider and Bird datasets and BigQuery integration.
- Introduced new evaluation metrics to enhance assessment of reasoning and SQL generation.

coderabbitai · 2025-03-19T09:59:24Z

Walkthrough

This pull request updates the README for the Wren AI service evaluation framework. The documentation now includes expanded dataset preparation instructions with support for both Spider and Bird datasets, requiring a dataset name argument. It details download locations, directory structures, and output file naming conventions. Additionally, a new section explains configuration for both built-in data sources and BigQuery integration for custom models, and new evaluation metrics focusing on reasoning and SQL generation alignment have been added.

Changes

| File Path | Change Summary |
|---|---|
| `wren-ai-service/…/README.md` | Updated dataset preparation instructions to support both Spider and Bird datasets, with a command requiring a dataset name. Added details on download locations, directory structure, and output file naming convention. Introduced a configuration section for built-in data sources and BigQuery integration, including examples for `config.yaml` and `.env.dev`. Documented new evaluation metrics. |

Possibly related PRs

fix(wren-ai-service): add BigQuery configuration settings and improve logging #1258: Enhances the EvalSettings class with new BigQuery configuration fields, directly relating to the updated BigQuery configuration instructions.
chore(wren-ai-service): add bird eval dataset #1321: Introduces a parameterized approach for dataset preparation for the Bird dataset, aligning with the modifications made in this PR.

Suggested reviewers

cyyeh

A Bunny's Tale of Change
Hop along, my code friends, so divine,
With Spider and Bird in one neat line.
BigQuery hops in with credentials so bright,
Guiding data through the day and night.
I nibble on bytes and cheer with glee—
A change well made in our tech family!
🐇✨

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (8)

wren-ai-service/eval/README.md (8)

18-18: Clarify the Header for Dataset Preparation

The header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” would benefit from clearer punctuation and spacing. For example, consider revising it to “## Evaluation Dataset Preparation (Spider 1.0 or Bird dataset)” for improved readability and consistency.

26-27: Ensure Consistent Formatting for Dataset Bullets

The bullets for spider1.0 and bird are well defined. However, note that a static analysis hint flagged a loose punctuation mark on these lines. Consider ensuring consistent punctuation (for example, uniformly ending each bullet with a period or none at all) and possibly clarifying that spider1.0 is the default if no dataset is specified.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~26-~26: Loose punctuation mark.
Context: ... datasets for evaluation: - spider1.0: The Spider dataset (default if no datas...

(UNLIKELY_OPENING_PUNCTUATION)

31-35: Review Dataset Download Instructions

The instructions for downloading the dataset into the directory wren-ai-service/tools/dev/etc/<dataset-name> are clear. Please verify that the placeholder <dataset-name> is properly explained elsewhere so users understand that it will be replaced by the actual dataset identifier.

37-41: Validate the Evaluation Dataset Preparation Path

The step “2. Prepares and saves evaluation datasets to:” with the subsequent code block showing the directory wren-ai-service/eval/dataset is straightforward. Confirm that this directory exists or that the system creates it as needed.
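One way to make the "creates it as needed" behaviour explicit is sketched below. This is only an illustration: the helper name `ensure_dataset_dir` and the idea of passing the service root as an argument are assumptions, not something taken from the repository.

```python
from pathlib import Path


def ensure_dataset_dir(service_root: Path) -> Path:
    """Create <service_root>/eval/dataset if it is missing and return it.

    `ensure_dataset_dir` is a hypothetical helper, not part of the repository.
    """
    dataset_dir = service_root / "eval" / "dataset"
    # parents=True creates missing intermediate directories;
    # exist_ok=True makes repeated calls harmless.
    dataset_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir
```

Calling it with the path to `wren-ai-service` before writing any output would cover both the "already exists" and the "first run" cases.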

63-64: Built-in Datasource Configuration Details

The explanation of how to use the db_path_for_duckdb in the config.yaml file is clear. It might be helpful to specify whether the example path applies to both Spider and Bird datasets or only to one of them.

70-70: Heading for BigQuery Configuration

The heading “### Configuring BigQuery as a Datasource for Other custom MDLs” is clear. For improved readability, consider standardizing the capitalization (for example, “Other Custom MDLs”).

74-74: Credential Encoding Heading

The subheading “#### Encoding the credentials” is concise. A brief mention of best practices for handling sensitive information might further strengthen this section.

94-100: .env.dev Example for BigQuery

The provided example for the .env.dev file mirrors the YAML configuration nicely. It may be beneficial to add a note that this file should not be committed to version control repositories.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6390938 and 18ba849.

📒 Files selected for processing (1)

wren-ai-service/eval/README.md (3 hunks)

🔇 Additional comments (14)

wren-ai-service/eval/README.md (14)

20-22: Verify the CLI Command for Dataset Preparation

The updated CLI command just prep <dataset-name> clearly indicates that a dataset name must be supplied. Ensure that elsewhere in the documentation or implementation the supported values (e.g., spider1.0 and bird) are explicitly validated.
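As a rough illustration of what explicit validation could look like, the sketch below checks a dataset name against the two values named in the README. The function name `validate_dataset_name` and the error wording are hypothetical; the actual implementation may validate differently or not at all.

```python
# The two dataset names listed in the README.
SUPPORTED_DATASETS = ("spider1.0", "bird")


def validate_dataset_name(name: str) -> str:
    """Return the name unchanged if it is supported, otherwise raise ValueError.

    Hypothetical helper for illustration only.
    """
    if name not in SUPPORTED_DATASETS:
        supported = ", ".join(SUPPORTED_DATASETS)
        raise ValueError(f"unsupported dataset {name!r}; expected one of: {supported}")
    return name
```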

24-24: Check Introductory Text for Supported Datasets

The sentence “Currently, we support two datasets for evaluation:” is clear. You might consider adding a brief note on default behavior (e.g., which dataset is assumed if none is specified) directly here.

29-29: Clear Explanation of Command Steps

The statement “The command performs two main steps:” effectively introduces the subsequent instructions. This reads well – no further action needed.

43-48: Confirm Naming Conventions for Output Files

The output file naming conventions for the Spider and Bird datasets are explicitly stated. Ensure that downstream processes expect these precise file names and that any changes here are reflected in the system’s configuration.

57-59: Datasource Configuration Section

The new section “## Configure the datasource for prediction and evaluation” is well introduced. It might be worthwhile to confirm that the term “datasource” is used consistently throughout the documentation (or consider using “data source” if that is preferred by your style guidelines).

61-61: Section Heading for Built-in Datasource

The heading “### For Spider or Bird Datasets” clearly delineates the instructions for local (built-in) datasources.

67-69: YAML Example for DuckDB Path

The YAML snippet for setting db_path_for_duckdb is well formulated. Verify that the example path "etc/bird/minidev/MINIDEV/dev_databases" accurately reflects the expected directory structure, especially for users working with the Bird dataset.
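A quick way to verify the path locally is sketched below. It assumes that the configured directory holds one sub-directory per database, which is not stated in this thread, and `list_database_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def list_database_dirs(db_path_for_duckdb: str, base_dir: str = ".") -> list[str]:
    """Return the sorted names of sub-directories under the configured path.

    Assumption: each database lives in its own sub-directory.
    """
    root = Path(base_dir) / db_path_for_duckdb
    if not root.is_dir():
        raise FileNotFoundError(f"db_path_for_duckdb does not exist: {root}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

An empty list or a `FileNotFoundError` would suggest that the value in `config.yaml` does not match the downloaded dataset layout.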

72-72: BigQuery Datasource Setup Guidance

The instructions for configuring BigQuery access are comprehensive. Ensure that the audience understands any prerequisites (e.g., required permissions or specific project settings) that might be needed.

76-80: Credential Encoding Command

The command `cat <path/to/credentials.json> | base64` is a standard solution for base64 encoding credentials. Ensure that the placeholder <path/to/credentials.json> is clearly understood to be a user-specific file path.
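For users who prefer not to depend on the shell, a roughly equivalent encoding step can be sketched in Python. The function name `encode_credentials` is hypothetical. One practical difference: GNU `base64` wraps its output at 76 characters by default, whereas `base64.b64encode` returns a single line, which is usually easier to paste into a config value.

```python
import base64
from pathlib import Path


def encode_credentials(path: str) -> str:
    """Return the file's bytes as a single-line base64 string.

    Hypothetical helper; `path` is the user's own credentials file.
    """
    raw = Path(path).read_bytes()
    # b64encode never inserts line breaks into its output.
    return base64.b64encode(raw).decode("ascii")
```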

82-82: Configuration in config.yaml Heading

The heading “#### Configuration in config.yaml” is clear and instructive.

84-84: BigQuery Configuration Instructions in YAML

The explanation for adding BigQuery parameters to config.yaml is clear. Consider noting if additional parameters or comments might be needed for more complex setups.

86-90: BigQuery YAML Example

The YAML snippet which details the keys bigquery_project_id, bigquery_dataset_id, and bigquery_credentials is clear. The inline comment explaining that the credentials must be base64 encoded is helpful.
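Before pasting a value into `bigquery_credentials`, it can help to confirm that it decodes back to JSON. The sketch below is one way to do that check; `decode_bigquery_credentials` is a hypothetical helper, and it makes no assumption about which fields the service account file contains.

```python
import base64
import binascii
import json


def decode_bigquery_credentials(encoded: str) -> dict:
    """Decode a base64 string and parse the result as a JSON object.

    Hypothetical sanity check, not part of the service.
    """
    # Drop whitespace so that line-wrapped base64 output is also accepted.
    compact = "".join(encoded.split())
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("bigquery_credentials is not valid base64") from exc
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("decoded credentials are not a JSON object")
    return parsed
```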

92-92: .env.dev Configuration Heading

The heading “#### Configuration in .env.dev” is appropriately placed for users preferring to use environment variables.

157-159: New Evaluation Metrics Added

The addition of the metrics “QuestionToReasoningJudge”, “ReasoningToSqlJudge”, and “SqlSemanticsJudge” significantly enhances the evaluation framework. Confirm that these new metrics are consistently integrated within the evaluation pipeline and that detailed explanations of each metric are provided elsewhere in the documentation if needed.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

wren-ai-service/eval/README.md (4)
18-18: Refine Header Wording for Clarity
Consider revising the header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” for better readability and consistency. For example, adding spaces and a clearer description would improve comprehension.
-## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)
+## Evaluation Dataset Preparation (for Spider 1.0 and Bird datasets)
24-28: Ensure Consistent Punctuation in the Dataset List
The bullet list outlining the supported datasets is clear. However, a minor punctuation adjustment may help maintain consistency. For example, consider adding a terminal punctuation mark:
- - `spider1.0`: The Spider dataset (default if no dataset specified)
+ - `spider1.0`: The Spider dataset (default if no dataset specified).
This small change addresses a loose punctuation remark flagged by static analysis.

57-102: Validate and Enhance Datasource Configuration Instructions
The new section on configuring the datasource for prediction and evaluation is comprehensive. A few points to consider:

The built-in datasource example shows a configuration using db_path_for_duckdb with a path that appears specific to the Bird dataset (e.g., "etc/bird/minidev/MINIDEV/dev_databases"). If there is a different configuration or example for the Spider dataset, adding it (or clarifying that the same configuration applies) would be helpful.

The BigQuery configuration steps, including credential encoding and the parameters in both config.yaml and .env.dev, are clearly laid out and improve guidance for users handling credentials securely.

Overall, these documentation updates align well with the PR objectives.

157-159: Document the New Evaluation Metrics Further
The addition of evaluation metrics—QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge—enhances the framework’s ability to assess reasoning and SQL generation alignment. Consider expanding the documentation (in this file or an ancillary guide) with examples or further explanations of how each metric is calculated and used. This would significantly aid users in understanding and leveraging these metrics.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 18ba849 and 9f0ea09.

📒 Files selected for processing (1)

wren-ai-service/eval/README.md (3 hunks)

🔇 Additional comments (2)

wren-ai-service/eval/README.md (2)

20-22: Clarify the Dataset Preparation Command
The updated command just prep <dataset-name> clearly indicates that users must supply a dataset name. This change improves usability by making the command explicit.

29-49: Review the Two-Step Dataset Preparation Instructions
The description of the dataset preparation process (downloading and preparing evaluation datasets) is clear and instructive. Ensure that the provided directory paths (e.g., for downloads and outputs) are correct and reflect the repository’s structure. No changes are required here, but verifying the actual paths during integration is recommended.

cyyeh

overall lgtm. could you also mention that we need config.yaml in eval folder? thanks

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

wren-ai-service/eval/README.md (4)

19-19: Refine Eval Dataset Preparation Title
Consider adding a space before the parentheses for improved visual separation. For example, change to:
## Eval Dataset Preparation (If using Spider 1.0 dataset, or Bird dataset)

21-23: Clarify 'just prep' Command Usage
The command now explicitly requires a <dataset-name> argument, which is a welcome improvement. It might be useful to add a brief note on the valid dataset names (e.g., spider1.0 and bird) and mention that the Spider dataset is the default if none is provided.

30-36: Detailed Download Instructions
The description of the download step (pointing to wren-ai-service/tools/dev/etc/<dataset-name>) is clear and informative. Consider including a brief note if this path is configurable or subject to change in different environments.

72-77: BigQuery Datasource Setup
This section efficiently explains how to configure BigQuery as a datasource, including the rationale behind using the .env.dev file for handling sensitive credentials. You might consider adding a reference or link to the official BigQuery documentation for users who want more in-depth information.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 9f0ea09 and ec22531.

📒 Files selected for processing (1)

wren-ai-service/eval/README.md (4 hunks)

🔇 Additional comments (10)

wren-ai-service/eval/README.md (10)

7-10: Improved Requirements Section Formatting
The addition of a numbered list in the requirements section enhances clarity and overall readability. Please ensure that these installation steps remain consistent with other parts of the documentation.

25-29: Clear Dataset Support Details
The section clearly lists the supported datasets (spider1.0 and bird), including the default behavior. This makes it easy for users to understand which dataset will be used if no dataset name is specified.

38-42: Clear Dataset Preparation Steps
The instructions for where the evaluation datasets are saved (wren-ai-service/eval/dataset) and the accompanying code snippet provide clear guidance. Confirm that this folder structure matches the current project setup.

44-48: Output File Naming Conventions
The naming conventions for output files (both for Spider and Bird datasets) are well documented. It might be beneficial to reiterate here if these conventions are fixed or might be subject to future updates.

58-66: Datasource Configuration Instructions
The newly added section on configuring the datasource for prediction and evaluation is logically structured and clearly explains both local and BigQuery setups. Ensure that any cross-references to configuration files remain aligned with the instructions provided in this section.

68-70: Example Configuration in config.yaml
The YAML snippet demonstrating the db_path_for_duckdb configuration is clear and provides a good concrete example. Verify that the path provided (etc/bird/minidev/MINIDEV/dev_databases) is consistent with your environment or add a note for customization if needed.

78-82: Credential Encoding Instruction
The command snippet to encode credentials using base64 is straightforward and useful. It may be worthwhile to remind users to check for platform-specific nuances if they are running this command on different operating systems.

84-92: BigQuery config.yaml Example
The example for configuring BigQuery in config.yaml is clear. A minor enhancement might be to explicitly mention that the credentials string should be base64 encoded, as noted in the inline comment.

94-102: .env.dev Configuration Example
The configuration example provided for the .env.dev file is well formatted and underscores the best practice of keeping credentials secure.

158-160: New Evaluation Metrics Added
The addition of the three new evaluation metrics (QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge) is clearly documented. Ensure that these metrics are integrated with the evaluation framework and that any related dashboards or reports are updated accordingly.

cyyeh

lgtm

paopa added the module/ai-service ai-service related label Mar 19, 2025

github-actions Bot added the wren-ai-service label Mar 19, 2025

paopa force-pushed the chore/update-eval-data-source-doc branch from 18ba849 to 9f0ea09 Compare March 19, 2025 10:02

coderabbitai Bot reviewed Mar 19, 2025

cyyeh self-requested a review March 20, 2025 01:57

cyyeh reviewed Mar 20, 2025

paopa added 4 commits March 20, 2025 11:05

chore: update doc for eval preparation process

e0f9ee5

chore: update the section for datasource(duckdb, bigquery)

5d861a3

chore: update metrics section for eval

646eaac

chore: mention that need a config.yaml in eval folder

ec22531

paopa force-pushed the chore/update-eval-data-source-doc branch from 75975e8 to ec22531 Compare March 20, 2025 03:10

paopa requested a review from cyyeh March 20, 2025 03:11

coderabbitai Bot reviewed Mar 20, 2025

Merge branch 'main' into chore/update-eval-data-source-doc

018cc35

cyyeh approved these changes Mar 20, 2025

cyyeh merged commit 5cc7501 into main Mar 20, 2025

cyyeh deleted the chore/update-eval-data-source-doc branch March 20, 2025 03:56

coderabbitai Bot mentioned this pull request Mar 24, 2025

fix(wren-ui): fix language for generate question for SQL pairs #1453

Merged

pull Bot pushed a commit to nagyist/WrenAI that referenced this pull request May 4, 2026

feat(skills): add auto-update notification and skill versioning (Cann…

25c2134

…er#1427) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

cyyeh merged 5 commits into main from chore/update-eval-data-source-doc
