
chore(wren-ai-service): Add Bird dataset support and enhance configuration documentation#1427

Merged
cyyeh merged 5 commits into main from chore/update-eval-data-source-doc
Mar 20, 2025

Conversation

@paopa
Contributor

@paopa paopa commented Mar 19, 2025

This PR updates the evaluation documentation with the following enhancements:

Dataset Support

  • Added support for the Bird dataset alongside Spider 1.0
  • Updated dataset preparation instructions to support multiple datasets
  • Added clear naming conventions for evaluation dataset files

Configuration Documentation

  • Added detailed instructions for configuring datasources
  • Included configuration examples for both Spider/Bird datasets and BigQuery
  • Added documentation for credential handling and environment setup

Evaluation Metrics

  • Added three new evaluation metrics:
    • QuestionToReasoningJudge
    • ReasoningToSqlJudge
    • SqlSemanticsJudge

Summary by CodeRabbit

  • Documentation
    • Restructured requirements and dataset preparation instructions for clarity.
    • Clarified command usage for evaluation dataset setup, now requiring a dataset name as an argument.
    • Added guidance for configuring data sources, including built-in support for Spider and Bird datasets and BigQuery integration.
    • Introduced new evaluation metrics to enhance assessment of reasoning and SQL generation.

@paopa paopa added the module/ai-service ai-service related label Mar 19, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Mar 19, 2025

Walkthrough

This pull request updates the README for the Wren AI service evaluation framework. The documentation now includes expanded dataset preparation instructions with support for both Spider and Bird datasets, requiring a dataset name argument. It details download locations, directory structures, and output file naming conventions. Additionally, a new section explains configuration for both built-in data sources and BigQuery integration for custom models, and new evaluation metrics focusing on reasoning and SQL generation alignment have been added.

Changes

File Path: wren-ai-service/…/README.md
Change Summary:
  • Updated dataset preparation instructions to support both Spider and Bird datasets, with a command requiring a dataset name.
  • Added details on download locations, directory structure, and output file naming convention.
  • Introduced a configuration section for built-in data sources and BigQuery integration including examples for config.yaml and .env.dev.
  • Documented new evaluation metrics.

Suggested reviewers

  • cyyeh

A Bunny's Tale of Change
Hop along, my code friends, so divine,
With Spider and Bird in one neat line.
BigQuery hops in with credentials so bright,
Guiding data through the day and night.
I nibble on bytes and cheer with glee—
A change well made in our tech family!
🐇✨


@paopa paopa force-pushed the chore/update-eval-data-source-doc branch from 18ba849 to 9f0ea09 Compare March 19, 2025 10:02
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
wren-ai-service/eval/README.md (8)

18-18: Clarify the Header for Dataset Preparation

The header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” would benefit from clearer punctuation and spacing. For example, consider revising it to “## Evaluation Dataset Preparation (Spider 1.0 or Bird dataset)” for improved readability and consistency.


26-27: Ensure Consistent Formatting for Dataset Bullets

The bullets for spider1.0 and bird are well defined. However, note that a static analysis hint flagged a loose punctuation mark on these lines. Consider ensuring consistent punctuation (for example, uniformly ending each bullet with a period or none at all) and possibly clarifying that spider1.0 is the default if no dataset is specified.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~26-~26: Loose punctuation mark.
Context: ... datasets for evaluation: - spider1.0: The Spider dataset (default if no datas...

(UNLIKELY_OPENING_PUNCTUATION)


31-35: Review Dataset Download Instructions

The instructions for downloading the dataset into the directory wren-ai-service/tools/dev/etc/<dataset-name> are clear. Please verify that the placeholder <dataset-name> is properly explained elsewhere so users understand that it will be replaced by the actual dataset identifier.


37-41: Validate the Evaluation Dataset Preparation Path

The step “2. Prepares and saves evaluation datasets to:” with the subsequent code block showing the directory wren-ai-service/eval/dataset is straightforward. Confirm that this directory exists or that the system creates it as needed.
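
One way to make the "created as needed" case explicit is sketched below. This is a minimal illustration, not the project's actual code: the function name `ensure_dataset_dir` and the `repo_root` parameter are hypothetical, and only the directory `wren-ai-service/eval/dataset` comes from the README under review.

```python
from pathlib import Path


def ensure_dataset_dir(repo_root: Path) -> Path:
    """Return the evaluation dataset directory, creating it if it is missing.

    `repo_root` is a hypothetical parameter: the checkout location of the
    repository. The relative path is the one quoted from the README.
    """
    dataset_dir = repo_root / "wren-ai-service" / "eval" / "dataset"
    # parents=True creates intermediate folders; exist_ok=True makes the
    # call safe to repeat when the directory is already there.
    dataset_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir
```

Calling the helper twice is harmless, so a preparation script could run it unconditionally before writing any output files.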


63-64: Built-in Datasource Configuration Details

The explanation of how to use the db_path_for_duckdb in the config.yaml file is clear. It might be helpful to specify whether the example path applies to both Spider and Bird datasets or only to one of them.


70-70: Heading for BigQuery Configuration

The heading “### Configuring BigQuery as a Datasource for Other custom MDLs” is clear. For improved readability, consider standardizing the capitalization (for example, “Other Custom MDLs”).


74-74: Credential Encoding Heading

The subheading “#### Encoding the credentials” is concise. A brief mention of best practices for handling sensitive information might further strengthen this section.


94-100: .env.dev Example for BigQuery

The provided example for the .env.dev file mirrors the YAML configuration nicely. It may be beneficial to add a note that this file should not be committed to version control repositories.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6390938 and 18ba849.

📒 Files selected for processing (1)
  • wren-ai-service/eval/README.md (3 hunks)

🔇 Additional comments (14)
wren-ai-service/eval/README.md (14)

20-22: Verify the CLI Command for Dataset Preparation

The updated CLI command just prep <dataset-name> clearly indicates that a dataset name must be supplied. Ensure that elsewhere in the documentation or implementation the supported values (e.g., spider1.0 and bird) are explicitly validated.
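
The validation suggested here could look roughly like the following sketch. The two dataset names, the `spider1.0` default, and the download root `wren-ai-service/tools/dev/etc/<dataset-name>` are taken from the README under review; the constant and function names are hypothetical and are not claimed to match what `just prep` actually runs.

```python
from pathlib import Path
from typing import Optional

# Values quoted from the README; the identifiers themselves are hypothetical.
SUPPORTED_DATASETS = ("spider1.0", "bird")
DEFAULT_DATASET = "spider1.0"
DOWNLOAD_ROOT = Path("wren-ai-service/tools/dev/etc")


def resolve_dataset(name: Optional[str] = None) -> str:
    """Return a supported dataset name, falling back to the default."""
    if not name:
        return DEFAULT_DATASET
    if name not in SUPPORTED_DATASETS:
        supported = ", ".join(SUPPORTED_DATASETS)
        raise ValueError(
            f"unsupported dataset {name!r}; expected one of: {supported}"
        )
    return name


def download_dir(name: Optional[str] = None) -> Path:
    """Replace the <dataset-name> placeholder with the resolved name."""
    return DOWNLOAD_ROOT / resolve_dataset(name)
```

Rejecting unknown names up front gives a clear error message instead of a download into an unexpected directory.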


24-24: Check Introductory Text for Supported Datasets

The sentence “Currently, we support two datasets for evaluation:” is clear. You might consider adding a brief note on default behavior (e.g., which dataset is assumed if none is specified) directly here.


29-29: Clear Explanation of Command Steps

The statement “The command performs two main steps:” effectively introduces the subsequent instructions. This reads well – no further action needed.


43-48: Confirm Naming Conventions for Output Files

The output file naming conventions for the Spider and Bird datasets are explicitly stated. Ensure that downstream processes expect these precise file names and that any changes here are reflected in the system’s configuration.


57-59: Datasource Configuration Section

The new section “## Configure the datasource for prediction and evaluation” is well introduced. It might be worthwhile to confirm that the term “datasource” is used consistently throughout the documentation (or consider using “data source” if that is preferred by your style guidelines).


61-61: Section Heading for Built-in Datasource

The heading “### For Spider or Bird Datasets” clearly delineates the instructions for local (built-in) datasources.


67-69: YAML Example for DuckDB Path

The YAML snippet for setting db_path_for_duckdb is well formulated. Verify that the example path

"etc/bird/minidev/MINIDEV/dev_databases"

accurately reflects the expected directory structure, especially for users working with the Bird dataset.


72-72: BigQuery Datasource Setup Guidance

The instructions for configuring BigQuery access are comprehensive. Ensure that the audience understands any prerequisites (e.g., required permissions or specific project settings) that might be needed.


76-80: Credential Encoding Command

The command

cat <path/to/credentials.json> | base64

is a standard solution for base64 encoding credentials. Ensure that the placeholder <path/to/credentials.json> is clearly understood to be a user-specific file path.
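
For readers who prefer not to depend on the shell utility, a Python equivalent is sketched below. The function name `encode_credentials` is hypothetical. One practical difference worth noting: GNU coreutils `base64` wraps its output at 76 columns by default (`-w 0` disables wrapping), whereas `base64.b64encode` always returns a single unwrapped line, which is easier to paste into a YAML or env file.

```python
import base64
from pathlib import Path


def encode_credentials(path: str) -> str:
    """Base64-encode a credentials file and return it as one ASCII line.

    `path` is the user-specific location of the credentials JSON file,
    i.e. the value that replaces <path/to/credentials.json>.
    """
    raw = Path(path).read_bytes()
    # b64encode never inserts line breaks, unlike the default GNU base64 CLI.
    return base64.b64encode(raw).decode("ascii")
```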


82-82: Configuration in config.yaml Heading

The heading “#### Configuration in config.yaml” is clear and instructive.


84-84: BigQuery Configuration Instructions in YAML

The explanation for adding BigQuery parameters to config.yaml is clear. Consider noting if additional parameters or comments might be needed for more complex setups.


86-90: BigQuery YAML Example

The YAML snippet which details the keys bigquery_project_id, bigquery_dataset_id, and bigquery_credentials is clear. The inline comment explaining that the credentials must be base64 encoded is helpful.
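
A quick sanity check on the `bigquery_credentials` value before running an evaluation might look like the sketch below. It assumes only what the README states, namely that the value is a base64-encoded credentials JSON file; the function name `decode_credentials` and the sample payload in the usage note are hypothetical.

```python
import base64
import binascii
import json


def decode_credentials(encoded: str) -> dict:
    """Decode a base64 credentials string and parse it as a JSON object."""
    # Drop whitespace first, since some base64 tools wrap their output.
    compact = "".join(encoded.split())
    try:
        raw = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("bigquery_credentials is not valid base64") from exc
    try:
        creds = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        raise ValueError("decoded bigquery_credentials is not valid JSON") from exc
    if not isinstance(creds, dict):
        raise ValueError("decoded bigquery_credentials must be a JSON object")
    return creds
```

For example, `decode_credentials(value)["project_id"]` would surface a pasted-in value that was truncated or left unencoded, assuming the credentials file carries a `project_id` field as service account key files typically do.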


92-92: .env.dev Configuration Heading

The heading “#### Configuration in .env.dev” is appropriately placed for users preferring to use environment variables.


157-159: New Evaluation Metrics Added

The addition of the metrics “QuestionToReasoningJudge”, “ReasoningToSqlJudge”, and “SqlSemanticsJudge” significantly enhances the evaluation framework. Confirm that these new metrics are consistently integrated within the evaluation pipeline and that detailed explanations of each metric are provided elsewhere in the documentation if needed.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
wren-ai-service/eval/README.md (4)

18-18: Refine Header Wording for Clarity
Consider revising the header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” for better readability and consistency. For example, adding spaces and a clearer description would improve comprehension.

-## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)
+## Evaluation Dataset Preparation (for Spider 1.0 and Bird datasets)

24-28: Ensure Consistent Punctuation in the Dataset List
The bullet list outlining the supported datasets is clear. However, a minor punctuation adjustment may help maintain consistency. For example, consider adding a terminal punctuation mark:

- - `spider1.0`: The Spider dataset (default if no dataset specified)
+ - `spider1.0`: The Spider dataset (default if no dataset specified).

This small change addresses a loose punctuation remark flagged by static analysis.


57-102: Validate and Enhance Datasource Configuration Instructions
The new section on configuring the datasource for prediction and evaluation is comprehensive. A few points to consider:

  • The built-in datasource example shows a configuration using db_path_for_duckdb with a path that appears specific to the Bird dataset (e.g., "etc/bird/minidev/MINIDEV/dev_databases"). If there is a different configuration or example for the Spider dataset, adding it (or clarifying that the same configuration applies) would be helpful.
  • The BigQuery configuration steps, including credential encoding and the parameters in both config.yaml and .env.dev, are clearly laid out and improve guidance for users handling credentials securely.

Overall, these documentation updates align well with the PR objectives.


157-159: Document the New Evaluation Metrics Further
The addition of evaluation metrics—QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge—enhances the framework’s ability to assess reasoning and SQL generation alignment. Consider expanding the documentation (in this file or an ancillary guide) with examples or further explanations of how each metric is calculated and used. This would significantly aid users in understanding and leveraging these metrics.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18ba849 and 9f0ea09.

📒 Files selected for processing (1)
  • wren-ai-service/eval/README.md (3 hunks)
(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (2)
wren-ai-service/eval/README.md (2)

20-22: Clarify the Dataset Preparation Command
The updated command just prep <dataset-name> clearly indicates that users must supply a dataset name. This change improves usability by making the command explicit.


29-49: Review the Two-Step Dataset Preparation Instructions
The description of the dataset preparation process (downloading and preparing evaluation datasets) is clear and instructive. Ensure that the provided directory paths (e.g., for downloads and outputs) are correct and reflect the repository’s structure. No changes are required here, but verifying the actual paths during integration is recommended.

@cyyeh cyyeh self-requested a review March 20, 2025 01:57
Member

@cyyeh cyyeh left a comment

overall lgtm. could you also mention that we need config.yaml in eval folder? thanks

@paopa paopa force-pushed the chore/update-eval-data-source-doc branch from 75975e8 to ec22531 Compare March 20, 2025 03:10
@paopa paopa requested a review from cyyeh March 20, 2025 03:11
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
wren-ai-service/eval/README.md (4)

19-19: Refine Eval Dataset Preparation Title
Consider adding a space before the parentheses for improved visual separation. For example, change to:
## Eval Dataset Preparation (If using Spider 1.0 dataset, or Bird dataset)


21-23: Clarify 'just prep' Command Usage
The command now explicitly requires a <dataset-name> argument, which is a welcome improvement. It might be useful to add a brief note on the valid dataset names (e.g., spider1.0 and bird) and mention that the Spider dataset is the default if none is provided.


30-36: Detailed Download Instructions
The description of the download step (pointing to wren-ai-service/tools/dev/etc/<dataset-name>) is clear and informative. Consider including a brief note if this path is configurable or subject to change in different environments.


72-77: BigQuery Datasource Setup
This section efficiently explains how to configure BigQuery as a datasource, including the rationale behind using the .env.dev file for handling sensitive credentials. You might consider adding a reference or link to the official BigQuery documentation for users who want more in-depth information.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f0ea09 and ec22531.

📒 Files selected for processing (1)
  • wren-ai-service/eval/README.md (4 hunks)
(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (10)
wren-ai-service/eval/README.md (10)

7-10: Improved Requirements Section Formatting
The addition of a numbered list in the requirements section enhances clarity and overall readability. Please ensure that these installation steps remain consistent with other parts of the documentation.


25-29: Clear Dataset Support Details
The section clearly lists the supported datasets (spider1.0 and bird), including the default behavior. This makes it easy for users to understand which dataset will be used if no dataset name is specified.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~27-~27: Loose punctuation mark.
Context: ... datasets for evaluation: - spider1.0: The Spider dataset (default if no datas...

(UNLIKELY_OPENING_PUNCTUATION)


38-42: Clear Dataset Preparation Steps
The instructions for where the evaluation datasets are saved (wren-ai-service/eval/dataset) and the accompanying code snippet provide clear guidance. Confirm that this folder structure matches the current project setup.


44-48: Output File Naming Conventions
The naming conventions for output files (both for Spider and Bird datasets) are well documented. It might be beneficial to reiterate here if these conventions are fixed or might be subject to future updates.


58-66: Datasource Configuration Instructions
The newly added section on configuring the datasource for prediction and evaluation is logically structured and clearly explains both local and BigQuery setups. Ensure that any cross-references to configuration files remain aligned with the instructions provided in this section.


68-70: Example Configuration in config.yaml
The YAML snippet demonstrating the db_path_for_duckdb configuration is clear and provides a good concrete example. Verify that the path provided (etc/bird/minidev/MINIDEV/dev_databases) is consistent with your environment or add a note for customization if needed.


78-82: Credential Encoding Instruction
The command snippet to encode credentials using base64 is straightforward and useful. It may be worthwhile to remind users to check for platform-specific nuances if they are running this command on different operating systems.


84-92: BigQuery config.yaml Example
The example for configuring BigQuery in config.yaml is clear. A minor enhancement might be to explicitly mention that the credentials string should be base64 encoded, as noted in the inline comment.


94-102: .env.dev Configuration Example
The configuration example provided for the .env.dev file is well formatted and underscores the best practice of keeping credentials secure.


158-160: New Evaluation Metrics Added
The addition of the three new evaluation metrics (QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge) is clearly documented. Ensure that these metrics are integrated with the evaluation framework and that any related dashboards or reports are updated accordingly.

Member

@cyyeh cyyeh left a comment

lgtm

@cyyeh cyyeh merged commit 5cc7501 into main Mar 20, 2025
@cyyeh cyyeh deleted the chore/update-eval-data-source-doc branch March 20, 2025 03:56
pull Bot pushed a commit to nagyist/WrenAI that referenced this pull request May 4, 2026
…er#1427)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

2 participants