chore(wren-ai-service): Add Bird dataset support and enhance configuration documentation #1427
Walkthrough
This pull request updates the README for the Wren AI service evaluation framework. The documentation now includes expanded dataset preparation instructions with support for both Spider and Bird datasets, requiring a dataset name argument. It details download locations, directory structures, and output file naming conventions. Additionally, a new section explains configuration for both built-in data sources and BigQuery integration for custom models, and new evaluation metrics focusing on reasoning and SQL generation alignment have been added.
18ba849 to 9f0ea09
Actionable comments posted: 0
🧹 Nitpick comments (8)
wren-ai-service/eval/README.md (8)
18-18: Clarify the Header for Dataset Preparation
The header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” would benefit from clearer punctuation and spacing. For example, consider revising it to “## Evaluation Dataset Preparation (Spider 1.0 or Bird dataset)” for improved readability and consistency.
26-27: Ensure Consistent Formatting for Dataset Bullets
The bullets for `spider1.0` and `bird` are well defined. However, note that a static analysis hint flagged a loose punctuation mark on these lines. Consider ensuring consistent punctuation (for example, uniformly ending each bullet with a period or none at all) and possibly clarifying that `spider1.0` is the default if no dataset is specified.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~26-~26: Loose punctuation mark.
Context: ... datasets for evaluation: - `spider1.0`: The Spider dataset (default if no datas...
(UNLIKELY_OPENING_PUNCTUATION)
31-35: Review Dataset Download Instructions
The instructions for downloading the dataset into the directory `wren-ai-service/tools/dev/etc/<dataset-name>` are clear. Please verify that the placeholder `<dataset-name>` is properly explained elsewhere so users understand that it will be replaced by the actual dataset identifier.
37-41: Validate the Evaluation Dataset Preparation Path
The step “2. Prepares and saves evaluation datasets to:” with the subsequent code block showing the directory `wren-ai-service/eval/dataset` is straightforward. Confirm that this directory exists or that the system creates it as needed.
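If the preparation step is meant to create the output directory on demand, the behavior could look roughly like the Python sketch below. The helper name `ensure_dataset_dir` and its `base` argument are hypothetical and not taken from the repository; only the `eval/dataset` path comes from the README.

```python
from pathlib import Path


def ensure_dataset_dir(base: Path) -> Path:
    """Create <base>/eval/dataset if it is missing and return it (hypothetical helper)."""
    target = base / "eval" / "dataset"
    # parents=True creates intermediate directories; exist_ok=True makes reruns harmless.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling it twice with the same base is safe, which matters if `just prep` is run once per dataset.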
63-64: Built-in Datasource Configuration Details
The explanation of how to use the `db_path_for_duckdb` in the `config.yaml` file is clear. It might be helpful to specify whether the example path applies to both Spider and Bird datasets or only to one of them.
70-70: Heading for BigQuery Configuration
The heading “### Configuring BigQuery as a Datasource for Other custom MDLs” is clear. For improved readability, consider standardizing the capitalization (for example, “Other Custom MDLs”).
74-74: Credential Encoding Heading
The subheading “#### Encoding the credentials” is concise. A brief mention of best practices for handling sensitive information might further strengthen this section.
94-100: .env.dev Example for BigQuery
The provided example for the `.env.dev` file mirrors the YAML configuration nicely. It may be beneficial to add a note that this file should not be committed to version control repositories.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/eval/README.md(3 hunks)
🔇 Additional comments (14)
wren-ai-service/eval/README.md (14)
20-22: Verify the CLI Command for Dataset Preparation
The updated CLI command `just prep <dataset-name>` clearly indicates that a dataset name must be supplied. Ensure that elsewhere in the documentation or implementation the supported values (e.g., `spider1.0` and `bird`) are explicitly validated.
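A minimal sketch of such validation, assuming a Python entry point sits behind `just prep`. The names `SUPPORTED_DATASETS` and `validate_dataset_name` are hypothetical; only the two dataset identifiers come from the README.

```python
# Dataset identifiers named in the README; everything else here is illustrative.
SUPPORTED_DATASETS = ("spider1.0", "bird")


def validate_dataset_name(name: str) -> str:
    """Return the name unchanged if supported, otherwise raise ValueError."""
    if name not in SUPPORTED_DATASETS:
        supported = ", ".join(SUPPORTED_DATASETS)
        raise ValueError(f"unsupported dataset {name!r}; expected one of: {supported}")
    return name
```

Failing early with the list of accepted values keeps a typo such as `spider` from surfacing later as a missing download directory.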
24-24: Check Introductory Text for Supported Datasets
The sentence “Currently, we support two datasets for evaluation:” is clear. You might consider adding a brief note on default behavior (e.g., which dataset is assumed if none is specified) directly here.
29-29: Clear Explanation of Command Steps
The statement “The command performs two main steps:” effectively introduces the subsequent instructions. This reads well, and no further action is needed.
43-48: Confirm Naming Conventions for Output Files
The output file naming conventions for the Spider and Bird datasets are explicitly stated. Ensure that downstream processes expect these precise file names and that any changes here are reflected in the system’s configuration.
57-59: Datasource Configuration Section
The new section “## Configure the datasource for prediction and evaluation” is well introduced. It might be worthwhile to confirm that the term “datasource” is used consistently throughout the documentation (or consider using “data source” if that is preferred by your style guidelines).
61-61: Section Heading for Built-in Datasource
The heading “### For Spider or Bird Datasets” clearly delineates the instructions for local (built-in) datasources.
67-69: YAML Example for DuckDB Path
The YAML snippet for setting `db_path_for_duckdb` is well formulated. Verify that the example path `"etc/bird/minidev/MINIDEV/dev_databases"` accurately reflects the expected directory structure, especially for users working with the Bird dataset.
72-72: BigQuery Datasource Setup Guidance
The instructions for configuring BigQuery access are comprehensive. Ensure that the audience understands any prerequisites (e.g., required permissions or specific project settings) that might be needed.
76-80: Credential Encoding Command
The command `cat <path/to/credentials.json> | base64` is a standard solution for base64 encoding credentials. Ensure that the placeholder `<path/to/credentials.json>` is clearly understood to be a user-specific file path.
82-82: Configuration in config.yaml Heading
The heading “#### Configuration in `config.yaml`” is clear and instructive.
84-84: BigQuery Configuration Instructions in YAML
The explanation for adding BigQuery parameters to `config.yaml` is clear. Consider noting if additional parameters or comments might be needed for more complex setups.
86-90: BigQuery YAML Example
The YAML snippet detailing the keys `bigquery_project_id`, `bigquery_dataset_id`, and `bigquery_credentials` is clear. The inline comment explaining that the credentials must be base64 encoded is helpful.
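Because a mis-encoded `bigquery_credentials` value tends to fail late, a quick local check that the string decodes back to a JSON object may be worth documenting. The following is only a sketch: `decode_credentials` is a hypothetical helper, not part of the service, and it assumes the original credentials file is JSON.

```python
import base64
import binascii
import json


def decode_credentials(encoded: str) -> dict:
    """Decode a base64 string and parse it as a JSON object (hypothetical helper)."""
    # Drop whitespace so line-wrapped base64 output is accepted too.
    compact = "".join(encoded.split())
    try:
        raw = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is not valid base64") from exc
    parsed = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(parsed, dict):
        raise ValueError("decoded credentials are not a JSON object")
    return parsed
```

Running this against the value pasted into `config.yaml` or `.env.dev` confirms the round trip without contacting BigQuery.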
92-92: .env.dev Configuration Heading
The heading “#### Configuration in `.env.dev`” is appropriately placed for users preferring to use environment variables.
157-159: New Evaluation Metrics Added
The addition of the metrics “QuestionToReasoningJudge”, “ReasoningToSqlJudge”, and “SqlSemanticsJudge” significantly enhances the evaluation framework. Confirm that these new metrics are consistently integrated within the evaluation pipeline and that detailed explanations of each metric are provided elsewhere in the documentation if needed.
Actionable comments posted: 0
🧹 Nitpick comments (4)
wren-ai-service/eval/README.md (4)
18-18: Refine Header Wording for Clarity
Consider revising the header “## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)” for better readability and consistency. For example, adding spaces and a clearer description would improve comprehension.
-## Eval Dataset Preparation(If using Spider 1.0 dataset, or Bird dataset)
+## Evaluation Dataset Preparation (for Spider 1.0 and Bird datasets)
24-28: Ensure Consistent Punctuation in the Dataset List
The bullet list outlining the supported datasets is clear. However, a minor punctuation adjustment may help maintain consistency. For example, consider adding a terminal punctuation mark:
- - `spider1.0`: The Spider dataset (default if no dataset specified)
+ - `spider1.0`: The Spider dataset (default if no dataset specified).
This small change addresses a loose punctuation remark flagged by static analysis.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~26-~26: Loose punctuation mark.
Context: ... datasets for evaluation: - `spider1.0`: The Spider dataset (default if no datas...
(UNLIKELY_OPENING_PUNCTUATION)
57-102: Validate and Enhance Datasource Configuration Instructions
The new section on configuring the datasource for prediction and evaluation is comprehensive. A few points to consider:
- The built-in datasource example shows a configuration using `db_path_for_duckdb` with a path that appears specific to the Bird dataset (e.g., `"etc/bird/minidev/MINIDEV/dev_databases"`). If there is a different configuration or example for the Spider dataset, adding it (or clarifying that the same configuration applies) would be helpful.
- The BigQuery configuration steps, including credential encoding and the parameters in both `config.yaml` and `.env.dev`, are clearly laid out and improve guidance for users handling credentials securely.
Overall, these documentation updates align well with the PR objectives.
157-159: Document the New Evaluation Metrics Further
The addition of evaluation metrics—QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge—enhances the framework’s ability to assess reasoning and SQL generation alignment. Consider expanding the documentation (in this file or an ancillary guide) with examples or further explanations of how each metric is calculated and used. This would significantly aid users in understanding and leveraging these metrics.
🔇 Additional comments (2)
wren-ai-service/eval/README.md (2)
20-22: Clarify the Dataset Preparation Command
The updated command `just prep <dataset-name>` clearly indicates that users must supply a dataset name. This change improves usability by making the command explicit.
29-49: Review the Two-Step Dataset Preparation Instructions
The description of the dataset preparation process (downloading and preparing evaluation datasets) is clear and instructive. Ensure that the provided directory paths (e.g., for downloads and outputs) are correct and reflect the repository’s structure. No changes are required here, but verifying the actual paths during integration is recommended.
cyyeh left a comment
overall lgtm. could you also mention that we need config.yaml in eval folder? thanks
75975e8 to ec22531
Actionable comments posted: 0
🧹 Nitpick comments (4)
wren-ai-service/eval/README.md (4)
19-19: Refine Eval Dataset Preparation Title
Consider adding a space before the parentheses for improved visual separation. For example, change to:
## Eval Dataset Preparation (If using Spider 1.0 dataset, or Bird dataset)
21-23: Clarify 'just prep' Command Usage
The command now explicitly requires a `<dataset-name>` argument, which is a welcome improvement. It might be useful to add a brief note on the valid dataset names (e.g., `spider1.0` and `bird`) and mention that the Spider dataset is the default if none is provided.
30-36: Detailed Download Instructions
The description of the download step (pointing to `wren-ai-service/tools/dev/etc/<dataset-name>`) is clear and informative. Consider including a brief note if this path is configurable or subject to change in different environments.
72-77: BigQuery Datasource Setup
This section efficiently explains how to configure BigQuery as a datasource, including the rationale behind using the `.env.dev` file for handling sensitive credentials. You might consider adding a reference or link to the official BigQuery documentation for users who want more in-depth information.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/eval/README.md(4 hunks)
🧰 Additional context used
🪛 LanguageTool
wren-ai-service/eval/README.md
[uncategorized] ~27-~27: Loose punctuation mark.
Context: ... datasets for evaluation: - spider1.0: The Spider dataset (default if no datas...
(UNLIKELY_OPENING_PUNCTUATION)
🔇 Additional comments (10)
wren-ai-service/eval/README.md (10)
7-10: Improved Requirements Section Formatting
The addition of a numbered list in the requirements section enhances clarity and overall readability. Please ensure that these installation steps remain consistent with other parts of the documentation.
25-29: Clear Dataset Support Details
The section clearly lists the supported datasets (`spider1.0` and `bird`), including the default behavior. This makes it easy for users to understand which dataset will be used if no dataset name is specified.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~27-~27: Loose punctuation mark.
Context: ... datasets for evaluation: - `spider1.0`: The Spider dataset (default if no datas...
(UNLIKELY_OPENING_PUNCTUATION)
38-42: Clear Dataset Preparation Steps
The instructions for where the evaluation datasets are saved (wren-ai-service/eval/dataset) and the accompanying code snippet provide clear guidance. Confirm that this folder structure matches the current project setup.
44-48: Output File Naming Conventions
The naming conventions for output files (both for Spider and Bird datasets) are well documented. It might be beneficial to reiterate here if these conventions are fixed or might be subject to future updates.
58-66: Datasource Configuration Instructions
The newly added section on configuring the datasource for prediction and evaluation is logically structured and clearly explains both local and BigQuery setups. Ensure that any cross-references to configuration files remain aligned with the instructions provided in this section.
68-70: Example Configuration in config.yaml
The YAML snippet demonstrating the `db_path_for_duckdb` configuration is clear and provides a good concrete example. Verify that the path provided (`etc/bird/minidev/MINIDEV/dev_databases`) is consistent with your environment or add a note for customization if needed.
78-82: Credential Encoding Instruction
The command snippet to encode credentials using `base64` is straightforward and useful. It may be worthwhile to remind users to check for platform-specific nuances if they are running this command on different operating systems.
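One such nuance: GNU coreutils `base64` wraps its output at 76 characters by default (`-w 0` disables wrapping), and other implementations may behave differently, so the pasted value can end up containing line breaks. A platform-independent alternative is sketched below in Python; the function name `encode_credentials_file` is hypothetical and this is not the method the README documents.

```python
import base64
from pathlib import Path


def encode_credentials_file(path: str) -> str:
    """Return the file's contents as a single-line base64 string."""
    # b64encode never inserts line breaks, regardless of input length.
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

The returned string can be pasted as a single line into `config.yaml` or `.env.dev`.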
84-92: BigQuery config.yaml Example
The example for configuring BigQuery in `config.yaml` is clear. A minor enhancement might be to explicitly mention that the credentials string should be base64 encoded, as noted in the inline comment.
94-102: .env.dev Configuration Example
The configuration example provided for the `.env.dev` file is well formatted and underscores the best practice of keeping credentials secure.
158-160: New Evaluation Metrics Added
The addition of the three new evaluation metrics (QuestionToReasoningJudge, ReasoningToSqlJudge, and SqlSemanticsJudge) is clearly documented. Ensure that these metrics are integrated with the evaluation framework and that any related dashboards or reports are updated accordingly.
…er#1427) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This PR updates the evaluation documentation with the following enhancements:
Dataset Support
Configuration Documentation
Evaluation Metrics