[core] feat: add multi-cloud storage support for benchmark results#127
Conversation
- Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers - Extend storage package with parse and validation functions for each provider - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench - Add comprehensive test coverage for all storage types - Update benchmark documentation with multi-cloud examples and best practices
There was a problem hiding this comment.
Summary of Changes
Hello @slin1237, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the BenchmarkJob functionality by introducing robust multi-cloud storage support for benchmark results. It allows users to store their benchmark outputs in various popular cloud providers like AWS S3, Azure Blob Storage, Google Cloud Storage, and even GitHub Releases, in addition to the existing OCI Object Storage. The changes involve core logic for parsing and validating storage URIs, generating appropriate command-line arguments for the underlying genai-bench tool, and updating user-facing documentation with comprehensive examples and best practices.
Highlights
- Multi-Cloud Storage Expansion: Expanded BenchmarkJob's result storage capabilities beyond OCI Object Storage to include AWS S3, Azure Blob Storage, Google Cloud Storage, and GitHub Releases, aligning with
genai-bench's multi-cloud support. - Storage URI Parsing and Validation: Introduced new utility functions and data structures within the
pkg/utils/storagepackage to parse and validate URIs for S3, Azure, GCS, and GitHub storage, ensuring correct format and component extraction. - Dynamic CLI Argument Generation: Enhanced the
BuildStorageArgsfunction inpkg/controller/v1beta1/benchmark/utilsto dynamically generate appropriategenai-benchCLI arguments, including provider-specific authentication parameters, based on the selected storage type. - Comprehensive Documentation Updates: Updated the
site/content/en/docs/concepts/benchmark.mddocumentation with detailed examples, URI formats, and authentication options for all newly supported cloud storage providers, along with best practices for multi-cloud usage. - Improved Benchmark Job Status Logic: Adjusted the status reconciliation logic in
pkg/controller/v1beta1/benchmark/controller.goto prioritize 'Failed' status over 'Completed' for BenchmarkJobs, ensuring accurate reporting when a job completes with errors.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request introduces multi-cloud storage support for benchmark results, enhancing the capabilities of BenchmarkJob. The changes include updates to utility functions, tests, and documentation. I have provided suggestions to improve maintainability and documentation clarity.
| if storageSpec.Parameters != nil { | ||
| params := *storageSpec.Parameters | ||
| // Check if account name is provided in parameters | ||
| if accountName, ok := params["azure_account_name"]; ok { | ||
| args = append(args, "--storage-azure-account-name", accountName) | ||
| } else { | ||
| args = append(args, "--storage-azure-account-name", components.AccountName) | ||
| } | ||
| // Azure credentials | ||
| if accountKey, ok := params["azure_account_key"]; ok { | ||
| args = append(args, "--storage-azure-account-key", accountKey) | ||
| } | ||
| if connString, ok := params["azure_connection_string"]; ok { | ||
| args = append(args, "--storage-azure-connection-string", connString) | ||
| } | ||
| if sasToken, ok := params["azure_sas_token"]; ok { | ||
| args = append(args, "--storage-azure-sas-token", sasToken) | ||
| } | ||
| } else { | ||
| // Even without parameters, we need to add the account name | ||
| args = append(args, "--storage-azure-account-name", components.AccountName) | ||
| } |
There was a problem hiding this comment.
The logic for handling the Azure account name is a bit redundant. It can be simplified to improve readability and reduce code duplication. You can determine the account name to use first (from parameters or the URI) and then handle the other credential parameters.
// Handle storage parameters for Azure. The account name is required and will be
// taken from parameters if present, otherwise it's taken from the URI.
accountName := components.AccountName
if storageSpec.Parameters != nil {
params := *storageSpec.Parameters
if name, ok := params["azure_account_name"]; ok {
accountName = name
}
// Azure credentials
if accountKey, ok := params["azure_account_key"]; ok {
args = append(args, "--storage-azure-account-key", accountKey)
}
if connString, ok := params["azure_connection_string"]; ok {
args = append(args, "--storage-azure-connection-string", connString)
}
if sasToken, ok := params["azure_sas_token"]; ok {
args = append(args, "--storage-azure-sas-token", sasToken)
}
}
args = append(args, "--storage-azure-account-name", accountName)| storageUri: "az://myaccount/mycontainer/path/to/results" | ||
| # Or: "az://myaccount.blob.core.windows.net/mycontainer/path/to/results" | ||
| parameters: | ||
| azure_account_name: "myaccount" # Optional: Storage account name |
There was a problem hiding this comment.
The comment for azure_account_name states that it's "Optional", which could be misleading for users. The storage account name is a required piece of information for Azure Blob Storage. While it's optional to specify in the parameters block, that's only because it can be parsed from the storageUri. To avoid confusion, clarify that the account name must be provided either in the URI or in the parameters.
| azure_account_name: "myaccount" # Optional: Storage account name | |
| azure_account_name: "myaccount" # Required: Azure storage account name |
) - Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers - Extend storage package with parse and validation functions for each provider - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench - Add comprehensive test coverage for all storage types - Update benchmark documentation with multi-cloud examples and best practices
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds multi-cloud storage support to BenchmarkJob, enabling users to store benchmark results in their preferred cloud storage provider.
Previously, only OCI Object Storage was supported.
The implementation extends the storage package to support:
s3://bucket/prefix)az://account/container/blob)gs://bucket/object)github://owner/repo@tag)This aligns with genai-bench's new multi-cloud capabilities and allows users to integrate benchmark results storage with their existing cloud
infrastructure.
Changes:
Storage Package (
pkg/utils/storage/):GetStorageType()andValidateStorageURI()to handle new typesBenchmark Utils (
pkg/controller/v1beta1/benchmark/utils/):BuildStorageArgs()to generate correct CLI arguments for genai-benchDocumentation (
site/content/en/docs/concepts/benchmark.md):Tests:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?