Skip to content

[core] feat: add multi-cloud storage support for benchmark results#127

Merged
slin1237 merged 1 commit into
mainfrom
slin/benchmark-update
Jul 3, 2025
Merged

[core] feat: add multi-cloud storage support for benchmark results#127
slin1237 merged 1 commit into
mainfrom
slin/benchmark-update

Conversation

@slin1237
Copy link
Copy Markdown
Collaborator

@slin1237 slin1237 commented Jul 3, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds multi-cloud storage support to BenchmarkJob, enabling users to store benchmark results in their preferred cloud storage provider.
Previously, only OCI Object Storage was supported.

The implementation extends the storage package to support:

  • AWS S3 (s3://bucket/prefix)
  • Azure Blob Storage (az://account/container/blob)
  • Google Cloud Storage (gs://bucket/object)
  • GitHub Releases (github://owner/repo@tag)

This aligns with genai-bench's new multi-cloud capabilities and allows users to integrate benchmark results storage with their existing cloud
infrastructure.

Changes:

  1. Storage Package (pkg/utils/storage/):

    • Added parse functions for S3, Azure, GCS, and GitHub storage URIs
    • Added validation functions for each storage type
    • Updated GetStorageType() and ValidateStorageURI() to handle new types
  2. Benchmark Utils (pkg/controller/v1beta1/benchmark/utils/):

    • Extended BuildStorageArgs() to generate correct CLI arguments for genai-bench
    • Added support for provider-specific authentication parameters
  3. Documentation (site/content/en/docs/concepts/benchmark.md):

    • Added examples for each storage provider
    • Documented URI formats and authentication options
    • Added best practices for multi-cloud storage
    • Added link to official genai-bench documentation
  4. Tests:

    • Added comprehensive test coverage for all new storage types
    • Updated existing tests to reflect new functionality

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • The implementation follows the URI patterns and CLI arguments documented in genai-bench's multi-cloud documentation
  • All existing OCI storage functionality remains unchanged for backward compatibility
  • The PR includes extensive test coverage with all tests passing
  • Documentation has been updated with clear examples for each storage provider

Does this PR introduce a user-facing change?

Add multi-cloud storage support for BenchmarkJob results. Users can now store benchmark results in AWS S3, Azure Blob Storage, Google Cloud Storage,
 and GitHub Releases in addition to OCI Object Storage.

  - Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers
  - Extend storage package with parse and validation functions for each provider
  - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench
  - Add comprehensive test coverage for all storage types
  - Update benchmark documentation with multi-cloud examples and best practices
@slin1237 slin1237 requested a review from CatherineSue as a code owner July 3, 2025 20:56
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the BenchmarkJob functionality by introducing robust multi-cloud storage support for benchmark results. It allows users to store their benchmark outputs in various popular cloud providers like AWS S3, Azure Blob Storage, Google Cloud Storage, and even GitHub Releases, in addition to the existing OCI Object Storage. The changes involve core logic for parsing and validating storage URIs, generating appropriate command-line arguments for the underlying genai-bench tool, and updating user-facing documentation with comprehensive examples and best practices.

Highlights

  • Multi-Cloud Storage Expansion: Expanded BenchmarkJob's result storage capabilities beyond OCI Object Storage to include AWS S3, Azure Blob Storage, Google Cloud Storage, and GitHub Releases, aligning with genai-bench's multi-cloud support.
  • Storage URI Parsing and Validation: Introduced new utility functions and data structures within the pkg/utils/storage package to parse and validate URIs for S3, Azure, GCS, and GitHub storage, ensuring correct format and component extraction.
  • Dynamic CLI Argument Generation: Enhanced the BuildStorageArgs function in pkg/controller/v1beta1/benchmark/utils to dynamically generate appropriate genai-bench CLI arguments, including provider-specific authentication parameters, based on the selected storage type.
  • Comprehensive Documentation Updates: Updated the site/content/en/docs/concepts/benchmark.md documentation with detailed examples, URI formats, and authentication options for all newly supported cloud storage providers, along with best practices for multi-cloud usage.
  • Improved Benchmark Job Status Logic: Adjusted the status reconciliation logic in pkg/controller/v1beta1/benchmark/controller.go to prioritize 'Failed' status over 'Completed' for BenchmarkJobs, ensuring accurate reporting when a job completes with errors.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces multi-cloud storage support for benchmark results, enhancing the capabilities of BenchmarkJob. The changes include updates to utility functions, tests, and documentation. I have provided suggestions to improve maintainability and documentation clarity.

Comment on lines +247 to +268
if storageSpec.Parameters != nil {
params := *storageSpec.Parameters
// Check if account name is provided in parameters
if accountName, ok := params["azure_account_name"]; ok {
args = append(args, "--storage-azure-account-name", accountName)
} else {
args = append(args, "--storage-azure-account-name", components.AccountName)
}
// Azure credentials
if accountKey, ok := params["azure_account_key"]; ok {
args = append(args, "--storage-azure-account-key", accountKey)
}
if connString, ok := params["azure_connection_string"]; ok {
args = append(args, "--storage-azure-connection-string", connString)
}
if sasToken, ok := params["azure_sas_token"]; ok {
args = append(args, "--storage-azure-sas-token", sasToken)
}
} else {
// Even without parameters, we need to add the account name
args = append(args, "--storage-azure-account-name", components.AccountName)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for handling the Azure account name is a bit redundant. It can be simplified to improve readability and reduce code duplication. You can determine the account name to use first (from parameters or the URI) and then handle the other credential parameters.

		// Handle storage parameters for Azure. The account name is required and will be
		// taken from parameters if present, otherwise it's taken from the URI.
		accountName := components.AccountName
		if storageSpec.Parameters != nil {
			params := *storageSpec.Parameters
			if name, ok := params["azure_account_name"]; ok {
				accountName = name
			}

			// Azure credentials
			if accountKey, ok := params["azure_account_key"]; ok {
				args = append(args, "--storage-azure-account-key", accountKey)
			}
			if connString, ok := params["azure_connection_string"]; ok {
				args = append(args, "--storage-azure-connection-string", connString)
			}
			if sasToken, ok := params["azure_sas_token"]; ok {
				args = append(args, "--storage-azure-sas-token", sasToken)
			}
		}
		args = append(args, "--storage-azure-account-name", accountName)

storageUri: "az://myaccount/mycontainer/path/to/results"
# Or: "az://myaccount.blob.core.windows.net/mycontainer/path/to/results"
parameters:
azure_account_name: "myaccount" # Optional: Storage account name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for azure_account_name states that it's "Optional", which could be misleading for users. The storage account name is a required piece of information for Azure Blob Storage. While it's optional to specify in the parameters block, that's only because it can be parsed from the storageUri. To avoid confusion, clarify that the account name must be provided either in the URI or in the parameters.

Suggested change
azure_account_name: "myaccount" # Optional: Storage account name
azure_account_name: "myaccount" # Required: Azure storage account name

@slin1237 slin1237 merged commit dd04792 into main Jul 3, 2025
23 checks passed
@zhyncs zhyncs deleted the slin/benchmark-update branch July 4, 2025 02:45
slin1237 added a commit that referenced this pull request Dec 22, 2025
)

- Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers
  - Extend storage package with parse and validation functions for each provider
  - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench
  - Add comprehensive test coverage for all storage types
  - Update benchmark documentation with multi-cloud examples and best practices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant