Skip to content

feat: pull gcp token from env variables#3583

Merged
eddyxu merged 3 commits intolance-format:mainfrom
alex766:gcp-token-env-variables
Mar 26, 2025
Merged

feat: pull gcp token from env variables#3583
eddyxu merged 3 commits intolance-format:mainfrom
alex766:gcp-token-env-variables

Conversation

@alex766
Copy link
Copy Markdown
Contributor

@alex766 alex766 commented Mar 21, 2025

after #3511, we discovered that we also needed support for setting the token through environment variables, so this sets storage options with the "google_storage_token" env variable

@github-actions
Copy link
Copy Markdown
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 21, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 78.67%. Comparing base (9203377) to head (2c212bd).
Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance-io/src/object_store.rs 33.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3583      +/-   ##
==========================================
- Coverage   78.77%   78.67%   -0.10%     
==========================================
  Files         254      258       +4     
  Lines       95451    96817    +1366     
  Branches    95451    96817    +1366     
==========================================
+ Hits        75188    76172     +984     
- Misses      17156    17578     +422     
+ Partials     3107     3067      -40     
Flag Coverage Δ
unittests 78.67% <33.33%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alex766 alex766 changed the title pull google_storage_token from env variables feat: pull gcp token from env variables Mar 21, 2025
@github-actions github-actions Bot added the enhancement New feature or request label Mar 21, 2025
Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this? I'm confused as to why this would work if it's not parseable from GoogleConfigKey?

@alex766
Copy link
Copy Markdown
Contributor Author

alex766 commented Mar 21, 2025

Have you tested this? I'm confused as to why this would work if it's not parseable from GoogleConfigKey?

@wjones127 Yea, we tested it by setting the environment variable of "google_storage_token", not passing in storage options, and making sure that datasets can be created and read.

For clarification, this PR address the fact that "google_storage_token" is not in the GoogleConfigKey enums, as arrow does not yet support token-based auth for GCP (I filed an issue here apache/arrow-rs-object-store#300). We're using token-based auth for our case and need the environment variable to be picked up and set in storage_options, so we're adding a special case check for it to get picked up if it exists, despite not being a parseable from GoogleConfigKey

@wjones127
Copy link
Copy Markdown
Contributor

Have you tested this? I'm confused as to why this would work if it's not parseable from GoogleConfigKey?

@wjones127 Yea, we tested it by setting the environment variable of "google_storage_token", not passing in storage options, and making sure that datasets can be created and read.

For clarification, this PR address the fact that "google_storage_token" is not in the GoogleConfigKey enums, as arrow does not yet support token-based auth for GCP (I filed an issue here apache/arrow-rs-object-store#300). We're using token-based auth for our case and need the environment variable to be picked up and set in storage_options, so we're adding a special case check for it to get picked up if it exists, despite not being a parseable from GoogleConfigKey

If ObjectStore doesn't support using that kind of authentication, then how is it working? Doesn't object store need to use the token in order for the GCS to accept connections from object stores?

Our current work around involves creating a GcpCredentialProvider and then using the builder's with_credentials to authenticate with an access token.

It sounds like you are using some sort of workaround here where you are manually constructing the store yourself. Does this mean you are just using storage_options as a temporary place to hold that key, and then extracting it later somehow?

@alex766
Copy link
Copy Markdown
Contributor Author

alex766 commented Mar 25, 2025

If ObjectStore doesn't support using that kind of authentication, then how is it working? Doesn't object store need to use the token in order for the GCS to accept connections from object stores?

Because object store doesn't support that kind of auth in it's methods, we made the change (https://github.com/lancedb/lance/pull/3511/files) to lance to directly instantiate the credential provider on lance side when parsing storage options. Because objectstore doesn't have the token providing fully built in, that means it doesn't work with GoogleConfigKey::from_str( so we have to add our own parsing in this commit. Here, we are adding support for environment variables to be picked up as well because for some use cases, we authenticate through environment variables and not the passed in storage_options.

It sounds like you are using some sort of workaround here where you are manually constructing the store yourself. Does this mean you are just using storage_options as a temporary place to hold that key, and then extracting it later somehow?

Your statement is pretty correct, it is effectively a temporary place to hold the key until we hit the code from the PR above where we directly build the credentials. When object-store adds the direct access-token support, both this and the previous PR should be able to be reverted. Let me know if that makes sense!

Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining the context.

@wjones127
Copy link
Copy Markdown
Contributor

It looks like you have a few simple lint fixes you'll need to make before we can merge.

@eddyxu eddyxu merged commit 20cde3b into lance-format:main Mar 26, 2025
27 checks passed
Jay-ju pushed a commit to Jay-ju/lance that referenced this pull request Mar 28, 2025
after lance-format#3511, we discovered that we
also needed support for setting the token through environment variables,
so this sets storage options with the "google_storage_token" env
variable

---------

Co-authored-by: Alexandra Li <alexandra.li@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants