Python: Add Google Cloud Storage support #6906
@Buktoria this is awesome, thanks for working on this. Let me know when it is ready for review!
python/Makefile
Outdated
 test:
-    poetry run coverage run --source=pyiceberg/ -m pytest tests/ -m "not s3 and not adlfs" ${PYTEST_ARGS}
+    poetry run coverage run --source=pyiceberg/ -m pytest tests/ -m "not s3 and not adlfs and not gcs" ${PYTEST_ARGS}
This will get prettier once #6398 has been merged
@Fokko This PR is ready to be reviewed.
JonasJ-ap
left a comment
Thank you for your great work!
        }
        return GcsFileSystem(**client_kwargs)
    else:
        return GcsFileSystem()
[Question] Is it possible to take out this if...else..., like the following?

    elif scheme in {"gs", "gcs"}:
        client_kwargs = {
            "access_token": self.properties.get("gs.access-token"),
            "credential_token_expiration": self.properties.get("gs.credential-token-expiration"),
        }
        return GcsFileSystem(**client_kwargs)

Is there any problem if access_token is not given but credential_token_expiration is provided? Thank you in advance for your answer.
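
One way to look at the question above is to separate "read the properties" from "decide which arguments to pass". The following is a minimal sketch, not the code in this PR: the helper name `gcs_client_kwargs` is hypothetical, and it assumes that an access token and its expiration are only meaningful as a pair, which should be checked against the PyArrow `GcsFileSystem` documentation. It also leaves the values as strings; converting the expiration into whatever type the filesystem expects is out of scope here.

```python
from typing import Any, Dict, Mapping


def gcs_client_kwargs(properties: Mapping[str, str]) -> Dict[str, Any]:
    """Collect GCS keyword arguments from FileIO properties (hypothetical helper)."""
    candidates = {
        "access_token": properties.get("gs.access-token"),
        "credential_token_expiration": properties.get("gs.credential-token-expiration"),
    }
    # Drop unset entries so an empty configuration yields no keyword arguments.
    kwargs = {key: value for key, value in candidates.items() if value is not None}

    # Assumption: the token and its expiration must be supplied together,
    # so fail early with a clear message when only one of them is present.
    if len(kwargs) == 1:
        missing = next(key for key in candidates if key not in kwargs)
        raise ValueError(f"Missing GCS setting: {missing} must be set together with its counterpart")

    return kwargs
```

With a helper like this, the call site could pass `**gcs_client_kwargs(self.properties)` unconditionally: an empty dict falls back to the filesystem's default credentials, and a half-specified pair is reported before the filesystem is constructed.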
@Buktoria thanks for raising this. I was out for a couple of weeks, could you resolve the merge conflicts? I'll review the PR asap.
Fokko
left a comment
@Buktoria Sorry for pinging you again. Would you have any time to fix the merge conflicts? It would be great to get this into 0.4.0.
@Buktoria Gentle ping! :)

Hey Fokko. I have been away, and just got back over the weekend. I will make sure to wrap this up this week. So sorry for the delay.

@Buktoria No problem, just checking if it is still on your list. Would be really cool to get this in.

@Buktoria Gentle ping from my side. Would be great to get this in.
python/pyiceberg/io/pyarrow.py
Outdated
        }
        return S3FileSystem(**client_kwargs)
    elif scheme in {"gs", "gcs"}:
        if access_token := self.properties.get("gs.access-token"):
I think this property should be gs.token based on the properties in the readme
python/pyiceberg/io/pyarrow.py
Outdated
        }
        return S3FileSystem(**client_kwargs)
    elif scheme in {"gs", "gcs"}:
        if access_token := self.properties.get("gs.access-token"):
Can we add gs.credential-token-expiration to the docs?
I missed it. I've added it to the docs now.
| adlfs.client-secret | oCA3R6P\*ka#oa1Sms2J74z... | The client-secret |

### Google Cloud Storage
It would be nice if the Python GCS properties could be consistent with the Java properties where possible. Java already has some properties defined so starting with those would help avoid any backwards compatibility issues (rather than changing them). There is also a PR to add OAuth2 access token properties on the Java side.
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Fokko
left a comment
Thanks for picking this up, it looks like the CI is failing. Can you run pip3 install pre-commit && pre-commit run --all-files?
| Key | Example | Description |
| ----------------------- | ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| adlfs.connection-string | AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqF...;BlobEndpoint=http://localhost/ | A [connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string). This could be used to use FileIO with any adlfs-compatible object storage service that has a different endpoint (like [azurite](https://github.com/azure/azurite)). |
Can you remove this unrelated change?
| s3fs | S3FS as a FileIO implementation to interact with the object store |
| adlfs | ADLFS as a FileIO implementation to interact with the object store |
| snappy | Support for snappy Avro compression |
| gcs | GCS as the FileIO implementation to interact with the object store |
Suggested change:
- | gcs | GCS as the FileIO implementation to interact with the object store |
+ | gcsfs | GCS as the FileIO implementation to interact with the object store |
Adding Google Cloud Storage support to the Python Iceberg CLI
- gcsfs python lib
- GcsFileSystem class