Introduce StorageConnector for GCS #14611
Closed
LakshSingla wants to merge 9 commits into apache:master from
Conversation
Parking this for now, since the current library doesn't support chunked downloads and uploads, and Druid is bound to the library because Guava cannot be updated for a while. Will update the PR with a list of requirements and the versions of the libraries required for enabling this connector. Working on the Azure connector in the meantime.
Closed in favor of #15398
Description
This PR adds a storage connector that interacts with GCS using the API functions exposed in google-api-services-storage. It will allow durable storage and MSQ's interactive APIs to work with GCS. It also refactors the currently available S3 connector so that the chunked downloading currently done by the S3 connector can be extended to other connectors.
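To illustrate the shape of such an abstraction, here is a minimal sketch of a storage-connector interface. The names (StorageConnector, InMemoryStorageConnector, the method signatures) are hypothetical stand-ins for illustration, not Druid's actual API, and the in-memory implementation exists only to exercise the interface without any cloud dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical connector abstraction: the operations a durable-storage
// backend (S3, GCS, Azure, ...) would need to support.
interface StorageConnector
{
  boolean pathExists(String path) throws IOException;
  InputStream read(String path) throws IOException;
  void write(String path, byte[] data) throws IOException;
  void deleteFile(String path) throws IOException;
}

// In-memory stand-in used here only to demonstrate the interface contract.
class InMemoryStorageConnector implements StorageConnector
{
  private final Map<String, byte[]> objects = new HashMap<>();

  @Override public boolean pathExists(String path) { return objects.containsKey(path); }

  @Override public InputStream read(String path) throws IOException
  {
    byte[] data = objects.get(path);
    if (data == null) {
      throw new IOException("No such path: " + path);
    }
    return new ByteArrayInputStream(data);
  }

  @Override public void write(String path, byte[] data) { objects.put(path, data.clone()); }

  @Override public void deleteFile(String path) { objects.remove(path); }
}

public class Main
{
  public static void main(String[] args) throws IOException
  {
    StorageConnector connector = new InMemoryStorageConnector();
    connector.write("workerOutput/part0", "hello".getBytes());
    System.out.println(connector.pathExists("workerOutput/part0"));

    // Round-trip the object back out through the InputStream.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    InputStream in = connector.read("workerOutput/part0");
    int b;
    while ((b = in.read()) != -1) {
      out.write(b);
    }
    System.out.println(out.toString());

    connector.deleteFile("workerOutput/part0");
    System.out.println(connector.pathExists("workerOutput/part0"));
  }
}
```

With an interface like this, chunked-download logic can live in a shared base class while each cloud provider supplies only the primitive read/write/delete operations.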
Due to the current versions of the libraries used, the connector has the following three improvement areas:

1. Due to the limitations of google-api-services-storage and the version used by it, we can't use multipart or streaming uploads. The GCS connector therefore writes the intermediate contents to a file and uploads it in a single go. Composite objects exist, but their functionality seems incorrect; this can be improved once we upgrade the libraries.
2. For fetching a file, an isChunkedDownloads flag controls whether we download in chunks using the Range header (https://cloud.google.com/storage/docs/xml-api/reference-headers#range). Since the server can ignore the header, the functionality is kept behind a flag for now. Fetching by range isn't supported in the library currently.
3. All delete requests are issued individually rather than in batches.
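The chunked-download idea can be sketched as planning a sequence of HTTP Range header values over an object. The helper below (chunkRanges) is an invented illustration, not the actual connector code; it only shows how inclusive byte ranges would be computed for a given object size and chunk size.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePlanner
{
  // Split an object of `objectSize` bytes into HTTP Range header values of
  // at most `chunkSize` bytes each. Range headers use inclusive byte
  // offsets, so a 4-byte chunk starting at 0 is "bytes=0-3".
  static List<String> chunkRanges(long objectSize, long chunkSize)
  {
    List<String> ranges = new ArrayList<>();
    for (long start = 0; start < objectSize; start += chunkSize) {
      long end = Math.min(start + chunkSize, objectSize) - 1;
      ranges.add("bytes=" + start + "-" + end);
    }
    return ranges;
  }

  public static void main(String[] args)
  {
    // A 10-byte object fetched in 4-byte chunks yields three ranges,
    // with the final chunk shorter than the rest.
    System.out.println(chunkRanges(10, 4));
  }
}
```

Because a server may ignore the Range header and return the whole object, any real implementation must also check the response status (206 Partial Content vs. 200 OK) before assuming it received only the requested chunk, which is one reason the feature sits behind a flag.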
This implementation can be improved by using the google-cloud-storage library instead of the google-api-services-storage library, though that would require an overhaul of the currently existing Google functions.

Release note
To be added
Key changed/added classes in this PR
GoogleStorageConnector