add google cloud storage InputSource for native batch #8907
gianm merged 6 commits into apache:master
jon-wei left a comment
Patch LGTM aside from one minor comment. It would be nice to support some kind of wildcarding in a later patch, but this has parity with the existing firehose impl.
  }

  @Provides
  public GoogleStorage getRestS3Service()
This should probably have a google-related name instead
oh i see, you fixed already, cool
final String bucket = uri.getAuthority();
final String key = GoogleUtils.extractGoogleCloudStorageObjectKey(uri);
final GoogleByteSource byteSource = new GoogleByteSource(storage, bucket, key);
return CompressionUtils.decompress(byteSource.openStream(), uri.toString());
minor: uri.getPath() is more appropriate here, since it will return the decoded version, whereas uri.toString() URI-encodes everything. It probably won't matter for this particular call site, but still seems like good form to me.
changed to getPath, and tested that the extension works as expected with this change against actual Google Cloud Storage.
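To illustrate the getPath() versus toString() point, here is a minimal sketch using only java.net.URI. The class name GcsUriPartsSketch, the example bucket, and the objectKey helper are hypothetical. In particular, objectKey assumes the key is the decoded path with its leading slash removed, which may not match what GoogleUtils.extractGoogleCloudStorageObjectKey actually does.

```java
import java.net.URI;

// Hypothetical sketch, not code from this PR.
class GcsUriPartsSketch {
    // The bucket is the URI authority, as in the snippet under review.
    static String bucket(URI uri) {
        return uri.getAuthority();
    }

    // Assumption: the object key is the decoded path without its leading slash.
    static String objectKey(URI uri) {
        String path = uri.getPath(); // getPath() decodes percent-escapes
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        URI uri = URI.create("gs://example-bucket/data/file%20one.json.gz");
        System.out.println(bucket(uri));     // example-bucket
        System.out.println(objectKey(uri));  // data/file one.json.gz
        System.out.println(uri.getPath());   // /data/file one.json.gz
        System.out.println(uri.toString());  // gs://example-bucket/data/file%20one.json.gz
    }
}
```

For a call that only inspects the file extension to pick a decompressor, either form would likely behave the same, which matches the reviewer's remark that it probably won't matter at this call site.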
@JsonCreator
public GoogleCloudStorageInputSource(
    @JacksonInject("googleStorage") GoogleStorage storage,
There isn't generally a need for a string here. The type is enough.
* add google cloud storage InputSource for native batch
* rename
* checkstyle
* fix
* fix spelling
* review comments
Description
Another follow up to #8823, this PR adds a Google Cloud Storage InputSource and InputEntity implementation, allowing it to be used with the new native batch indexing interfaces. This implementation differs from the StaticGoogleBlobStoreFirehoseFactory in that it uses a uris list like the S3InputSource rather than a blobs list.

In a later PR, I think it would be nice to add a prefixes option that lists bucket path contents similar to the s3 extension, but I think that is out of scope for this PR.

This PR has:
Key changed/added classes in this PR
* GoogleCloudStorageInputSource
* GoogleCloudStorageEntity
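
As a rough illustration of how the proposed prefixes option would differ from the uris list, the sketch below filters an in-memory object listing by prefix. Everything here is hypothetical: the class PrefixListingSketch, the expandPrefixes helper, and the example URIs are not part of this PR, and a real implementation would list objects through the Google Cloud Storage API rather than a local list.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of prefix expansion, not code from this PR.
class PrefixListingSketch {
    // Returns every object whose bucket matches a prefix's bucket and whose
    // decoded path starts with that prefix's decoded path.
    static List<URI> expandPrefixes(List<URI> prefixes, List<URI> allObjects) {
        List<URI> matched = new ArrayList<>();
        for (URI object : allObjects) {
            for (URI prefix : prefixes) {
                boolean sameBucket = Objects.equals(object.getAuthority(), prefix.getAuthority());
                if (sameBucket && object.getPath().startsWith(prefix.getPath())) {
                    matched.add(object);
                    break; // avoid adding the same object twice
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Stand-in for a bucket listing; with a uris list, each of these
        // would have to be named explicitly in the spec.
        List<URI> listing = Arrays.asList(
            URI.create("gs://example-bucket/logs/2019/a.json"),
            URI.create("gs://example-bucket/logs/2019/b.json"),
            URI.create("gs://example-bucket/other/c.json"));
        List<URI> prefixes = Arrays.asList(URI.create("gs://example-bucket/logs/"));
        System.out.println(expandPrefixes(prefixes, listing));
    }
}
```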