Conversation
Travis failure looks related to PR changes:
| } | ||
| protected InputSourceReader fixedFormatReader(InputRowSchema inputRowSchema, @Nullable File temporaryDirectory) | ||
| throws IOException |
TeamCity is flagging this: The declared exception IOException is never thrown in this method, nor in its derivables.

Ah yeah, nothing implements this method yet; I was preemptively adding it because InputSource.createSplits is allowed to throw an IOException and readers will most likely be calling this method. I can remove it for now, though I suspect it will likely need to be added back later.

Yeah, made a mistake cleaning up the code for the PR and pointed the mock to return the wrong uris, fixed.
| public Stream<InputSplit<URI>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec) | ||
| throws IOException | ||
| { | ||
| if (cacheSplitUris == null) { |
The javadoc for SplittableInputSource.createSplits() notes that implementations should NOT cache the created splits in memory.
@jihoonson Is my understanding of SplittableInputSource.createSplits() correct?

Yes, it shouldn't be cached in memory.

Modified; uris are already in memory, so if given an explicit list there is nothing to be chill about.
prefixes are handled with an iterator that was based on a previous iterator implementation in s3Utils. This iterator uses list objects calls on each prefix in batches of 1024 objects (with a fallback to getObjectMetadata if a specific 403 is encountered), and creates an iterator on that set of summaries which is drained to the outer iterator. When the batch (or single summary) is done, it moves on to the next prefix and repeats, until all prefixes in the list have been iterated. Retries are baked into each call, so the caller of this method doesn't have to worry about such things.
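The drain-one-prefix-then-advance behavior described above can be sketched as a small generic iterator. This is a simplified illustration, not the actual Druid code: the supplied listing function stands in for the S3 listObjectsV2 call, and the real iterator additionally pages in batches of 1024 objects, falls back to getObjectMetadata on a 403, and bakes in retries.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Sketch: iterate objects for each prefix in turn, draining the current
// prefix's listing before moving on to the next prefix.
class PrefixObjectIterator<T> implements Iterator<T>
{
  private final Iterator<String> prefixes;
  private final Function<String, List<T>> listObjects; // stands in for the S3 list call
  private final ArrayDeque<T> currentBatch = new ArrayDeque<>();

  PrefixObjectIterator(List<String> prefixes, Function<String, List<T>> listObjects)
  {
    this.prefixes = prefixes.iterator();
    this.listObjects = listObjects;
  }

  @Override
  public boolean hasNext()
  {
    // Drain the current prefix's results before advancing to the next prefix;
    // skip prefixes whose listing comes back empty.
    while (currentBatch.isEmpty() && prefixes.hasNext()) {
      currentBatch.addAll(listObjects.apply(prefixes.next()));
    }
    return !currentBatch.isEmpty();
  }

  @Override
  public T next()
  {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return currentBatch.poll();
  }
}
```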
Hmm, I should probably add something like this to a javadoc of the iterator method 😢

Ok, I'm fine with this change since it fixes the race between createSplits() and getNumSplits().
| this.uri = uri; | ||
| } | ||
| @Nullable |
I believe this never returns null. If it does, then uri should be checked for null below in open before calling uri.getAuthority().
| import java.io.InputStream; | ||
| import java.net.URI; | ||
| public class S3Entity implements InputEntity |
| ); | ||
| @Test | ||
| public void testSerde() throws Exception |
Consider splitting this into two separate tests (one for uris and one for prefixes).
| { | ||
| final ObjectMapper mapper = createS3ObjectMapper(); | ||
| final List<URI> uris = Arrays.asList( |
Consider making uris and prefixes static final variables since they're used in a few tests.
I went ahead and did this, though I'm not sure I actually like the change, since looking at each test it's a lot less obvious what it is testing.
| } | ||
| @Override | ||
| protected InputSourceReader formattableReader( |
Unit test coverage is missing for this method.

Added a test through a test of the read method, which mocks the s3 client to do the list and getMetadata operations, and a mocked get object method that "returns" an S3Object with csv file content.
| ); | ||
| objects.addAll(Lists.newArrayList(objectSummaryIterator)); | ||
| } | ||
| catch (AmazonS3Exception outerException) { |
Unit test coverage is missing for the exception handling here; it may be worth adding to verify the retry logic.

Added tests for this and more.
| if (s3Object == null) { | ||
| throw new ISE("Failed to get an s3 object for bucket[%s] and key[%s]", bucket, key); | ||
| } | ||
| return CompressionUtils.decompress(s3Object.getObjectContent(), uri.toString()); |
minor: It doesn't matter much, but uri.getPath() would be better here, because uri.toString() URI-encodes its values.
Actually, key would be even better.

Oops, I meant to switch this to getPath to be consistent with the google extension after your review of this there and forgot, but yeah, key is better; fixed this and the google extension to use key.
| * {@link ServerSideEncryptingAmazonS3#getObjectMetadata} to check if the 'prefix' is an object in the event the | ||
| * list objects call responds with a 403 http status code | ||
| */ | ||
| private static Iterator<InputSplit<URI>> objectFetchingIterator( |
This code looks like it could be shareable between the input source and the firehose. If true, please accomplish that sharing, ideally by having the firehose call into the input source.

It is, I was just ignoring the firehose.
I have moved this iterator to S3Utils and changed the signature to be Iterator&lt;S3ObjectSummary&gt;, and modified S3Utils.objectSummaryIterator, used by StaticS3FirehoseFactory and S3TimestampVersionedDataFinder, to take a URI instead of a bucket and key (mildly unfortunate since we will be converting it back to a bucket and key, but all the callers have a URI available), and defer the logic to this newer iterator.
| <dependency> | ||
| <groupId>joda-time</groupId> | ||
| <artifactId>joda-time</artifactId> | ||
| <version>2.10.5</version> |
Can remove the version here so it uses the one defined in the root POM.
| originalAuthority; | ||
| final String path = originalPath.startsWith("/") ? originalPath.substring(1) : originalPath; | ||
| return URI.create(StringUtils.format("s3://%s/%s", authority, path)); |
Since authority and path are both strings, string concatenation may be better than StringUtils.format() here.
| EasyMock.expect(S3_CLIENT.listObjectsV2(EasyMock.anyObject(ListObjectsV2Request.class))).andReturn(result).once(); | ||
| } | ||
| private static void addExpectedNonPrefixObjectsWithNoListPermission(URI uri) |
Parameter uri is not used in the method
| originalAuthority; | ||
| final String path = originalPath.startsWith("/") ? originalPath.substring(1) : originalPath; | ||
| return URI.create(StringUtils.format("s3://%s/%s", authority, path)); |
This is bad, because it won't encode funny characters in path. Imagine the path has a ? in it. It needs to be URI-encoded, or else pulling the key out later won't work. The tricky characters are / (which you don't want to encode) and ?, #, and others (which you do).
StringUtils.urlEncode might help you here.
Alternatively, don't use URIs internally, instead use bucket/key pairs.
How about adding this method to CloudObjectLocation:
public URI toUri()
{
  // Encode path, except leave '/' characters unencoded
  return URI.create(StringUtils.format("s3://%s/%s", bucket, StringUtils.urlEncode(path).replace("%2F", "/")));
}

And using it everywhere that is doing this sort of concatenation today.
It won't handle weird, invalid bucket names but it's better than the simple concatenation happening now, and weird paths are more likely anyway. For extra credit you could include validation for the bucket, throwing an error if it's not valid (AWS, Google, etc all have rules for what's a valid bucket, you could do a loose superset of them).
FYI this method in S3Utils was only called by StaticS3FirehoseFactory, not the new stuff, so I wasn't worrying about it too much because we presumably will remove it in a future release. That said, I went ahead and did the thing to fix it
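The round-trip problem being discussed can be reproduced with plain java.net.URI. Below, naivePath shows how a '?' in the key silently drops part of it, and encodedPath is a hypothetical fix in the spirit of the toUri() suggestion (the JDK's URLEncoder is used here as a stand-in for Druid's StringUtils.urlEncode):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UriEncodingDemo
{
  // Raw concatenation: URI treats '?' as the start of a query component,
  // so everything after it disappears from getPath(). A key containing a
  // bare '%' would even make URI.create() throw.
  public static String naivePath(String bucket, String key)
  {
    return URI.create("s3://" + bucket + "/" + key).getPath();
  }

  // Encode the key first, but keep '/' separators unencoded so the URI
  // still looks like a normal object path.
  public static String encodedPath(String bucket, String key)
  {
    String encoded = URLEncoder.encode(key, StandardCharsets.UTF_8)
                               .replace("+", "%20")
                               .replace("%2F", "/");
    return URI.create("s3://" + bucket + "/" + encoded).getPath();
  }
}
```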
I noticed that the type of this is
| if (!this.uris.isEmpty() && !this.prefixes.isEmpty()) { | ||
| throw new IAE("uris and prefixes cannot be used together"); | ||
| } | ||
| if (this.uris.isEmpty() && this.prefixes.isEmpty()) { | ||
| throw new IAE("uris or prefixes must be specified"); | ||
| } |
Optionally, simplify to:
if (this.uris.isEmpty() == this.prefixes.isEmpty()) {
  throw new IAE("exactly one of either uris or prefixes must be specified");
}

| } | ||
| @JsonProperty | ||
| @JsonProperty("uris") |
| @JsonProperty("objects") | ||
| public List<CloudObjectLocation> getObject() |
I think if you rename the getter to getObjects() then you won't need the ("objects").
| org.apache.commons.io.FileUtils.forceMkdir(outDir); | ||
| final URI uri = URI.create(StringUtils.format("s3://%s/%s", s3Coords.bucket, s3Coords.path)); | ||
| final URI uri = URI.create(StringUtils.format("s3://%s/%s", s3Coords.getBucket(), s3Coords.getPath())); |
Is the problem that described in https://github.com/apache/incubator-druid/pull/8903/files/a4f6ae9ae2f81381d865e87ec5e1219d275f299c..7125e3e94bd468bc82c95fad68538890a23c69ee#r348869424 possible here when the URI created here gets passed to the CloudObjectLocation constructor?
Is there a test to check the handling of tricky characters?
There is no such test afaik; I guess I can look into this, or maybe as a follow-up, since this isn't really new code and it sort of feels like the scope of this PR is creeping.
This code is definitely buggy, please at least include a comment warning future devs as much. (You aren't doing much here besides mechanical refactoring, so I wouldn't insist on fixing it or adding tests, but the comment is nice.)
Example of the bug:
scala> URI.create(String.format("s3://%s/%s", "mybucket", "path/to/myobject?question")).getPath
res1: String = /path/to/myobject

Also:
scala> URI.create(String.format("s3://%s/%s", "mybucket", "path/to/100%myobject")).getPath
java.lang.IllegalArgumentException: Malformed escape pair at index 25: s3://mybucket/path/to/100%myobject
at java.net.URI.create(URI.java:852)
... 28 elided
Caused by: java.net.URISyntaxException: Malformed escape pair at index 25: s3://mybucket/path/to/100%myobject
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.scanEscape(URI.java:2978)
at java.net.URI$Parser.scan(URI.java:3001)
at java.net.URI$Parser.checkChars(URI.java:3019)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 28 more
IMHO we should always leave code better than we found it. Small bugs like this are not worth putting into an issue, and will likely never get worked on, but some poor soul somewhere on the interwebs will run into it and bang their head against it.
Fixing bugs is not scope creep...
This is not a regression and so doesn't have to be fixed in this PR. It's up to the author in this case.
IMHO we should always leave code better than we found it. Small bugs like this are not worth putting into an issue, and will likely never get worked on, but some poor soul somewhere on the interwebs will run into it and bang their head against it.
I don't think this will be happening for this bug. This bug is pretty critical and should be fixed as soon as possible.
I went ahead and fixed this for most of S3 by refactoring to use CloudObjectLocation to ensure URI handling is good, though the footprint of this PR has grown a lot, which is what I was worried about. In some sense, this is sort of related to the work done in #6761. I have opened #8941 to finish the remaining issues.
| private static final String MIMETYPE_JETS3T_DIRECTORY = "application/x-directory"; | ||
| private static final Logger log = new Logger(S3Utils.class); | ||
| public static final int MAX_S3_RETRIES = 10; |
| ); | ||
| } | ||
| return CompressionUtils.decompress(s3Object.getObjectContent(), key); | ||
| return s3Object.getObjectContent(); |
Should this decompress the stream like it did before?
Oh this is my bad. S3Entity is a RetryingInputEntity and the returned input stream here is wrapped with RetryingInputStream. Decompression logic should be done on RetryingInputStream.
May be worth adding a test
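A test along those lines only needs the suffix-based decompression behavior. The sketch below is a simplified, gzip-only stand-in for this pattern (Druid's actual CompressionUtils.decompress supports more formats and is not shown here):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class DecompressDemo
{
  // Pick a decompressor based on the object's file name suffix; pass the
  // raw stream through untouched for unrecognized suffixes.
  public static InputStream decompress(InputStream in, String fileName) throws IOException
  {
    if (fileName.endsWith(".gz")) {
      return new GZIPInputStream(in);
    }
    return in;
  }

  // Helper for tests: gzip-compress a byte array in memory.
  public static byte[] gzip(byte[] data) throws IOException
  {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.toByteArray();
  }
}
```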
| |type|This should be `s3`.|N/A|yes| | ||
| |uris|JSON array of URIs where s3 files to be ingested are located.|N/A|`uris` or `prefixes` must be set| | ||
| |prefixes|JSON array of URI prefixes for the locations of s3 files to be ingested.|N/A|`uris` or `prefixes` must be set| | ||
With your latest changes, need to add another row for objects here and update the required value for the other columns based on the presence of objects.
Ah, I actually wasn't going to document objects because it's primarily used internally for the splits, for parallel subtasks to avoid converting bucket/path back into a URI, but if people prefer to put in an array of objects instead of an array of uris I guess there is no harm in documenting it.
The only harm will be making it something we can't get rid of in the future. If you want a method to have public visibility but not be a public API, we should note that on the method.
| if (keyString.startsWith("/")) { | ||
| keyString = keyString.substring(1); | ||
| } |
This logic is used in a few places (e.g., CloudObjectLocation(URI uri). May be useful to add a helper function.
Added StringUtils.maybeRemoveLeadingSlash
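A minimal sketch of such helpers (the actual StringUtils methods in the Druid codebase may differ in details; this is just to show the shared logic being factored out):

```java
class SlashUtils
{
  // Strip a single leading '/' if present, e.g. "/bucket/key" -> "bucket/key".
  public static String maybeRemoveLeadingSlash(String s)
  {
    return s != null && s.startsWith("/") ? s.substring(1) : s;
  }

  // Companion helper: strip a single trailing '/' if present.
  public static String maybeRemoveTrailingSlash(String s)
  {
    return s != null && s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
  }
}
```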
| getRetryCondition(), | ||
| RetryUtils.DEFAULT_MAX_TRIES | ||
| ); | ||
| return CompressionUtils.decompress(retryingInputStream, getDecompressionPath()); |
nit: thank you for fixing this! I think the javadoc of readFrom and readFromStart should now mention that the returned inputStream shouldn't decompress. I think I'm going to raise a PR for adding some unit tests for this bug and maybe I can update the javadoc in my PR unless you want to do it here.
| InputRowSchema inputRowSchema, | ||
| @Nullable InputFormat inputFormat, | ||
| File temporaryDirectory | ||
| @Nullable File temporaryDirectory |
Is this nullable now? I think it shouldn't be nullable if it's null only in unit tests.
Oops, I don't remember adding this so I suspect I did it on accident; will fix.
I think all of the new native batch stuff will have to be addressed in the release notes. However, all of the old stuff is still there.
@vogievetsky — "auto firehose to input source converter" sounds a bit scary, I don't think special effort was put into making the sources consistent with the firehoses (rather, I believe the effort was instead put into making them consistent with other input sources). So there may be other pitfalls if you are assuming they can be converted without special knowledge of the specific source type.
| /** | ||
| * Get path to decompress a compressed stream for the entity |
I had trouble making sense of what this comment means, could you please consider rewording it?
At first glance it sounds like a path on local disk that the compressed stream will be decompressed to, but looking at implementations, that doesn't seem right.
At second glance it looks like it's the filename corresponding to the input entity, and is just used to figure out if it needs to be decompressed or not. The javadoc should say something like that.

Redid the javadocs and renamed the method to the more generic getPath, since that is also a correct name given its usage and maybe it is useful for other things.
| @@ -32,30 +33,39 @@ public interface RetryingInputEntity extends InputEntity | |||
| @Override | |||
| default InputStream open() throws IOException | |||
Now that readFromStart() and readFrom(long) are clearer, could you please also update the javadocs for InputEntity#open as well, to say whether or not it should decompress?
Added javadocs mentioning that the default open implementation handles decompression.
| * Directly opens an {@link InputStream} on the input entity. Decompression should be handled externally, this should | ||
| * return the raw stream for the object. | ||
| */ | ||
| default InputStream readFromStart() throws IOException |
nit: These seem like "internal" methods that aren't actually meant to be called by users of the interface. In that case RetryingInputEntity probably makes more sense as an abstract class than as an interface, with these methods marked protected. I won't insist it be changed, but if it stays an interface, it'd be nice for the javadocs to say that external callers aren't meant to use these methods.
The reason is a general assumption that any method on an interface is meant for users of the interface.
Refactored to an abstract class.
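Schematically, that refactor turns the interface's default method into a public template method on an abstract class, with the raw-stream hook made protected so external callers can't invoke it. This is a simplified stand-in, not the actual RetryingInputEntity (the real class also wraps the stream with retry and decompression logic):

```java
import java.io.IOException;
import java.io.InputStream;

abstract class RetryingEntitySketch
{
  // Public entry point for users of the entity. A real implementation
  // would wrap the raw stream in a retrying stream and handle
  // decompression here.
  public final InputStream open() throws IOException
  {
    return readFromStart();
  }

  // Raw, undecompressed stream for the object; protected so it is
  // clearly internal and not part of the public API.
  protected abstract InputStream readFromStart() throws IOException;
}
```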
| @Override | ||
| public String toString() | ||
| { | ||
| return "CloudObjectLocation {" |
The formatting is a little weird here.
Indeed, fixed; not really sure how it got mangled, I used IntelliJ to generate it originally.
| <dependency> | ||
| <groupId>joda-time</groupId> | ||
| <artifactId>joda-time</artifactId> | ||
| <version>2.10.5</version> |
Please do remove the version here, it should get pulled in via dependencyManagement from the parent.
| } | ||
| } | ||
| private S3InputSource(ServerSideEncryptingAmazonS3 s3Client, CloudObjectLocation inputSplit) |
nit: IMO, it's better to have only one constructor that actually does stuff, and have the others call this(...). It makes invariants easier to get right.
Or just have one constructor, period, and use static creator methods for other styles of creation.
Removed the extra constructor.
| { | ||
| this.s3Client = Preconditions.checkNotNull(s3Client, "s3Client"); | ||
| this.uris = uris == null ? new ArrayList<>() : uris; | ||
| this.prefixes = prefixes == null ? new ArrayList<>() : prefixes; |
Is there any reason for the inconsistency here: uris and prefixes are set to empty lists if they come in as null, but objects isn't?
Burned again for initially starting by porting over the S3StaticFirehoseFactory; reworked this to use nulls... though as I'm writing this comment I realize I should probably treat empties the same as null and not consider that invalid... will fix the PR again.
| } | ||
| for (final URI inputURI : this.uris) { | ||
| Preconditions.checkArgument("s3".equals(inputURI.getScheme()), "input uri scheme == s3 (%s)", inputURI); |
nit: be nice to extract "s3" into a constant like SCHEME.
@gianm I made a thing that if you paste a firehose-based input spec into the data loader it will be magically converted to an input source based one. I need that because the data loader will soon only work with input sources. Do you think this is a bad idea? I know there are a lot of Druid users that have ingestion specs saved somewhere outside of Druid; what were you imagining these people will do to get onto the new format? Convert it by hand? I figured that the data loader could be helpful there.
Also, if the data loader does not have that feature, should there still be a paragraph in the release notes that guides people how to convert between the specs? Or is it just "here are the new docs, figure it out"?
@vogievetsky let's continue this conversation in #8933 |
| * implementations. {@link #bucket} and {@link #path} should NOT be URL encoded. | ||
| * | ||
| * The intention is that this is used as a common representation for storage objects as an alternative to dealing in | ||
| * {@link URI} directly, but still provide a mechansim to round-trip with a URI. |
If you need to push another commit later, there's a typo here: mechansim -> mechanism
gianm left a comment:
Just had one more comment on the new stuff, thanks @clintropolis
| { | ||
| this.bucket = Preconditions.checkNotNull(StringUtils.maybeRemoveTrailingSlash(bucket)); | ||
| this.path = Preconditions.checkNotNull(StringUtils.maybeRemoveLeadingSlash(path)); | ||
| Preconditions.checkArgument(this.bucket.equals(StringUtils.urlEncode(this.bucket))); |
This exception might get thrown in response to user input, so please add a nice error message. As is, the user would get an IllegalArgumentException with no message.
gianm left a comment:
LGTM, thanks @clintropolis
Description
Following up to #8823, this PR adds an S3 InputSource and InputEntity implementation, allowing S3 to be used with the new native batch indexing interfaces. This currently re-uses the same configuration options as the s3 static firehose, but as an InputSource.

This PR has:
Key changed/added classes in this PR
S3InputSource
S3Entity