Add Compression and File utilities for Zip and GZ handling#22
Conversation
|
I think I want to add one or two more options for the utilities. give me a min |
cd6228b to
253e643
Compare
|
Added more |
253e643 to
309955e
Compare
|
get --> build in the docs shouldn't be a change. Will fix |
309955e to
3da9158
Compare
|
I'm adding a few more cases, closing until they are in |
3da9158 to
280fe0a
Compare
83437d3 to
994a976
Compare
There was a problem hiding this comment.
can we call this something else and reference the bug?
There was a problem hiding this comment.
Bug is referenced below, but I'll put it in the method comments and change the name.
|
It was asked which parts are new. A very large part is new. Here's the prior CompressionUtil which did not really have retries or data consistency checks. FileUtils is new |
b676362 to
95776df
Compare
|
After a long conversation with @xvrl I am in agreement that the checksum approach previously presented did not solve enough of a problem, and any solution implemented at the level I was going for probably isn't going to be flexible enough in a reasonable manner. As such checksumming is left out of this implementation to be exercised by the callers if needed, to be verified against their source of truth as they desire, and to be implemented on the portion of the data which is deemed at risk. To have all those considerations in the code and not suffer problems with implementations that do not share the same set of risk criteria simply wasn't working out. Checksumming has been removed from the PR. |
1ea906d to
15151b0
Compare
There was a problem hiding this comment.
I think all the functions in this class need at least a little bit of documentation, especially around what their return values mean, when they will or won't throw exceptions, and what the retry behavior is.
Also, do we actually need all of these? More functions are more convenient but also more code to test and maintain.
8a22f18 to
406a579
Compare
There was a problem hiding this comment.
this one's missing javadocs for some reason
There was a problem hiding this comment.
For some of the "older" functions I did not add java docs, but rather copied them directly from druid-api. I can add for these though
There was a problem hiding this comment.
I think it makes sense to document those as well, since they are new to java-util. Otherwise, we would just leave them in druid-api. We also made some changes to the return values.
406a579 to
50f7642
Compare
There was a problem hiding this comment.
can we make those javadocs as well?
50f7642 to
7f3239d
Compare
7f3239d to
368f7c3
Compare
Add Compression and File utilities for Zip and GZ handling
* Requires druid-io/druid-api#37 * Requires metamx/java-util#22 * Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
I moved these from druid-api and added / expanded the functionality.