TAJO-2069: Implement finding the total size of all objects in a bucket with AWS SDK. #953
blrunner wants to merge 0 commits into apache:master from
Here are my benchmark results (configuration and contents summary time):
Removed the TAJO-2063 (#952) dependency.
I wonder why the time taken by getTotalSize() is not proportional to the number of directories. Sometimes it is even faster with more directories.
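One possible explanation, assuming getTotalSize() lists the bucket without a delimiter: S3 has no real directories, and a flat list call returns at most 1,000 keys per response, so the number of requests tracks the number of objects under the prefix rather than the number of "directories". The sketch below shows that pagination loop in plain Java. `ObjectLister`, `Page` and `fakeLister` are hypothetical stand-ins so the example runs without AWS credentials. With the AWS SDK for Java v1, the same loop would be driven by `AmazonS3.listObjects(ListObjectsRequest)`, `ObjectListing.isTruncated()` and `AmazonS3.listNextBatchOfObjects(ObjectListing)`. This is an illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BucketSizeSketch {

    /** One page of listing results (hypothetical; loosely modeled on ObjectListing in AWS SDK v1). */
    static final class Page {
        final List<Long> objectSizes; // sizes in bytes of the objects on this page
        final String nextMarker;      // null when the listing is complete (not truncated)

        Page(List<Long> objectSizes, String nextMarker) {
            this.objectSizes = objectSizes;
            this.nextMarker = nextMarker;
        }
    }

    /** Hypothetical paginated "list objects under a prefix" call. */
    interface ObjectLister {
        Page list(String prefix, String marker);
    }

    /** Total size plus how many list requests were needed to compute it. */
    static final class SizeResult {
        final long totalBytes;
        final int requests;

        SizeResult(long totalBytes, int requests) {
            this.totalBytes = totalBytes;
            this.requests = requests;
        }
    }

    /** Sums object sizes page by page until the listing is no longer truncated. */
    static SizeResult getTotalSize(ObjectLister lister, String prefix) {
        long total = 0;
        int requests = 0;
        String marker = null;
        do {
            Page page = lister.list(prefix, marker);
            requests++;
            for (long size : page.objectSizes) {
                total += size;
            }
            marker = page.nextMarker;
        } while (marker != null);
        return new SizeResult(total, requests);
    }

    /** In-memory fake: serves a flat key space in pages of at most pageSize objects (prefix is ignored). */
    static ObjectLister fakeLister(long[] sizes, int pageSize) {
        return (prefix, marker) -> {
            int start = (marker == null) ? 0 : Integer.parseInt(marker);
            int end = Math.min(start + pageSize, sizes.length);
            List<Long> page = new ArrayList<>();
            for (int i = start; i < end; i++) {
                page.add(sizes[i]);
            }
            String next = (end < sizes.length) ? Integer.toString(end) : null;
            return new Page(page, next);
        };
    }

    public static void main(String[] args) {
        // 2,500 objects of 1 KiB each, served 1,000 keys per page like an S3 list response.
        long[] sizes = new long[2500];
        Arrays.fill(sizes, 1024L);
        SizeResult r = getTotalSize(fakeLister(sizes, 1000), "");
        System.out.println(r.totalBytes + " bytes in " + r.requests + " requests");
    }
}
```

Running it prints `2560000 bytes in 3 requests`; how those 2,500 keys are spread across "directory" prefixes does not change the count. If the implementation instead lists each directory separately (for example one call per common prefix), the request count, and therefore the time, would grow with the number of directories.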
There may be various reasons: the local network connection, the health of Amazon's servers, and the AWS SDK retry mechanism.
If those are the causes, you can mitigate their effect by running the test several times and averaging the results.
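A minimal Java sketch of that approach: run the measured call once or more as an unmeasured warm-up, then time several runs and report the mean together with the median and sample standard deviation, which make outliers from retries or a slow connection easier to spot. `timeRuns` and the other helper names are hypothetical; the task passed in would be whatever call is being benchmarked, such as getTotalSize().

```java
import java.util.Arrays;

public class BenchmarkSketch {

    /** Runs task `warmups` times unmeasured, then `runs` times measured; returns each measured run in ms. */
    static double[] timeRuns(Runnable task, int warmups, int runs) {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 measured runs");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // not measured, e.g. to absorb JIT compilation and connection setup
        }
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Median of a non-empty array; the input is not modified. */
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Sample standard deviation (divides by n - 1); expects at least 2 values. */
    static double stddev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Replace the empty lambda body with the call under test.
        double[] samples = timeRuns(() -> { }, 1, 5);
        System.out.printf("mean=%.3f ms, median=%.3f ms, stddev=%.3f ms%n",
                mean(samples), median(samples), stddev(samples));
    }
}
```

Reporting the median alongside the mean is a design choice: a single retried request can inflate the mean noticeably while barely moving the median.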
tajo-storage/tajo-storage-s3/pom.xml
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
</dependency>
```
hadoop-aws is included in Hadoop 2.6.0 and higher. If you add hadoop-aws, we should discuss Hadoop compatibility.
@jihoonson @jinossy
I removed the hadoop-aws dependency and added the Amazon (AWS) SDK dependency.
Here are my second benchmark results:
The test finished successfully:
This PR has been moved to #1024.
Unit test cases are not yet implemented, and this PR depends on TAJO-2063 (#952).