This repository was archived by the owner on May 12, 2021. It is now read-only.
TAJO-2030: Use list S3 files using AmazonS3Client instead of using S3A. #932
Closed
blrunner wants to merge 4 commits into apache:master from
…into TAJO-2030
This patch depends on hadoop-aws. I'm going to implement it afresh after resolving #953.
The code for S3 bulk listing is fully implemented in TajoS3FileSystem. Honestly, my code is heavily based on PrestoS3FileSystem. TajoS3FileSystem extends S3AFileSystem because PrestoS3FileSystem doesn't support some methods needed for file writing, for example FileSystem::mkdir.

Here are my benchmark results.
Configuration
Queries
Results: Partition Pruning

| # of partitions | S3AFileSystem | TajoS3FileSystem | Improvement |
|-----------------|---------------|------------------|-------------|
| 1               | 1088 ms       | 607 ms           | 1.79x       |
| 30              | 5421 ms       | 3414 ms          | 1.58x       |
| 90              | 15776 ms      | 7927 ms          | 1.99x       |
| 151             | 24060 ms      | 14912 ms         | 1.61x       |
| 334             | 45397 ms      | 32247 ms         | 1.40x       |
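
One plausible reading of the partition pruning numbers above is that a bulk listing needs far fewer list requests than a directory-by-directory walk. The sketch below illustrates that idea with a hypothetical in-memory bucket. It is not the code from this PR and uses neither the AWS SDK nor Hadoop. The class name `BulkListingSketch`, the key layout (`table/dt=NNN/part-N`), and the 1000-key page size are assumptions made for illustration, and the sketch assumes the baseline issues one list request per directory, which the PR does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for an S3 bucket; not Tajo, Hadoop, or AWS SDK code. */
public class BulkListingSketch {
    private final TreeSet<String> keys = new TreeSet<>(); // keys kept in sorted order
    int requests = 0;                                      // simulated list requests issued

    void put(String key) {
        keys.add(key);
    }

    /** Directory-style walk: one simulated request per "directory" prefix visited. */
    List<String> listRecursively(String prefix) {
        requests++;
        List<String> files = new ArrayList<>();
        List<String> subdirs = new ArrayList<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // left the range of keys under this prefix
            }
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash < 0) {
                files.add(key); // direct child of this prefix
            } else {
                String dir = prefix + rest.substring(0, slash + 1);
                if (subdirs.isEmpty() || !subdirs.get(subdirs.size() - 1).equals(dir)) {
                    subdirs.add(dir); // first key seen under this child prefix
                }
            }
        }
        for (String dir : subdirs) {
            files.addAll(listRecursively(dir));
        }
        return files;
    }

    /** Bulk listing: prefix only, no delimiter; one simulated request per page of keys. */
    List<String> listBulk(String prefix, int pageSize) {
        List<String> files = new ArrayList<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            files.add(key);
        }
        // Even an empty result costs one request.
        requests += Math.max(1, (files.size() + pageSize - 1) / pageSize);
        return files;
    }

    /** Returns {requests for the recursive walk, requests for the bulk listing}. */
    static int[] compare(int partitions, int filesPerPartition) {
        BulkListingSketch bucket = new BulkListingSketch();
        for (int p = 0; p < partitions; p++) {
            for (int f = 0; f < filesPerPartition; f++) {
                bucket.put(String.format("table/dt=%03d/part-%d", p, f));
            }
        }
        List<String> walked = bucket.listRecursively("table/");
        int walkRequests = bucket.requests;
        bucket.requests = 0;
        List<String> bulk = bucket.listBulk("table/", 1000); // assumed page size
        if (!walked.equals(bulk)) {
            throw new IllegalStateException("both strategies should return the same files");
        }
        return new int[] {walkRequests, bucket.requests};
    }

    public static void main(String[] args) {
        int[] r = compare(30, 2);
        System.out.println("recursive walk: " + r[0] + " requests, bulk: " + r[1] + " request(s)");
    }
}
```

In this toy model the walk costs one request for the table root plus one per partition, while the bulk listing cost grows only with the number of pages. Real request latency, paging, and directory markers are not modelled.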
Results: Query Finish Time

| # of partitions | S3AFileSystem | TajoS3FileSystem | Improvement |
|-----------------|---------------|------------------|-------------|
| 1               | 3.99 sec      | 2.726 sec        | 1.46x       |
| 30              | 15.447 sec    | 12.416 sec       | 1.24x       |
| 90              | 40.153 sec    | 31.593 sec       | 1.27x       |
| 151             | 66.038 sec    | 44.604 sec       | 1.48x       |
| 334             | 137.137 sec   | 90.419 sec       | 1.51x       |
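
The Improvement column in both tables appears to be the S3AFileSystem time divided by the TajoS3FileSystem time, truncated rather than rounded to two decimals (for example, 137.137 / 90.419 ≈ 1.5167 is reported as 1.51). A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical and not part of the PR.

```java
import java.util.Locale;

public class ImprovementRatio {
    /** Baseline time divided by candidate time, truncated (not rounded) to two decimals. */
    static double speedup(double baseline, double candidate) {
        return Math.floor(baseline / candidate * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Timings copied from the benchmark tables.
        System.out.printf(Locale.ROOT, "%.2fx%n", speedup(5421, 3414));      // partition pruning, 30 partitions
        System.out.printf(Locale.ROOT, "%.2fx%n", speedup(137.137, 90.419)); // query time, 334 partitions
    }
}
```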