This repository was archived by the owner on May 12, 2021. It is now read-only.


TAJO-2069: Implement finding the total size of all objects in a bucket with AWS SDK.#1024

Closed
blrunner wants to merge 30 commits into apache:master from blrunner:TAJO-2069

Conversation

@blrunner
Contributor

See the following issue:

When creating an external table, Tajo calls FileSystem::getContentSummary to get the table volume in TableSpace::createTable. This API calls the S3 client API to recursively loop over all sub-directories of the specified path, which becomes a huge bottleneck with a large partitioned table. We need to improve it for Tajo users on AWS. My benchmark results are as follows.
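The cost of that recursion can be sketched in plain Java. This is a toy model, not the actual Hadoop or Tajo code: a hypothetical `summarize` walks a directory tree the way getContentSummary does, issuing one simulated list request per directory, so N partition directories cost N round trips.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the bottleneck: one list request per "directory", so a
// table with N partition directories costs N round trips to S3.
// All names here are hypothetical illustrations, not Hadoop/Tajo APIs.
public class RecursiveSummarySketch {

    // tree: path -> immediate child directories; sizes: path -> bytes stored there.
    public static long summarize(Map<String, List<String>> tree,
                                 Map<String, Long> sizes,
                                 String path, AtomicInteger requests) {
        requests.incrementAndGet(); // one simulated list call per directory
        long total = sizes.getOrDefault(path, 0L);
        for (String child : tree.getOrDefault(path, List.of())) {
            total += summarize(tree, sizes, child, requests);
        }
        return total;
    }

    public static void main(String[] args) {
        // A table directory with three partition sub-directories.
        Map<String, List<String>> tree = Map.of(
                "t", List.of("t/d1", "t/d2", "t/d3"),
                "t/d1", List.of(), "t/d2", List.of(), "t/d3", List.of());
        Map<String, Long> sizes = Map.of("t/d1", 10L, "t/d2", 20L, "t/d3", 30L);
        AtomicInteger requests = new AtomicInteger();
        long total = summarize(tree, sizes, "t", requests);
        System.out.println(total + " bytes in " + requests.get() + " requests");
        // prints "60 bytes in 4 requests": requests grow with directory count
    }
}
```

With thousands of partition directories this becomes thousands of sequential requests, which matches the roughly linear growth in the benchmark below.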

Configuration

  • EC2 instance type : c3.xlarge
  • Tajo version : 0.12.0-SNAPSHOT
  • Cluster: 1 master, 1 worker

Contents summary time

# of directories | S3AFileSystem | S3FileTableSpace | Improvement
-----------------|---------------|------------------|------------
5 | 1056.5 ms | 136.2 ms | 7.8x
365 | 56549 ms | 153.8 ms | 367.7x
730 | 113007.5 ms | 193.2 ms | 585x
1095 | 168567 ms | 215.7 ms | 781.5x
1460 | 228129.5 ms | 234.2 ms | 974.1x

@jinossy
Member

jinossy commented May 18, 2016

Why did you close the PR? The conversation is lost.

@blrunner
Contributor Author

@jinossy

Honestly, I made a mistake while rebasing this branch, and then GitHub closed the previous PR automatically.
You can find the previous conversation in the following issue: #953

@jinossy
Member

jinossy commented May 18, 2016

@blrunner OK.
Please add a description of your test environment. I will test this on AWS.

@blrunner
Contributor Author

blrunner commented May 19, 2016

@jinossy

I generated partitioned tables on HDFS, then uploaded the output files to S3 with the AWS SDK, and finally created external tables on EC2. Here is my test environment.

  • Hadoop version: apache hadoop 2.7.1
  • Basic data : TPC-H 1G data set
  • CTAS for partitioned table with HDFS

CREATE TABLE lineitem_p1 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS 
SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE,   L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1992-01-07';

CREATE TABLE lineitem_p2 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1993-01-01';

CREATE TABLE lineitem_p3 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1994-01-01';

CREATE TABLE lineitem_p4 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1995-01-01';

CREATE TABLE lineitem_p5 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1996-01-01';
  • DDL for creating external table with S3
CREATE EXTERNAL TABLE lineitem_p1 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p1';

CREATE EXTERNAL TABLE lineitem_p2 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p2';

CREATE EXTERNAL TABLE lineitem_p3 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p3';

CREATE EXTERNAL TABLE lineitem_p4 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p4';

CREATE EXTERNAL TABLE lineitem_p5 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p5';
  • Configuration for S3 implementation
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

@jinossy
Member

jinossy commented May 20, 2016

I will test. Thanks

@blrunner
Contributor Author

I updated this PR as follows:

  • Removed unnecessary modifications
  • Added mockup tests
  • Avoided using S3Tablespace with Hadoop versions older than 2.6.0
  • Refactored the pom file of the s3 module

I found that it ran as expected on a local cluster and on EMR. It also successfully calculated the volume of a multi-level partitioned table, using the following table:

CREATE external TABLE lineitem_multilevel_p1 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text, l_receiptdate text)
location 's3a://jhjung-us/tpch/lineitem_multilevel_p1';

Additionally, I added code comparing this PR with FileSystem::getContentSummary to my gist: https://gist.github.com/blrunner/9a8e585ff18a809afb87d8f07d94e345. I found that the result of S3Tablespace::calculateSize is always equal to the result of FileSystem::getContentSummary. I also found that FileSystem::listStatus is called recursively while FileSystem::getContentSummary runs. It seems that the cause of the performance difference is this recursive directory listing.

@jinossy
Member

jinossy commented May 30, 2016

Guys,

I found the reason for the improved performance: if no delimiter is set, listObjects returns summary information for every object under the prefix, which reduces the number of requests to AWS.

Please see the comments below:

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsRequest.html

Contains options to return a list of summary information about the objects in the specified bucket. Depending on the request parameters, additional information is returned, such as common prefixes if a delimiter was specified. List results are always returned in lexicographic (alphabetical) order.

Buckets can contain a virtually unlimited number of keys, and the complete results of a list query can be extremely large. To manage large result sets, Amazon S3 uses pagination to split them into multiple responses. Always check the ObjectListing.isTruncated() method to see if the returned listing is complete, or if callers need to make additional calls to get more results. Alternatively, use the AmazonS3Client.listNextBatchOfObjects(ObjectListing) method as an easy way to get the next page of object listings.

Calling setDelimiter(String) sets the delimiter, allowing groups of keys that share the delimiter-terminated prefix to be included in the returned listing. This allows applications to organize and browse their keys hierarchically, similar to how a file system organizes files into directories. These common prefixes can be retrieved through the ObjectListing.getCommonPrefixes() method.

For example, consider a bucket that contains the following keys:

"foo/bar/baz"
"foo/bar/bash"
"foo/bar/bang"
"foo/boo"
If calling listObjects with a prefix value of "foo/" and a delimiter value of "/" on this bucket, an ObjectListing is returned that contains one key ("foo/boo") and one entry in the common prefixes list ("foo/bar/"). To see deeper into the virtual hierarchy, make another call to listObjects setting the prefix parameter to any interesting common prefix to list the individual keys under that prefix.
The total number of keys in a bucket doesn't substantially affect list performance, nor does the presence or absence of additional request parameters.
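The delimiter-free listing plus isTruncated pagination described in the javadoc above can be sketched as follows. The `Client`, `Listing`, and `Summary` types here are hypothetical mocks standing in for the AWS SDK's AmazonS3, ObjectListing, and S3ObjectSummary, so the pattern runs without a live bucket.

```java
import java.util.List;

// Sketch of summing object sizes with a single flat listing (no delimiter)
// and isTruncated-style pagination. The types below are hypothetical mocks
// standing in for the AWS SDK's AmazonS3 / ObjectListing / S3ObjectSummary.
public class FlatListingSketch {

    public static class Summary {
        public final long size;
        public Summary(long size) { this.size = size; }
    }

    public static class Listing {
        public final List<Summary> summaries;
        public final boolean truncated; // like ObjectListing.isTruncated()
        public Listing(List<Summary> summaries, boolean truncated) {
            this.summaries = summaries;
            this.truncated = truncated;
        }
    }

    public interface Client {
        // With no delimiter set, every key under the prefix comes back,
        // regardless of directory depth, one page at a time.
        Listing listObjects(String prefix, int page);
    }

    // Sum all object sizes under a prefix, fetching pages until the
    // listing is no longer truncated.
    public static long totalSize(Client client, String prefix) {
        long total = 0;
        int page = 0;
        Listing listing;
        do {
            listing = client.listObjects(prefix, page++);
            for (Summary s : listing.summaries) {
                total += s.size;
            }
        } while (listing.truncated);
        return total;
    }

    public static void main(String[] args) {
        // Mock client returning two pages: sizes 1+2+3 then 4+5+6.
        Client mock = (prefix, page) -> page == 0
                ? new Listing(List.of(new Summary(1), new Summary(2), new Summary(3)), true)
                : new Listing(List.of(new Summary(4), new Summary(5), new Summary(6)), false);
        System.out.println(totalSize(mock, "tpch/lineitem_p1/")); // prints 21
    }
}
```

The key point is that the number of requests depends only on the number of result pages (up to 1,000 keys per page by default in S3), not on the number of directories, which is why the S3FileTableSpace timings in the benchmark table stay nearly flat.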

@jihoonson
Contributor

Thanks for sharing. It sounds reasonable.

@jinossy
Member

jinossy commented Aug 28, 2016

rebase please

…into TAJO-2069

Conflicts:
	tajo-project/pom.xml
@blrunner
Contributor Author

@jinossy

Rebased. :-)

@jinossy
Member

jinossy commented Aug 29, 2016

+1 LGTM!
Ship it!

@blrunner
Contributor Author

@jinossy

Thanks for your review.
I'll ship it. :-)

@asfgit asfgit closed this in 4f35c28 Aug 29, 2016