This repository was archived by the owner on May 12, 2021. It is now read-only.


TAJO-2069: Implement finding the total size of all objects in a bucket with AWS SDK.#1024

Closed
blrunner wants to merge 30 commits into apache:master from blrunner:TAJO-2069

Conversation

@blrunner
Contributor

See the following issue:

When creating an external table, Tajo calls FileSystem::getContentSummary to get the table volume in TableSpace::createTable. This API calls the S3 client API to recursively loop over all sub-directories of the specified path, which becomes a huge bottleneck with a large partitioned table. We need to improve it for Tajo users on AWS. My benchmark results are as follows.
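The cost of that recursion can be sketched in plain Java. This is a toy model, not the actual Hadoop or Tajo code: a hypothetical `summarize` walks a directory tree the way getContentSummary does, issuing one simulated list request per directory, so N partition directories cost N round trips.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the bottleneck: one list request per "directory", so a
// table with N partition directories costs N round trips to S3.
// All names here are hypothetical illustrations, not Hadoop/Tajo APIs.
public class RecursiveSummarySketch {

    // tree: path -> immediate child directories; sizes: path -> bytes stored there.
    public static long summarize(Map<String, List<String>> tree,
                                 Map<String, Long> sizes,
                                 String path, AtomicInteger requests) {
        requests.incrementAndGet(); // one simulated list call per directory
        long total = sizes.getOrDefault(path, 0L);
        for (String child : tree.getOrDefault(path, List.of())) {
            total += summarize(tree, sizes, child, requests);
        }
        return total;
    }

    public static void main(String[] args) {
        // A table directory with three partition sub-directories.
        Map<String, List<String>> tree = Map.of(
                "t", List.of("t/d1", "t/d2", "t/d3"),
                "t/d1", List.of(), "t/d2", List.of(), "t/d3", List.of());
        Map<String, Long> sizes = Map.of("t/d1", 10L, "t/d2", 20L, "t/d3", 30L);
        AtomicInteger requests = new AtomicInteger();
        long total = summarize(tree, sizes, "t", requests);
        System.out.println(total + " bytes in " + requests.get() + " requests");
        // prints "60 bytes in 4 requests": requests grow with directory count
    }
}
```

With thousands of partition directories this becomes thousands of sequential requests, which matches the roughly linear growth in the benchmark below.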

Configuration

  • EC2 instance type : c3.xlarge
  • Tajo version : 0.12.0-SNAPSHOT
  • Cluster: 1 master, 1 worker

Contents summary time

# of directories | S3AFileSystem | S3FileTableSpace | Improvement
-----------------|---------------|------------------|------------
5 | 1056.5 ms | 136.2 ms | 7.8x
365 | 56549 ms | 153.8 ms | 367.7x
730 | 113007.5 ms | 193.2 ms | 585x
1095 | 168567 ms | 215.7 ms | 781.5x
1460 | 228129.5 ms | 234.2 ms | 974.1x

@jinossy
Member

jinossy commented May 18, 2016

Why did you close the PR? The conversation is lost.

@blrunner
Contributor Author

@jinossy

Honestly, I made a mistake while rebasing this branch, and then GitHub closed the previous PR automatically.
You can find the previous conversation in the following issue: #953

@jinossy
Member

jinossy commented May 18, 2016

@blrunner OK.
Please add a description of your test environment. I will test this on AWS.

@blrunner
Contributor Author

blrunner commented May 19, 2016

@jinossy

I generated partitioned tables on HDFS, then uploaded the output files to S3 with the AWS SDK, and finally created external tables on EC2. Here is my test environment.

  • Hadoop version: apache hadoop 2.7.1
  • Basic data : TPC-H 1G data set
  • CTAS for partitioned table with HDFS

CREATE TABLE lineitem_p1 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS 
SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE,   L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1992-01-07';

CREATE TABLE lineitem_p2 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1993-01-01';

CREATE TABLE lineitem_p3 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1994-01-01';

CREATE TABLE lineitem_p4 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1995-01-01';

CREATE TABLE lineitem_p5 (
  l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
  l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
  l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
AS SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM
where l_shipdate < '1996-01-01';
  • DDL for creating external table with S3
CREATE EXTERNAL TABLE lineitem_p1 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p1';

CREATE EXTERNAL TABLE lineitem_p2 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p2';

CREATE EXTERNAL TABLE lineitem_p3 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p3';

CREATE EXTERNAL TABLE lineitem_p4 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p4';

CREATE EXTERNAL TABLE lineitem_p5 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate DATE, l_receiptdate DATE, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text)
LOCATION 's3://jhjung-us/tpch/lineitem_p5';
  • Configuration for S3 implementation
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

@jinossy
Member

jinossy commented May 20, 2016

I will test. Thanks

@blrunner
Contributor Author

I updated this PR as follows:

  • Removed unnecessary modifications
  • Added mockup tests
  • Avoided using S3Tablespace with Hadoop versions older than 2.6.0
  • Refactored the pom file of the s3 module

I found that it ran as expected on a local cluster and on EMR. It also successfully calculated the volume of a multi-level partitioned table, using the following table:

CREATE external TABLE lineitem_multilevel_p1 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, 
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
) 
USING TEXT WITH ('text.delimiter'='|') 
PARTITION BY COLUMN(l_shipdate text, l_receiptdate text)
location 's3a://jhjung-us/tpch/lineitem_multilevel_p1';

Additionally, I added code comparing this PR with FileSystem::getContentSummary to my gist: https://gist.github.com/blrunner/9a8e585ff18a809afb87d8f07d94e345. I found that the result of S3Tablespace::calculateSize is always equal to the result of FileSystem::getContentSummary. I also found that FileSystem::listStatus is called recursively while FileSystem::getContentSummary runs. It seems that the cause of the performance difference is this recursive directory listing.

@jinossy
Member

jinossy commented May 30, 2016

Guys,

I found the reason for the improved performance: if no delimiter is set, listObjects returns summary information for every object under the prefix, which reduces the number of requests to AWS.

Please see the comments below:

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsRequest.html

Contains options to return a list of summary information about the objects in the specified bucket. Depending on the request parameters, additional information is returned, such as common prefixes if a delimiter was specified. List results are always returned in lexicographic (alphabetical) order.

Buckets can contain a virtually unlimited number of keys, and the complete results of a list query can be extremely large. To manage large result sets, Amazon S3 uses pagination to split them into multiple responses. Always check the ObjectListing.isTruncated() method to see if the returned listing is complete, or if callers need to make additional calls to get more results. Alternatively, use the AmazonS3Client.listNextBatchOfObjects(ObjectListing) method as an easy way to get the next page of object listings.

Calling setDelimiter(String) sets the delimiter, allowing groups of keys that share the delimiter-terminated prefix to be included in the returned listing. This allows applications to organize and browse their keys hierarchically, similar to how a file system organizes files into directories. These common prefixes can be retrieved through the ObjectListing.getCommonPrefixes() method.

For example, consider a bucket that contains the following keys:

"foo/bar/baz"
"foo/bar/bash"
"foo/bar/bang"
"foo/boo"
If calling listObjects with a prefix value of "foo/" and a delimiter value of "/" on this bucket, an ObjectListing is returned that contains one key ("foo/boo") and one entry in the common prefixes list ("foo/bar/"). To see deeper into the virtual hierarchy, make another call to listObjects setting the prefix parameter to any interesting common prefix to list the individual keys under that prefix.
The total number of keys in a bucket doesn't substantially affect list performance, nor does the presence or absence of additional request parameters.
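The delimiter-free listing plus isTruncated pagination described in the javadoc above can be sketched as follows. The `Client`, `Listing`, and `Summary` types here are hypothetical mocks standing in for the AWS SDK's AmazonS3, ObjectListing, and S3ObjectSummary, so the pattern runs without a live bucket.

```java
import java.util.List;

// Sketch of summing object sizes with a single flat listing (no delimiter)
// and isTruncated-style pagination. The types below are hypothetical mocks
// standing in for the AWS SDK's AmazonS3 / ObjectListing / S3ObjectSummary.
public class FlatListingSketch {

    public static class Summary {
        public final long size;
        public Summary(long size) { this.size = size; }
    }

    public static class Listing {
        public final List<Summary> summaries;
        public final boolean truncated; // like ObjectListing.isTruncated()
        public Listing(List<Summary> summaries, boolean truncated) {
            this.summaries = summaries;
            this.truncated = truncated;
        }
    }

    public interface Client {
        // With no delimiter set, every key under the prefix comes back,
        // regardless of directory depth, one page at a time.
        Listing listObjects(String prefix, int page);
    }

    // Sum all object sizes under a prefix, fetching pages until the
    // listing is no longer truncated.
    public static long totalSize(Client client, String prefix) {
        long total = 0;
        int page = 0;
        Listing listing;
        do {
            listing = client.listObjects(prefix, page++);
            for (Summary s : listing.summaries) {
                total += s.size;
            }
        } while (listing.truncated);
        return total;
    }

    public static void main(String[] args) {
        // Mock client returning two pages: sizes 1+2+3 then 4+5+6.
        Client mock = (prefix, page) -> page == 0
                ? new Listing(List.of(new Summary(1), new Summary(2), new Summary(3)), true)
                : new Listing(List.of(new Summary(4), new Summary(5), new Summary(6)), false);
        System.out.println(totalSize(mock, "tpch/lineitem_p1/")); // prints 21
    }
}
```

The key point is that the number of requests depends only on the number of result pages (up to 1,000 keys per page by default in S3), not on the number of directories, which is why the S3FileTableSpace timings in the benchmark table stay nearly flat.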

@jihoonson
Contributor

Thanks for sharing. It sounds reasonable.

@jinossy
Member

jinossy commented Aug 28, 2016

rebase please

…into TAJO-2069

Conflicts:
	tajo-project/pom.xml
@blrunner
Contributor Author

@jinossy

Rebased. :-)

@jinossy
Member

jinossy commented Aug 29, 2016

+1 LGTM!
Ship it!

@blrunner
Contributor Author

@jinossy

Thanks for your review.
I'll ship it. :-)

@asfgit asfgit closed this in 4f35c28 Aug 29, 2016