Skip to content

Improve S3 upload speeds using aws transfer manager#17674

Merged
gianm merged 5 commits intoapache:masterfrom
confluentinc:apache-improve-s3-uploads
Apr 1, 2025
Merged

Improve S3 upload speeds using aws transfer manager#17674
gianm merged 5 commits intoapache:masterfrom
confluentinc:apache-improve-s3-uploads

Conversation

@suraj-goel
Copy link
Copy Markdown
Contributor

@suraj-goel suraj-goel commented Jan 28, 2025

Fixes #17669.

Description

  • Improve S3 Upload speeds using AWS Transfer Manager.
  • Benchmark Results:

Screenshot 2025-01-28 at 11 45 04 AM

  • As per the benchmark results, minimumUploadPartSize/multipartUploadThreshold should be set to 20 MB for best performance.

Release note

Improved S3 Upload Performance in Druid

  • Integrated S3 Transfer Manager to enhance the speed and efficiency of S3 uploads within Druid. This update significantly reduces upload times of segments to S3.
  • This feature is enabled by default. To disable it
    druid.storage.transfer.useTransferManager=false
    
  • To adjust part sizes, please set the following in common.runtime.properties
    druid.storage.transfer.minimumUploadPartSize=20971520
    druid.storage.transfer.multipartUploadThreshold=20971520

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

private MockAmazonS3Client()
{
super(new AmazonS3Client(), new NoopServerSideEncryption());
super(new AmazonS3Client(), new NoopServerSideEncryption(), new S3TransferConfig());

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [AmazonS3Client.AmazonS3Client](1) should be avoided because it has been deprecated.
@suraj-goel suraj-goel marked this pull request as ready for review January 28, 2025 07:01
// https://github.com/aws/aws-sdk-java/issues/2285
return true;
} else if (e instanceof InterruptedException) {
Thread.interrupted(); // Clear interrupted state and not retry
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to clear the interrupted flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to gracefully stop the retry operations.
Although, The flag would already be cleared when the InterruptedException is thrown.

Let me know if your suggestion is to modify it.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for your contribution!

This change will need a release note, since the usage of the transfer manager by default involves multipart uploads, so people may need to set up a lifecycle rule to clean up unused parts: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html

@gianm gianm merged commit 76bcc8e into apache:master Apr 1, 2025
gianm added a commit to gianm/druid that referenced this pull request Apr 1, 2025
kfaraz pushed a commit that referenced this pull request Apr 1, 2025
@cryptoe
Copy link
Copy Markdown
Contributor

cryptoe commented Apr 1, 2025

@suraj-goel Since the feature is enabled by default, could you please update the description ?

@kgyrtkirk kgyrtkirk added this to the 33.0.0 milestone Apr 14, 2025
@cryptoe
Copy link
Copy Markdown
Contributor

cryptoe commented Apr 16, 2025

@suraj-goel Adjusted the PR description. Please have a look.

@capistrant
Copy link
Copy Markdown
Contributor

@suraj-goel For your benchmarking, I've got some questions if you don't mind -

  1. what file sizes where you uploading?
  2. Were you uploading to cold s3 paths or warm paths?

Troubleshooting some s3 throttling and wondering if the default of 20MB parts for big enough files can flood a cold s3 path too quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add the capability to speed up S3 uploads using AWS transfer manager

6 participants