(AWS) Docs: List all AWS S3 properties from all language impl. by Neuw84 · Pull Request #11383 · apache/iceberg

Neuw84 · 2024-10-23T18:28:28Z

Since @hsiang-c opened another pull request that builds a table of these properties, I didn't want to collide with it.

Fixes #10674 (List all AWS S3 properties in the docs)

Therefore, this PR adds:

- Amazon MSK Connect as an option.
- HTTP client advice for high-throughput scenarios (tune not just the retries but also the number of connections).
- Specific configs for data prefetching on EMR 7.1.0.

In my personal opinion, when the AWS SDKs are used, most of these properties shouldn't be listed here (the SDKs already have a standard way to configure and prioritize them). However, it is clear that third-party libraries in different languages need information like the tables @hsiang-c has built.

The problem is that different libraries will have different configs, even within the same language.

Instead of dividing by language, maybe we should divide by library (although a link to each library's doc page might be enough)? And have a separate section/table for the properties supported by the AWS SDKs (anything using the official libraries will have the same config, no matter the language)?

Thanks!

danielcweeks · 2024-10-30T23:21:32Z

docs/docs/aws.md

+For versions after 7.1.0 there is a specific config that can be used to enable the data prefetch optimization. You just need to add the following property to your Spark config.
+
+```shell
+spark.sql.iceberg.data-prefetch.enabled=true


I don't believe this is an Iceberg property. If this is specific to EMR, I don't believe it should be included here.

Neuw84 · 2024-10-31 (edited)

Yes, this is specific to EMR (its internal Iceberg runtime); however, we are on the "aws" docs page.

Isn't it useful to state that you can add this parameter to improve the performance of Iceberg workloads on EMR?

danielcweeks · 2024-10-30T23:32:12Z

docs/docs/aws.md

+**Note that for workloads with exceptionally high throughput against tables on S3, where you are likely to increase retries, you will also want to increase the number of connections for the HTTP client.**
+
+```shell
+spark.sql.catalog.my_catalog.http-client.apache.max-connections=200


This doesn't look like an Iceberg setting from what I can tell. If this is EMR specific, it should not be included here.

Neuw84 · 2024-10-31 (edited)

It comes from the AWS SDK and Spark (it is not specific to EMR). If you run Spark on your laptop writing to S3 in this high-throughput write scenario, you will likely tune this parameter.

Any Spark runtime will use this (the Photon runtime may use another S3 client, but I don't have that info :) ).

On the previous comment, I agree with you that it is very specific to EMR and may or may not belong in the "aws" docs.

My point is that this page is the AWS docs, and the parameter is quite specific to the S3 client of the AWS SDK.
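
For readers following the thread, here is a minimal sketch of what tuning retries and connections together might look like as Spark `--conf` flags. The `max-connections` line is taken from the diff above; `http-client.type` and `s3.retry.num-retries` are property names recalled from the Iceberg AWS docs and should be verified against your Iceberg version. The helper name `spark_conf` and the retry value are hypothetical, chosen only for illustration.

```shell
# Sketch only: prints --conf flags for a catalog named "my_catalog" (the name used in the diff).
catalog="my_catalog"
prefix="spark.sql.catalog.${catalog}"
max_connections=200   # value from the diff; tune for your workload
num_retries=32        # illustrative value, not taken from this PR

# Hypothetical helper: emits one "--conf key=value" pair per line.
spark_conf() {
  # Use the Apache HTTP client so the apache.* setting below applies.
  printf '%s\n' "--conf ${prefix}.http-client.type=apache"
  # Connection pool size for the S3 HTTP client (property from the diff).
  printf '%s\n' "--conf ${prefix}.http-client.apache.max-connections=${max_connections}"
  # Assumed retry property name; check the Iceberg AWS docs before relying on it.
  printf '%s\n' "--conf ${prefix}.s3.retry.num-retries=${num_retries}"
}

spark_conf
```

The output could be passed to a launcher, for example `spark-sql $(spark_conf)`, which relies on word splitting and works here because none of the values contain spaces.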

danielcweeks · 2024-10-30T23:34:02Z

@Neuw84 it looks like we're duplicating what should be EMR documentation here. We already link off to the EMR docs, so I don't feel this is the right place for putting specific configuration info.

Neuw84 · 2024-10-31T10:58:00Z

@danielcweeks let me know your thoughts on my replies (I agree on the EMR-specific one, although for me it does no harm since we are on the AWS docs page).

What are your thoughts on the S3 client info?

github-actions · 2024-12-01T00:19:29Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions · 2024-12-09T00:17:22Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Update aws.md

d405582

Added Amazon MSK Connect as option. Added HTTP client advice when high throughput scenarios. Added specific configs for data prefetching on EMR 7.1.0

github-actions bot added the docs label Oct 23, 2024

Neuw84 changed the title ~~(AWS) Docs: List all AWS S3 properties from all language impl. #10674~~ (AWS) Docs: List all AWS S3 properties from all language impl. fixes #10674 Oct 23, 2024

Neuw84 changed the title ~~(AWS) Docs: List all AWS S3 properties from all language impl. fixes #10674~~ (AWS) Docs: List all AWS S3 properties from all language impl. Oct 23, 2024

danielcweeks reviewed Oct 30, 2024

github-actions bot added the stale label Dec 1, 2024

github-actions bot closed this Dec 9, 2024

Neuw84 wants to merge 1 commit into apache:main from Neuw84:iceberg-10674-aws-docs
