Conversation

@junyuc25
Contributor

@junyuc25 junyuc25 commented Oct 24, 2023

What changes were proposed in this pull request?

As Spark moves to 4.0, one of the major improvements is upgrading the AWS SDK to v2,
as tracked in the parent Jira: https://issues.apache.org/jira/browse/SPARK-44124.

Currently, some tests in this module (i.e. DepsTestsSuite) use an S3 client that requires
AWS credentials during initialization.

As part of the SDK upgrade, the main purpose of this PR is to upgrade the AWS SDK to v2
for the Kubernetes integration tests module.
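
For reference, a minimal sketch (not the exact test code) of how such an S3 client can be built with the v2 SDK, using the test constants shown further down in this PR; the endpoint URI and object key below are illustrative assumptions:

    import java.net.URI

    import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
    import software.amazon.awssdk.core.sync.RequestBody
    import software.amazon.awssdk.regions.Region
    import software.amazon.awssdk.services.s3.S3Client
    import software.amazon.awssdk.services.s3.model.{CreateBucketRequest, PutObjectRequest}

    // Static credentials: the tests talk to a local MinIO service, not AWS itself.
    val credentials = AwsBasicCredentials.create("minio", "miniostorage")

    val s3 = S3Client.builder()
      .credentialsProvider(StaticCredentialsProvider.create(credentials))
      .endpointOverride(URI.create("http://localhost:9000")) // assumed MinIO endpoint
      .region(Region.US_WEST_2)                              // v2 requires a resolvable region
      .forcePathStyle(true)                                  // path-style addressing for MinIO
      .build()

    // Create a bucket and upload an object, the kind of setup the dependency tests rely on.
    s3.createBucket(CreateBucketRequest.builder().bucket("spark").build())
    s3.putObject(
      PutObjectRequest.builder().bucket("spark").key("deps/example.jar").build(),
      RequestBody.fromString("placeholder"))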

Why are the changes needed?

With the GA of AWS SDK v2, SDK v1 has entered maintenance mode, and its future
releases are limited to addressing critical bugs and security issues. More details
about the SDK maintenance policy can be found here: https://docs.aws.amazon.com/sdkref/latest/guide/maint-policy.html.
To keep Spark's dependencies up to date, we should upgrade the SDK to v2.

Does this PR introduce any user-facing change?

No, because this change only impacts the integration test code.

How was this patch tested?

The existing integration tests in the k8s integration test module passed.

Was this patch authored or co-authored using generative AI tooling?

No

@junyuc25 junyuc25 marked this pull request as ready for review October 24, 2023 14:25
@junyuc25 junyuc25 marked this pull request as draft October 27, 2023 08:07
@junyuc25 junyuc25 changed the title [Don't merge or review][WIP] Test k8s changes [Don't merge or review][WIP] Upgrade AWS SDK to v2 for Kubernetes integration tests module Oct 31, 2023
@junyuc25 junyuc25 changed the title [Don't merge or review][WIP] Upgrade AWS SDK to v2 for Kubernetes integration tests module [SPARK-45719][K8S] Upgrade AWS SDK to v2 for Kubernetes integration tests module Oct 31, 2023
@junyuc25 junyuc25 marked this pull request as ready for review October 31, 2023 06:25
Contributor

@steveloughran steveloughran left a comment

Nothing I'm worried about here; it is only minio after all.

pom.xml Outdated
<aws.kinesis.client.version>1.12.0</aws.kinesis.client.version>
<!-- Should be consistent with Kinesis client dependency -->
<aws.java.sdk.version>1.11.655</aws.java.sdk.version>
<aws.java.sdk.v2.version>2.20.128</aws.java.sdk.v2.version>
Contributor

hadoop is @ 2.20.160 already

Contributor Author

Thanks. I've updated the version.

kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.8.0/installer/volcano-development.yaml || true
eval $(minikube docker-env)
build/sbt -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test"
build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test"
Contributor

Why is this change needed for this PR? "hadoop-3" is the default hadoop profile.

Contributor Author

Without explicitly activating this profile, I'm seeing compilation issues during build:

[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:30: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:31: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:32: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:33: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:34: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:309: not found: type S3Client
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:310: not found: value AwsBasicCredentials
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:311: not found: value S3Client
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:312: not found: value StaticCredentialsProvider
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:314: not found: value Region
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:323: not found: value CreateBucketRequest
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:342: not found: value PutObjectRequest

In the pom file of this integration test module, the hadoop-3 profile is set to <activeByDefault>true</activeByDefault>. This setting means that when another profile in this pom is activated via -P on the command line (i.e. -Pvolcano in this case), the hadoop-3 profile is deactivated. This behavior is also explained here. Since the volcano profile is explicitly activated, we also need to explicitly activate hadoop-3 to address the above errors.

val BUCKET = "spark"
val ACCESS_KEY = "minio"
val SECRET_KEY = "miniostorage"
val REGION = "us-west-2"
Contributor

How does v1 work without REGION?

Contributor Author

When initializing a v2 S3 client, a region must be specified (via code, environment variables, system properties, etc.).
But when initializing a v1 S3 client, the region is not mandatory. For instance, I was able to list all buckets across regions with the following.

    import com.amazonaws.services.s3.AmazonS3Client

    // v1 client built without an explicit region; credentials come from the default chain
    val s3client = new AmazonS3Client()
    val response1 = s3client.listBuckets()
    println(response1)
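
For contrast, a minimal v2 sketch of the same call (assuming credentials are picked up from the default provider chain): without .region(...) here, or a region supplied through the environment or system properties, building the client fails at runtime.

    import software.amazon.awssdk.regions.Region
    import software.amazon.awssdk.services.s3.S3Client

    // v2 refuses to build the client unless a region can be resolved,
    // unlike the v1 AmazonS3Client above
    val v2client = S3Client.builder()
      .region(Region.US_WEST_2)
      .build()
    println(v2client.listBuckets())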

Contributor

The whole region thing is a real pain... for AWS itself you are now expected to declare the region and have it work everything out; for third-party stores, just the endpoint and any region string should suffice. See HADOOP-18908 for our ongoing struggles there.

@junyuc25
Contributor Author

Hi @dongjoon-hyun, wonder if you could take a look?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-45719][K8S] Upgrade AWS SDK to v2 for Kubernetes integration tests module [SPARK-45719][K8S][TESTS] Upgrade AWS SDK to v2 for Kubernetes IT Nov 15, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.
Thank you, @junyuc25 , @steveloughran , @LantaoJin .

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun
Member

Welcome to the Apache Spark community, @junyuc25 !
I added you to the Apache Spark contributor group and assigned SPARK-45719 to you.

dongjoon-hyun added a commit that referenced this pull request Mar 28, 2024
### What changes were proposed in this pull request?

This PR aims to ban `AWS SDK for Java v1`. We migrated to v2 via the following.
- #45583
- #43510

### Why are the changes needed?

To ensure the migration to AWS SDK for Java v2 because of the end-of-support schedule below. `v2` is strongly recommended, since v1 enters maintenance mode in July 2024.
- https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/
> AWS SDK for Java v1.x will enter maintenance mode on July 31, 2024, and reach end-of-support on December 31, 2025.

### Does this PR introduce _any_ user-facing change?

No, this PR only prevents mixing in this old dependency in the future.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45759 from dongjoon-hyun/SPARK-47632.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>