[SPARK-45719][K8S][TESTS] Upgrade AWS SDK to v2 for Kubernetes IT #43510
Conversation
steveloughran
left a comment
Nothing I'm worried about here. It is only minio after all.
pom.xml (Outdated)
  <aws.kinesis.client.version>1.12.0</aws.kinesis.client.version>
  <!-- Should be consistent with Kinesis client dependency -->
  <aws.java.sdk.version>1.11.655</aws.java.sdk.version>
+ <aws.java.sdk.v2.version>2.20.128</aws.java.sdk.v2.version>
hadoop is @ 2.20.160 already
Thanks. I've updated the version.
4acf575 to 281ed1c
  kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.8.0/installer/volcano-development.yaml || true
  eval $(minikube docker-env)
- build/sbt -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test"
+ build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test"
Why is this change needed for this PR? "hadoop-3" is the default hadoop profile.
Without explicitly activating this profile, I'm seeing compilation issues during build:
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:30: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:31: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:32: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:33: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:34: not found: object software
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:309: not found: type S3Client
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:310: not found: value AwsBasicCredentials
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:311: not found: value S3Client
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:312: not found: value StaticCredentialsProvider
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:314: not found: value Region
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:323: not found: value CreateBucketRequest
[ERROR] /Users/xxx/repositories/Spark-upgrade-k8s/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala:342: not found: value PutObjectRequest
In the pom file of this integration test module, the hadoop-3 profile is set to <activeByDefault>true</activeByDefault>. This means that when another profile in this pom is activated via -P on the command line (i.e. -Pvolcano in this case), the hadoop-3 profile gets deactivated. This behavior is also explained here. Since the volcano profile is explicitly activated, we also need to explicitly activate hadoop-3 to address the above errors.
  val BUCKET = "spark"
  val ACCESS_KEY = "minio"
  val SECRET_KEY = "miniostorage"
+ val REGION = "us-west-2"
How does v1 work without REGION?
When initializing a v2 S3 client, it is required to specify a region (via code, environment variables, system properties, etc.).
But when initializing a v1 S3 client, the region is not mandatory. For instance, I was able to list all buckets across regions with the following:
import com.amazonaws.services.s3.AmazonS3Client

// v1 resolves credentials from the default chain and does not require a region
val s3client = new AmazonS3Client()
val response1 = s3client.listBuckets()
println(response1)
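For comparison, here is a minimal v2 sketch (not the exact test code; the credentials match the MinIO constants above and the region value is arbitrary). Without a resolvable region, the v2 builder throws at client construction time:

import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client

// v2 requires a region to be resolvable (builder call, env var, system property, ...)
val s3clientV2 = S3Client.builder()
  .credentialsProvider(StaticCredentialsProvider.create(
    AwsBasicCredentials.create("minio", "miniostorage")))
  .region(Region.US_WEST_2)
  .build()
println(s3clientV2.listBuckets())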
the whole region thing is a real pain...for AWS itself you are now expected to declare the region and have it work everything out; for third party just the endpoint and any region string should suffice. See HADOOP-18908 for our ongoing struggles there
Hi @dongjoon-hyun, wonder if you could take a look?
dongjoon-hyun
left a comment
+1, LGTM.
Thank you, @junyuc25 , @steveloughran , @LantaoJin .
Merged to master for Apache Spark 4.0.0.
Welcome to the Apache Spark community, @junyuc25!
### What changes were proposed in this pull request?

This PR aims to ban `AWS SDK for Java v1`. We migrated to v2 via the following.

- #45583
- #43510

### Why are the changes needed?

To ensure the migration to AWS SDK for Java v2 because of the following end-of-support schedule. `v2` is strongly recommended since July.

- https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/

> AWS SDK for Java v1.x will enter maintenance mode on July 31, 2024, and reach end-of-support on December 31, 2025.

### Does this PR introduce _any_ user-facing change?

No, this PR only prevents mixing this old dependency in the future.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45759 from dongjoon-hyun/SPARK-47632.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
As Spark is moving to 4.0, one of the major improvements is to upgrade the AWS SDK to v2,
as tracked in this parent Jira: https://issues.apache.org/jira/browse/SPARK-44124.
Currently, some tests in this module (i.e. DepsTestsSuite) use an S3 client, which requires
AWS credentials during initialization.
As part of the SDK upgrade, the main purpose of this PR is to upgrade the AWS SDK to v2
for the Kubernetes integration tests module.
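As a rough sketch (not the exact DepsTestsSuite code), the migrated test client built with SDK v2 against the in-cluster MinIO service looks roughly like the following; the endpoint URI below is a placeholder:

import java.net.URI
import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, StaticCredentialsProvider}
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.CreateBucketRequest

// Sketch only: the credentials and bucket mirror the MinIO constants in the suite.
val s3 = S3Client.builder()
  .endpointOverride(URI.create("http://127.0.0.1:9000")) // placeholder MinIO endpoint
  .forcePathStyle(true) // MinIO is typically addressed with path-style URLs
  .credentialsProvider(StaticCredentialsProvider.create(
    AwsBasicCredentials.create("minio", "miniostorage")))
  .region(Region.US_WEST_2) // v2 still needs a resolvable region, even for non-AWS endpoints
  .build()
s3.createBucket(CreateBucketRequest.builder().bucket("spark").build())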
Why are the changes needed?
With the GA of AWS SDK v2, SDK v1 has entered maintenance mode, where future
releases are limited to addressing critical bugs and security issues. More details
about the SDK maintenance policy can be found here: https://docs.aws.amazon.com/sdkref/latest/guide/maint-policy.html.
To keep Spark's dependencies up to date, we should upgrade the SDK to v2.
Does this PR introduce any user-facing change?
No, because this change only impacts the integration test code.
How was this patch tested?
The existing integration tests in the K8s integration test module passed.
Was this patch authored or co-authored using generative AI tooling?
No