-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-27950][DSTREAM] Configurable dynamodb url so kinesis-asl works with localstack #45619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Could you set |
|
On it now, will paste shortly. |
|
Thank you, @murrelljenna ! |
|
|
||
| val withOptionalConfiguration = dynamoDBEndpointUrl match { | ||
| case Some(endpoint) => baseClientLibConfiguration | ||
| .withDynamoDBEndpoint(endpoint).withMetricsLevel(metricsLevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Shall we remove .withMetricsLevel(metricsLevel) like the following because we already have it in the above?
- .withDynamoDBEndpoint(endpoint).withMetricsLevel(metricsLevel)
+ .withDynamoDBEndpoint(endpoint)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, good catch
|
So I said I'd paste shortly - turns out it'd take around 3 hours haha. It seems I have no failed tests but some suites didn't run. I can post the whole output but that would be enormous. This is with kinesis tests enabled. |
|
Just as a heads up, I'll be unavailable tomorrow and monday, but will be back on this pr come tuesday. |
|
Ya, let's continue next week because the above result looks insufficient to me. I expected the following with |
|
Weird, is there a way to just run the kinesis tests? Even with ENABLE_KINESIS_TESTS=1, dev/run-tests still spends 3 hours running every single suite of tests |
|
@murrelljenna Could you try the following command? (#45619 (comment))
|
|
Thanks! That was way faster. This is likely some local laptop issue I'm having, but I'm getting class not found errors. Also, I have not set up to talk to any particular aws resources on this laptop for spark, so I'm not sure how these tests will work. |
|
Ya, unfortunately, that could be one of the root cause why Apache Spark community was unable to approve and accept this domain like the previous PR. |
|
BTW, could you give me some more data points, please, @murrelljenna ? What is the result when you run the test on the vanilla |
|
Ah, I see, that may be a complicating factor. Here is my output on master. It looks pretty much like the same output |
|
Spark is using java 17, correct? |
|
This error occurs here: Am I expected to have credentials to my own aws in order to run this? |
|
Do I need to set up my own aws environment to run these tests? |
|
@dongjoon-hyun , do these tests generally require one to configure their own aws kinesis environment? Have you run these tests before? This change is pretty simple, it'd be a shame if the pr were to die again. |
|
|
I see, well in theory the kinesis tests could use localstack kinesis (ironically its this PR that would make that possible), but that would be a bit of an effort. I feel like this pr is at an impasse then, no? Are there any community members that have the infrastructure to run these tests? |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |

This PR is a rebased and updated version of a previous pr by @etspaceman from 2019. This PR allows the spark kinesis library to work with localstack instances of kinesis.
I have not added any changes to @etspaceman 's original code, except to remove the
cloudwatchMetricsLevelvalue in the original PR, as subsequent PRs have added that same value in since.What changes were proposed in this pull request?
Currently the spark kinesis library allows you to override the kinesis endpoint, but not the dynamodb endpoint the kinesis consumer will use to manage leases. This PR adds that configurable url into the kinesis input stream builder.
Why are the changes needed?
Localstack has only become more popular for integration testing since the original PR. Currently, one cannot use spark streaming and localstack kinesis together, as the kinesis consumer will only attempt to manage leases through a remote aws dynamo instance.
Does this PR introduce any user-facing change?
This PR will allow the user to specify their own dynamo endpoint, much like how they can currently specify their own kinesis endpoint.
How was this patch tested?
This PR adds unit tests to cover this new value.