-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27950][DSTREAMS][Kinesis] dynamoDBEndpointUrl and cloudWatchMetricsLevel for Kinesis #24801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Thank you for making your PR to Apache Spark, @etspaceman . Could you file an Apache JIRA issue, https://issues.apache.org/jira/projects/SPARK ? |
|
Test build #106181 has finished for PR 24801 at commit
|
|
Test build #106183 has finished for PR 24801 at commit
|
|
|
||
| /** | ||
| * Sets the AWS DynamoDB endpoint URL. Defaults to | ||
| * "https://dynamodb.us-east-1.amazonaws.com" if no custom value is specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From where does this default coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The KCL will attempt to connect to the typical dynamo instance in the configured region. The KinesisInputDStream's default for the region is us-east-1, so naturally the default dynamodb endpoint would be dynamodb.us-east-1.amazonaws.com.
It is not explicitly typed anywhere though. I could update this to be more indicative of the value in the KinesisReceiver (an Option), and specify that the default is None if that is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's better to do it just like DEFAULT_KINESIS_ENDPOINT_URL.
|
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it! |
|
Could someone please merge this changes and publish. I am also experiencing same issue connecting to dynamoDB. Do not want default url for endpoint. I want to specify custom URL since my server do not have access to internet |
|
According to @tprabh509 request, this PR is reopened. |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
Could someone please merge this one? |
|
@calvin-pietersen . I can reopen this for more visibility to this PR but |
|
Test build #134994 has finished for PR 24801 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@etspaceman any plans to try to merge this one? |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
@etspaceman Can this PR be merged ? Thanks, as this can unblock some of local and integration testing pieces for kinesis streaming. |
|
I am not able to perform merges @bkosaraju . That would need to be merged by the spark team. |
|
Hi @dongjoon-hyun , Shall we request from spark team to Merge this PR (if everything is passed from QA side) - Thanks |
|
Is it possible to opena nd merge this one, please? Would help a lot if we could use localstack with spark. Thanks. |
|
This pr is incredibly necessary. Where are the maintainers on this? I'd be happy to verify that this code works by grabbing it locally and using it with my own integration tests. |
|
@dongjoon-hyun @gaborgsomogyi Who on the spark team can take a look at this? Is there work I can do to help get this moving? |
|
To @murrelljenna , if you want to revive this, please make a new PR while keeping the original authorship of @etspaceman . That's the best way to get more attentions on this effort at least for Apache Spark 4.0.0. BTW, I have no environment to verify this PR's contribution. I guess the most community members are in the same situation. |
|
To @dongjoon-hyun , thanks, working on that now |
|
FYI, I'm working on #45583 which upgrades to AWS SDK v2. |
|
Sounds good, I've gotten this PR back up here: I haven't touched anything other than removing |
What changes were proposed in this pull request?
I was researching getting Spark’s Kinesis integration running locally against
localstack. We found this issue, and it creates a complication: localstack/localstack#677Effectively, we need to be able to redirect calls for Kinesis, DynamoDB and Cloudwatch in order for the KCL to properly use the
localstackinfrastructure. We have successfully done this with the KCL (both 1.x and 2.x), but with Spark’s integration we are unable to configure DynamoDB and Cloudwatch’s endpoints:https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L162
This PR adds optional configuration values to the interfaces for dynamoDBEndpointUrl and cloudWatchMetricsLevel.
Why cloudWatchMetricsLevel instead of cloudWatchEndpointUrl? Because the 1.x version of the KCL does not expose a means of configuring the cloudWatchEndpointUrl. Localstack users can instead disable metrics entirely by setting the cloudWatchMetricsLevel to Some(MetricsLevel.NONE)
How was this patch tested?
Existing unit tests were expanded to check that these values were set.
Please review https://spark.apache.org/contributing.html before opening a pull request.