@etspaceman
Contributor

What changes were proposed in this pull request?

I was researching getting Spark’s Kinesis integration running locally against localstack. We found this issue, and it creates a complication: localstack/localstack#677

Effectively, we need to be able to redirect calls for Kinesis, DynamoDB and CloudWatch in order for the KCL to properly use the localstack infrastructure. We have successfully done this with the KCL (both 1.x and 2.x), but with Spark’s integration we are unable to configure the DynamoDB and CloudWatch endpoints:

https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala#L162

This PR adds optional configuration values to the interfaces for dynamoDBEndpointUrl and cloudWatchMetricsLevel.

Why cloudWatchMetricsLevel instead of cloudWatchEndpointUrl? Because the 1.x version of the KCL does not expose a way to configure the CloudWatch endpoint URL. Localstack users can instead disable metrics entirely by setting cloudWatchMetricsLevel to Some(MetricsLevel.NONE).
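
As a rough illustration of the idea (not the code in this PR), the sketch below models two optional settings that stay unset by default and are supplied only by localstack users. `ReceiverConfig`, its `with...` methods, and the local `MetricsLevel` stand-in are hypothetical names invented for this example. The default Kinesis endpoint and the localstack URL are assumptions as well.

```scala
// Stand-in for the KCL's MetricsLevel enum, so the sketch has no dependencies.
sealed trait MetricsLevel
object MetricsLevel {
  case object NONE extends MetricsLevel
  case object SUMMARY extends MetricsLevel
  case object DETAILED extends MetricsLevel
}

// Hypothetical settings holder. `None` means "leave the KCL default in place".
final case class ReceiverConfig(
    endpointUrl: String = "https://kinesis.us-east-1.amazonaws.com", // assumed default
    regionName: String = "us-east-1",
    dynamoDBEndpointUrl: Option[String] = None,
    cloudWatchMetricsLevel: Option[MetricsLevel] = None) {

  // Redirect DynamoDB (lease table) calls to a custom endpoint.
  def withDynamoDBEndpointUrl(url: String): ReceiverConfig =
    copy(dynamoDBEndpointUrl = Some(url))

  // Override the CloudWatch metrics level, e.g. NONE to disable metrics.
  def withCloudWatchMetricsLevel(level: MetricsLevel): ReceiverConfig =
    copy(cloudWatchMetricsLevel = Some(level))
}

object LocalstackExample {
  // Assumed localstack address; adjust to wherever localstack is listening.
  val local: ReceiverConfig = ReceiverConfig()
    .withDynamoDBEndpointUrl("http://localhost:4566")
    .withCloudWatchMetricsLevel(MetricsLevel.NONE)
}
```

Keeping both fields as `Option` values means existing callers who set neither would see no change in behaviour.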

How was this patch tested?

Existing unit tests were expanded to check that these values were set.

Please review https://spark.apache.org/contributing.html before opening a pull request.

@dongjoon-hyun
Member

ok to test

@dongjoon-hyun
Member

Thank you for making your PR to Apache Spark, @etspaceman. Could you file an Apache JIRA issue at https://issues.apache.org/jira/projects/SPARK?

@SparkQA

SparkQA commented Jun 5, 2019

Test build #106181 has finished for PR 24801 at commit d7343a8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@etspaceman etspaceman changed the title dynamoDBEndpointUrl and cloudWatchMetricsLevel for Kinesis [SPARK-27950][DSTREAMS][Kinesis] dynamoDBEndpointUrl and cloudWatchMetricsLevel for Kinesis Jun 5, 2019
@etspaceman
Contributor Author

@SparkQA

SparkQA commented Jun 5, 2019

Test build #106183 has finished for PR 24801 at commit a42ce6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* Sets the AWS DynamoDB endpoint URL. Defaults to
* "https://dynamodb.us-east-1.amazonaws.com" if no custom value is specified
Contributor

Where does this default come from?

Contributor Author

@etspaceman etspaceman Jun 5, 2019

The KCL will attempt to connect to the standard DynamoDB endpoint for the configured region. The KinesisInputDStream's default region is us-east-1, so the default DynamoDB endpoint would naturally be dynamodb.us-east-1.amazonaws.com.

It is not explicitly set anywhere, though. I could update this comment to reflect the value held in the KinesisReceiver (an Option) and state that the default is None, if that is preferable.
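
To make that concrete, here is a small model of the behaviour described, under stated assumptions: `DynamoEndpoint.effective` is a hypothetical helper, and the URL pattern is a simplification. In reality the KCL and the AWS SDK resolve the regional endpoint internally, and not every AWS partition follows this pattern.

```scala
object DynamoEndpoint {
  // Default region for KinesisInputDStream, as described in this thread.
  val DefaultRegion: String = "us-east-1"

  // Hypothetical helper: an explicit override wins; otherwise fall back to
  // the regional endpoint (simplified pattern, assumed for illustration).
  def effective(region: String, overrideUrl: Option[String]): String =
    overrideUrl.getOrElse(s"https://dynamodb.$region.amazonaws.com")
}
```

With `None` and the default region this yields the URL quoted in the doc comment, while `Some(url)` passes a custom endpoint through unchanged.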

Contributor

Maybe it's better to do it just like DEFAULT_KINESIS_ENDPOINT_URL.

@github-actions

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Dec 28, 2019
@github-actions github-actions bot closed this Dec 29, 2019
@tprabh509

Could someone please merge these changes and publish them? I am also experiencing the same issue connecting to DynamoDB. I do not want the default endpoint URL; I want to specify a custom URL, since my server does not have access to the internet.

@dongjoon-hyun
Member

Per @tprabh509's request, this PR is reopened.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 6, 2020
@github-actions github-actions bot closed this Oct 7, 2020
@calvin-pietersen

Could someone please merge this one?

@dongjoon-hyun
Member

@calvin-pietersen, I can reopen this PR to give it more visibility, but Kinesis and DynamoDB are not resource types I have used before.

@SparkQA

SparkQA commented Feb 8, 2021

Test build #134994 has finished for PR 24801 at commit a42ce6e.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39577/

@SparkQA

SparkQA commented Feb 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39577/

@calvin-pietersen

@etspaceman any plans to try to merge this one?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 21, 2021
@github-actions github-actions bot closed this May 22, 2021
@bkosaraju

@etspaceman Can this PR be merged? Thanks; it would unblock some local and integration testing for Kinesis streaming.

@etspaceman
Contributor Author

I am not able to perform merges, @bkosaraju. That would need to be done by the Spark team.

@bkosaraju

Hi @dongjoon-hyun, could we ask the Spark team to merge this PR (if everything has passed on the QA side)? Thanks.

@anatol-ju

Is it possible to open and merge this one, please? It would help a lot if we could use localstack with Spark. Thanks.

@murrelljenna

This PR is incredibly necessary. Where are the maintainers on this?

I'd be happy to verify that this code works by grabbing it locally and using it with my own integration tests.

@murrelljenna

murrelljenna commented Mar 18, 2024

@dongjoon-hyun @gaborgsomogyi Who on the Spark team can take a look at this? Is there any work I can do to help get this moving?

@dongjoon-hyun
Member

@murrelljenna, if you want to revive this, please open a new PR that keeps the original authorship of @etspaceman. That is the best way to get more attention on this effort, at least for Apache Spark 4.0.0.

BTW, I have no environment to verify this PR's contribution. I guess most community members are in the same situation.

@murrelljenna

@dongjoon-hyun, thanks, working on that now.

@dongjoon-hyun
Member

FYI, I'm working on #45583 which upgrades to AWS SDK v2.

@murrelljenna

Sounds good, I've gotten this PR back up here:

#45619

I haven't touched anything other than removing cloudWatchMetricsLevel, as that has been added since this PR was opened.

9 participants