
Web console, adding Apache Kafka and AWS Kinesis to the data loader #7643

Merged
clintropolis merged 7 commits into apache:master from implydata:streaming-data-loader
May 17, 2019


@vogievetsky
Contributor

This PR is part of #7502 and is the UI counterpart to (and depends on) #7566.

Adding the flow for the streaming datasources:

[Screenshot: the streaming data loader flow]

@fjy modified the milestone: 0.15.0, May 13, 2019
@fjy added this to the 0.16.0 milestone, May 14, 2019
@vogievetsky mentioned this pull request, May 16, 2019
@vogievetsky
Contributor Author

If you want to have a look at this in action: https://youtu.be/tAEp5BXVHYE

@clintropolis (Member) left a comment


overall lgtm 👍

To test, I pulled the branch, merged in #7566, and ran through the loader against a locally running Kafka broker, started a supervisor, and got data 🤘

One thing I'm not sure is intended: useEarliestOffset: true is set by default on the sampler, but not in the final spec unless you toggle the option. So out of the box, a new supervisor created from the spec you post will start reading from the end offset of the stream, instead of from the beginning like the data you sampled to build the spec. I'm not sure this is a big deal, so I'm approving anyway, but it might be nice to detect whether the supervisor already exists and, if it doesn't, start from the beginning.
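A minimal TypeScript sketch of the suggested default, not the web console's actual code. It assumes a heavily simplified spec shape and that a Kafka supervisor's ID equals its dataSource name; the SupervisorSpec type and the withEarliestOffsetForNewSupervisor function are hypothetical. The list of existing supervisor IDs could come from the Overlord's GET /druid/indexer/v1/supervisor endpoint. Kinesis specs are passed through unchanged in this sketch.

```typescript
// Hypothetical, heavily simplified supervisor spec shape (the real spec also
// has tuningConfig, parser settings, and more).
interface SupervisorSpec {
  type: string;
  dataSchema: { dataSource: string };
  ioConfig: { topic?: string; useEarliestOffset?: boolean; [key: string]: unknown };
}

// Hypothetical helper: if no supervisor exists yet for this dataSource
// (assuming supervisor ID === dataSource), return a copy of a Kafka spec with
// useEarliestOffset = true so ingestion starts from the beginning of the topic,
// matching what the sampler showed. An explicit user setting is never
// overridden, and the input spec is not mutated.
function withEarliestOffsetForNewSupervisor(
  spec: SupervisorSpec,
  existingSupervisorIds: readonly string[],
): SupervisorSpec {
  const isNewSupervisor = existingSupervisorIds.indexOf(spec.dataSchema.dataSource) === -1;
  const userChoseExplicitly = spec.ioConfig.useEarliestOffset !== undefined;
  if (spec.type !== 'kafka' || !isNewSupervisor || userChoseExplicitly) {
    return spec;
  }
  return { ...spec, ioConfig: { ...spec.ioConfig, useEarliestOffset: true } };
}

// Usage with a hypothetical 'wikipedia' dataSource:
const spec: SupervisorSpec = {
  type: 'kafka',
  dataSchema: { dataSource: 'wikipedia' },
  ioConfig: { topic: 'wikipedia' },
};
console.log(withEarliestOffsetForNewSupervisor(spec, []).ioConfig.useEarliestOffset); // true
console.log(withEarliestOffsetForNewSupervisor(spec, ['wikipedia']).ioConfig.useEarliestOffset); // undefined
```

Leaving an explicitly set value alone means a user who deliberately chose to start from the latest offset in the loader still gets that behavior, and resubmitting a spec for an existing supervisor does not change how it reads the stream.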

@vogievetsky
Contributor Author

That is a good point about useEarliestOffset. I was sticking to the defaults, but I will re-evaluate that. I don't think it should be a merge blocker for this PR.

@clintropolis merged commit be16e4a into apache:master May 17, 2019
@clintropolis deleted the streaming-data-loader branch May 17, 2019 21:02
jihoonson pushed a commit to implydata/druid-public that referenced this pull request Jun 26, 2019
…pache#7643)

* adding kafka and kinesis to the data loader

* feature detect

* copy fixes

* wording fixes

* added missing spec type

* increase timeout

* Call it Google Cloud Storage
gianm pushed a commit to implydata/druid-public that referenced this pull request Jul 4, 2019