Completely rework the Druid getting started process #2216
I love this one. 👍
@fjy could you gzip
@himanshug @navis @gianm @pjain1 added clustering docs. More changes to come.
@himanshug addressed comments
I don't see why there's a $ in front of zookeeper-3.4.6.tar.gz on this line and the one before.
I guess most people trying this will know to put these each in background or run each in a different window or whatever, but it's tempting to cut/paste this whole thing to execute...
@rasahner addressed comments
👍
The first part of the "also" isn't really an "also": re-ingestion made necessary by possible errors is exactly what has just been discussed. I'd replace both sentences with something like
"Hybrid streaming also gives you the option to re-ingest your data if you need to revise it for any reason."
+1 when author thinks it is ready.
Sorry if my comments were confusing. Here's my recommended text for this whole section. I think it's not necessary to say anything right here about queries not caring how the data was ingested - it potentially adds more confusion than it takes away.
You can combine batch and streaming methods in a hybrid batch/streaming architecture. In a hybrid architecture, you use a streaming method to do initial ingestion, and then periodically re-ingest older data in batch mode (typically every few hours, or nightly). When Druid re-ingests data for a time range, the new data automatically replaces the data from the earlier ingestion.
All streaming ingestion methods currently supported by Druid do introduce the possibility of dropped or duplicated messages in certain failure scenarios, and batch re-ingestion eliminates this potential source of error for historical data.
Batch re-ingestion also gives you the option to re-ingest your data if you need to revise it for any reason.
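To make the replacement behavior described above concrete, here is a toy Python model of it. This is only a sketch, not Druid code: the `Timeline` class, its method names, the integer versions, and the interval string are all hypothetical, and it assumes the batch re-ingestion covers exactly the same time range as the original streaming ingestion.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Toy model of 'newer ingestion replaces older data for a time range'.

    Hypothetical illustration only; this is not a Druid API.
    """
    # Maps an interval string to a (version, rows) pair.
    segments: dict = field(default_factory=dict)

    def ingest(self, interval, version, rows):
        # Data with a higher version replaces what is stored for the interval;
        # data with an older or equal version is ignored.
        current = self.segments.get(interval)
        if current is None or version > current[0]:
            self.segments[interval] = (version, list(rows))

    def query(self, interval):
        # A query sees only the latest version for the interval,
        # regardless of how that data was ingested.
        entry = self.segments.get(interval)
        return entry[1] if entry else []


timeline = Timeline()
hour = "2016-01-01T00/2016-01-01T01"  # assumed interval, for illustration

# Streaming ingestion: message "b" was duplicated in a failure scenario.
timeline.ingest(hour, 1, ["a", "b", "b", "c"])
streamed = timeline.query(hour)

# Later batch re-ingestion of the same hour from the raw data.
timeline.ingest(hour, 2, ["a", "b", "c"])
result = timeline.query(hour)
```

After the second call, `result` holds only the batch data, so the duplicate introduced during streaming is gone for that hour.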
I have no other comments.
f82e1c7 to 067bfda
Completely rework the Druid getting started process
This PR depends on some CSS changes to Druid docs which are coming in a separate PR. The updated pages will not render correctly without those changes.
This PR reworks the Druid getting started process to closely follow Imply's recommended getting started process, which was mostly written by @gianm. The packaging of Druid will also be similar to Imply's packaging.