add note on consistency of results for sys.segments queries #7034

jihoonson merged 7 commits into apache:master
Conversation
> ### SEGMENTS table
> Segments table provides details on all Druid segments, whether they are published yet or not.
> Note that if a segment is served by more than one realtime tasks(multiple realtime replicas), then the results may vary between the sys.segments queries for columns such as `size`, `num_rows` etc., until the segment is served by a historical eventually.
There should be a space between "tasks" and "(multiple".
The purpose of this note is to make people less confused, and thus it should be as detailed as possible.
Please add more details about when this can happen and why, and which columns can vary. I think it's worth adding a new section for this caveat.
Sometimes more details can be more confusing :)). I tried to add more details, let me know if it's less confusing. I'm not sure it needs its own section, or what the title of that section should be. Added a caveat subheading.
> Segments table provides details on all Druid segments, whether they are published yet or not.
>
> #### CAVEAT
> Note that a segment can be served by more than one realtime or historical servers, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple realtime tasks, until a segment is eventually served by a historical, at that point the segment is immutable. And broker prefers to query a segment from historical over realtime server. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks. The columns of segments table that can have inconsistent values during this period include `size`, `num_replicas`, `num_rows`.
There are no such things as realtime or historical servers.
- Please ensure consistent capitalization for Historicals, Brokers, etc.
> There are no such things as realtime or historical servers

I see mention of "Historical Node" and "Real-time Node" in the docs. So what should I write: historical node? process?
IMO, "Historical process" and "stream ingestion tasks"
Corrected the capitalization and changed to the correct terminology.
Would you explain why `size` and `num_replicas` vary? It looks like they are not fetched via segmentMetadataQuery.
Please add why this happens. The root cause is that the system schema uses segmentMetadataQuery to retrieve some information, and the Broker randomly picks one of the realtime tasks for query processing if there are no published segments, so it's not guaranteed that the same task serves segmentMetadataQuery every time.
I think it's worth linking #5915 here too.
Hmm, should there be mention of segmentMetadataQuery and RandomServerSelectorStrategy in user-facing docs? I tried to explain without adding internal code details. I feel such details should go in GitHub issues or in Javadocs. And do we generally link to GitHub issues in user documentation? Are there any similar examples in the Druid docs?
SegmentMetadataQuery is a documented query type (http://druid.io/docs/latest/querying/segmentmetadataquery.html). I don't think it's worth mentioning the class name RandomServerSelectorStrategy, but the configuration for it is also documented (http://druid.io/docs/latest/configuration/index.html#query-prioritization).
Well, but my comment above about random selection may not be appropriate because it can give users the wrong intuition. It's probably better not to mention random selection at all. But I think it's still worth saying that only one of the realtime tasks is selected if multiple replicas are running.
> And do we generally link to GitHub issues in user documentation? Are there any similar examples in the Druid docs?

Why not? Here are some examples: https://cse.google.com/cse?cx=000162378814775985090%3Amolvbm0vggm&q=github&oq=github&gs_l=partner-generic.3...1401.2048.0.2184.6.6.0.0.0.0.102.536.5j1.6.0.gsnos%2Cn%3D13...0.606j90652j6...1.34.partner-generic..5.1.102.mApbmyfw_Jw
> Would you explain why `size` and `num_replicas` vary? It looks like they are not fetched via segmentMetadataQuery.

I think `size` would not vary between ingestion tasks, since they would all show 0, but it can vary if a segment is queried from a Historical vs. a realtime task. Given that the Broker prefers Historicals, maybe `size` is not an issue. `num_replicas` can change if a segment gets added to or removed from TimelineServerView.TimelineCallback in DruidSchema, so its value can vary between queries.
Hmm. For `num_replicas`, it sounds like a valid result because it reflects changes that actually happened. I think it's different from a varying `num_rows` and doesn't have to be noted here.
In that case, it seems `num_rows` is the only column affected.
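The behavior discussed above can be sketched as a small simulation. This is not Druid code — the function, the row counts, and the handoff value are all hypothetical — but it illustrates why repeated queries can disagree while multiple ingestion task replicas serve a segment, and why they agree once a Historical serves the immutable segment:

```python
import random

# Illustrative sketch (hypothetical, not Druid code): two stream ingestion
# task replicas have indexed different numbers of rows so far, and each
# metadata query is answered by only one of them. After handoff to a
# Historical the segment is immutable, so every query sees the same value.

def query_num_rows(replica_rows, historical_rows=None, rng=random):
    """Return num_rows as one sys.segments query might observe it."""
    if historical_rows is not None:       # segment handed off: immutable
        return historical_rows
    return rng.choice(replica_rows)       # only one replica answers

# Two Kafka indexing task replicas, one lagging behind the other.
replica_rows = [1200, 950]
rng = random.Random(0)                    # fixed seed for reproducibility

before_handoff = {query_num_rows(replica_rows, rng=rng) for _ in range(20)}
after_handoff = {query_num_rows(replica_rows, historical_rows=1500)
                 for _ in range(20)}

print(sorted(before_handoff))  # results can vary while replicas disagree
print(sorted(after_handoff))   # a single stable value after handoff
```

The set of values observed before handoff can contain both row counts, which is exactly the inconsistency the caveat warns about; after handoff the set collapses to one value.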
jihoonson left a comment:

Thanks for the update!
> Segments table provides details on all Druid segments, whether they are published yet or not.
>
> #### CAVEAT
> Note that a segment can be served by more than one stream ingestion tasks or Historical processes, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple ingestion tasks, until a segment is eventually served by a Historical, at that point the segment is immutable. Broker prefers to query a segment from Historical over a ingestion task. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks because only one of the ingestion tasks is queried by the Broker and it is not gauranteed that the same task gets picked everytime. The columns of segments table that can have inconsistent values during this period include `num_replicas` and `num_rows`. There is an open [issue](https://github.com/apache/incubator-druid/issues/5915) about this inconsistency with stream ingestion tasks.
a ingestion task -> an ingestion task.
@surekhasaharan thanks! LGTM.
For the sys.segments queries, it seems the Broker randomly chooses one of the replicas, so if there is more than one replica for a segment, then fields like `size`, `num_rows`, etc. can have different values depending on which realtime replica the Broker queries. The results will be eventually consistent once the segment is served by a historical server. Adding this note to the docs. This may not be a problem once this issue is addressed.