From b1e34550d51606f1476d8e31bf6511bfd6f9a5d4 Mon Sep 17 00:00:00 2001
From: Surekha Saharan
Date: Thu, 7 Feb 2019 12:06:03 -0800
Subject: [PATCH 1/4] add doc

---
 docs/content/querying/sql.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index 3f274918569d..0e4b57910d2e 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -570,6 +570,7 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 ### SEGMENTS table
 Segments table provides details on all Druid segments, whether they are published yet or not.
+Note that if a segment is served by more than one realtime tasks(multiple realtime replicas), then the results may vary between the sys.segments queries for columns such as `size`, `num_rows` etc., until the segment is served by a historical eventually.
 |Column|Notes|

From 9672aea761a9ce3a8b07a08d6178ff696993abb5 Mon Sep 17 00:00:00 2001
From: Surekha Saharan
Date: Thu, 7 Feb 2019 16:29:20 -0800
Subject: [PATCH 2/4] change docs

---
 docs/content/querying/sql.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index 0e4b57910d2e..04548561b728 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -570,8 +570,9 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 ### SEGMENTS table
 Segments table provides details on all Druid segments, whether they are published yet or not.
-Note that if a segment is served by more than one realtime tasks(multiple realtime replicas), then the results may vary between the sys.segments queries for columns such as `size`, `num_rows` etc., until the segment is served by a historical eventually.
+#### CAVEAT
+Note that a segment can be served by more than one realtime or historical servers, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple realtime tasks, until a segment is eventually served by a historical, at that point the segment is immutable. And broker prefers to query a segment from historical over realtime server. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks. The columns of segments table that can have inconsistent values during this period include `size`, `num_replicas`, `num_rows`.
 |Column|Notes|
 |------|-----|

From 2cdbb1d9f4a25ded3cf4d932cf4e2fc7fe1fc113 Mon Sep 17 00:00:00 2001
From: Surekha Saharan
Date: Wed, 13 Feb 2019 16:23:52 -0800
Subject: [PATCH 3/4] PR comments

---
 docs/content/querying/sql.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index 04548561b728..dd85560ee953 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -572,7 +572,7 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 Segments table provides details on all Druid segments, whether they are published yet or not.
 #### CAVEAT
-Note that a segment can be served by more than one realtime or historical servers, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple realtime tasks, until a segment is eventually served by a historical, at that point the segment is immutable. And broker prefers to query a segment from historical over realtime server. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks. The columns of segments table that can have inconsistent values during this period include `size`, `num_replicas`, `num_rows`.
+Note that a segment can be served by more than one stream ingestion tasks or Historical processes, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple ingestion tasks, until a segment is eventually served by a Historical, at that point the segment is immutable. Broker prefers to query a segment from Historical over a ingestion task. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks because only one of the ingestion tasks is queried by the Broker and it is not gauranteed that the same task gets picked everytime. The columns of segments table that can have inconsistent values during this period include `num_replicas` and `num_rows`. There is an open [issue](https://github.com/apache/incubator-druid/issues/5915) about this inconsistency with stream ingestion tasks.
 |Column|Notes|
 |------|-----|

From bb55eb9c91bc6a417462cfc5ebe1e2cc5e77d21b Mon Sep 17 00:00:00 2001
From: Surekha Saharan
Date: Wed, 13 Feb 2019 21:38:06 -0800
Subject: [PATCH 4/4] few more changes

---
 docs/content/querying/sql.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/querying/sql.md b/docs/content/querying/sql.md
index dd85560ee953..883904ced99e 100644
--- a/docs/content/querying/sql.md
+++ b/docs/content/querying/sql.md
@@ -572,7 +572,7 @@ The "sys" schema provides visibility into Druid segments, servers and tasks.
 Segments table provides details on all Druid segments, whether they are published yet or not.
 #### CAVEAT
-Note that a segment can be served by more than one stream ingestion tasks or Historical processes, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple ingestion tasks, until a segment is eventually served by a Historical, at that point the segment is immutable. Broker prefers to query a segment from Historical over a ingestion task. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks because only one of the ingestion tasks is queried by the Broker and it is not gauranteed that the same task gets picked everytime. The columns of segments table that can have inconsistent values during this period include `num_replicas` and `num_rows`. There is an open [issue](https://github.com/apache/incubator-druid/issues/5915) about this inconsistency with stream ingestion tasks.
+Note that a segment can be served by more than one stream ingestion tasks or Historical processes, in that case it would have multiple replicas. These replicas are weakly consistent with each other when served by multiple ingestion tasks, until a segment is eventually served by a Historical, at that point the segment is immutable. Broker prefers to query a segment from Historical over an ingestion task. But if a segment has multiple realtime replicas, for eg. kafka index tasks, and one task is slower than other, then the sys.segments query results can vary for the duration of the tasks because only one of the ingestion tasks is queried by the Broker and it is not gauranteed that the same task gets picked everytime. The `num_rows` column of segments table can have inconsistent values during this period. There is an open [issue](https://github.com/apache/incubator-druid/issues/5915) about this inconsistency with stream ingestion tasks.
 |Column|Notes|
 |------|-----|