The size that the coordinator reports for a historical at /druid/coordinator/v1/servers does not match the sum of the segment/used metric emitted by that historical. The coordinator's number is much lower: historical servers that are more than 99% full are reported by the coordinator as only 91% full. I have confirmed that the sum of segment/used as reported by the historicals is the correct on-disk size of the segment data on the historical nodes.
This badly undermines capacity planning. One side effect is that the historicals will throw Exception loading segment ... too large for storage and fail to load the segment during that coordinator balancing round. This is particularly harmful when it happens during handoff, because the resources used by realtime indexing tasks cannot be freed.
The coordinator's view of the sizes on a historical node should be eventually consistent with the data emitted by the historical node itself.
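For anyone trying to reproduce the comparison, here is a minimal sketch of how the two numbers can be lined up per host. It assumes the coordinator response has already been fetched and parsed into a list of dicts with host, currSize and maxSize fields, and that the segment/used values have already been summed per host from whatever metrics store is in use. The function name compare_sizes and the sample values are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List


def compare_sizes(
    coordinator_servers: List[dict],
    historical_used_bytes: Dict[str, int],
) -> List[dict]:
    """Compare coordinator-reported size with the historical's own metric.

    coordinator_servers: parsed coordinator server list; each entry is
        assumed to carry "host", "currSize" and "maxSize" (assumption).
    historical_used_bytes: host -> sum of segment/used for that host,
        aggregated beforehand from the metrics emitter (assumption).
    """
    rows = []
    for server in coordinator_servers:
        host = server["host"]
        max_size = server["maxSize"]
        # Skip hosts with no metric data or no configured capacity.
        if host not in historical_used_bytes or max_size <= 0:
            continue
        coordinator_bytes = server["currSize"]
        historical_bytes = historical_used_bytes[host]
        rows.append(
            {
                "host": host,
                "coordinator_pct": 100.0 * coordinator_bytes / max_size,
                "historical_pct": 100.0 * historical_bytes / max_size,
                # Positive gap: the coordinator under-reports usage.
                "gap_bytes": historical_bytes - coordinator_bytes,
            }
        )
    return rows


# Hypothetical sample values chosen to mirror the 91% vs >99% observation.
sample_servers = [{"host": "historical-1:8083", "currSize": 910, "maxSize": 1000}]
sample_metrics = {"historical-1:8083": 995}
report = compare_sizes(sample_servers, sample_metrics)
```

A positive gap_bytes that persists across several coordinator runs, rather than shrinking to zero, is the behavior described above.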