Change default segment loading to http by Caroline1000 · Pull Request #11760 · apache/druid

Caroline1000 · 2021-09-29T23:38:18Z

Description

We have observed more stability with HTTP segment loading than with Curator (ZooKeeper-based) segment loading in production clusters. For example, we have observed that problems with ZooKeeper can leave a cluster unable to query realtime data.
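For readers who want to check which loading mode a given Coordinator would end up with, here is a minimal sketch. It assumes the setting in question is the `druid.coordinator.loadqueuepeon.type` property in `runtime.properties` (values `curator` or `http`), and that the default is passed in by the caller, since the default is exactly what this PR proposes to change. The function name is hypothetical and not part of Druid.

```python
# Assumed property name; verify against the configuration reference for your Druid version.
LOAD_QUEUE_KEY = "druid.coordinator.loadqueuepeon.type"


def effective_loadqueue_type(properties_text, default="curator"):
    """Return the segment loading mode a runtime.properties text would select.

    Blank lines and '#' comments are skipped. If the key appears more than
    once, the last occurrence wins. If it never appears, `default` is returned.
    """
    result = default
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == LOAD_QUEUE_KEY:
            result = value.strip()
    return result


if __name__ == "__main__":
    sample = "# coordinator settings\ndruid.coordinator.loadqueuepeon.type=http\n"
    print(effective_loadqueue_type(sample))
    # With no explicit setting, the answer depends entirely on the default.
    print(effective_loadqueue_type("", default="curator"))
```

A cluster that sets the property explicitly is unaffected by a change of default; only clusters that omit it would switch modes.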

This PR has:

[x] added documentation for new or modified features or behaviors.
[x] been tested in a test Druid cluster.

kfaraz

Thank you for the PR, @Caroline1000 !
Please add a description that explains the change and the reasons involved and update DruidCoordinatorConfigTest to verify the new default value.

samarthjain · 2021-11-03T23:57:21Z

Is HTTP-based loading ready for prime time? I am curious about any at-scale testing that has been done to verify that HTTP-based loading performs as expected. I would also like to know whether all major functional issues with it have been fixed before we make it the default. I see at least one open bug right now.

Caroline1000 · 2022-01-21T23:54:00Z

@samarthjain +1 on fixing #11717. If I'm not mistaken, that issue was first observed when multiple load rules were changed across different tiers, so hopefully that makes the bug less likely to run into(?)

fwiw, I have seen http segment loading work without issue in many production environments (and actually have seen many problems related to curator loading)

stale · 2022-04-16T12:55:55Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

didip · 2022-06-18T14:19:48Z

I am not sure if http is ready for prime time, the problem with http arises when jetty http server runs low on threads.

stale · 2022-06-18T14:19:52Z

This issue is no longer marked as stale.

imply-cheddar · 2022-06-21T22:28:59Z

ZK segment loading is broken right now. As of ~2 years ago, a PR was merged that breaks the order of segment loading and dropping via ZK, such that the assignment can enter into deadlocks when a cluster is mostly full. This wasn't widely an issue (personally, I only learned about it ~6 months ago) because the largest clusters (at least that I'm aware of) have all been using http segment assignment.
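To illustrate why the order of loads and drops matters on a nearly full server, here is a toy model. It is not Druid's code and does not reproduce the actual ZK bug; it only assumes a server that processes requests strictly in order and cannot start a load that does not fit. All names and sizes are hypothetical.

```python
from collections import deque


def process_queue(capacity, used, requests):
    """Process ("load" | "drop", size) requests strictly in order.

    Returns (used, stuck), where `stuck` lists the requests left unprocessed
    because the request at the head of the queue is a load that does not fit.
    """
    queue = deque(requests)
    while queue:
        action, size = queue[0]
        if action == "load" and used + size > capacity:
            break  # head-of-line blocking: nothing queued behind it can run
        queue.popleft()
        used += size if action == "load" else -size
    return used, list(queue)


if __name__ == "__main__":
    # Full server: the load is queued ahead of the drop that would free space.
    print(process_queue(10, 10, [("load", 3), ("drop", 3)]))
    # Same requests with the drop first: both complete.
    print(process_queue(10, 10, [("drop", 3), ("load", 3)]))
```

In the first call nothing can make progress, because the only request that would free space sits behind the load that needs it. In the second call both requests finish.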

#11717 has been merged. While it is and was a bug, it was a corner case that we've only seen in development environments and never in a production environment. Every cluster I touch, I move from ZK assignment to HTTP assignment because my experience is that HTTP assignment is more stable. I'm +1 on this directionally, but the PR does need the tests fixed as Kashif suggested before it can be approved.

didip · 2022-06-23T14:28:46Z

Also, has anyone else had this problem with HTTP loading, where the Coordinator somehow caches the old Historicals' IP addresses?

We saw this often in our Kubernetes deployments.

kfaraz · 2022-06-24T06:32:38Z

@didip , please create an issue for the http loading discrepancies if you have been facing them recently.

capistrant · 2022-08-08T20:53:06Z

> I am not sure if http is ready for prime time, the problem with http arises when jetty http server runs low on threads.

@didip

Isn't this addressed with a combination of:

- using async IO on the Historical side to avoid holding threads while work is being done
- following the explicit recommendation to set the Jetty thread count above the aggregate count of connections from query servers (Brokers), so that queries cannot consume all Jetty threads

Or are you saying there is a risk of exhaustion on the outgoing side from the coordinator?
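The sizing recommendation above amounts to simple arithmetic. Here is a minimal sketch, assuming the relevant settings are `druid.broker.http.numConnections` on each Broker and `druid.server.http.numThreads` on each Historical. The headroom value is an illustrative assumption, not a documented number, and the function name is hypothetical.

```python
def min_jetty_threads(broker_connections, headroom=10):
    """Suggest a lower bound for the Jetty thread count on one Historical.

    broker_connections: per-Broker connection pool sizes, one entry per Broker
        (assumed to correspond to druid.broker.http.numConnections).
    headroom: extra threads kept free for non-query traffic such as segment
        load/drop requests from the Coordinator (illustrative value).
    """
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    if any(n < 0 for n in broker_connections):
        raise ValueError("connection counts must be non-negative")
    return sum(broker_connections) + headroom


if __name__ == "__main__":
    # Three Brokers with 20 connections each, plus 10 threads of headroom.
    print(min_jetty_threads([20, 20, 20]))
```

With the thread count above the sum of Broker connections, queries alone cannot occupy every Jetty thread, which leaves room for the Coordinator's HTTP requests.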

Caroline1000 · 2022-12-02T18:50:18Z

closing now that #13092 has been merged

change default segment loading to http

c2c482d

clintropolis added the Area - Segment Balancing/Coordination, Design Review, and Release Notes labels Sep 30, 2021

kfaraz reviewed Oct 6, 2021

add to DruidCoordinatorConfigTest

1430094

stale Bot added the stale label Apr 16, 2022

stale Bot removed the stale label Jun 18, 2022

Merge branch 'master' into segment-loading20210929

ae195f6

Caroline1000 closed this Dec 2, 2022

Caroline1000 wants to merge 3 commits into apache:master from Caroline1000:segment-loading20210929

8 participants
