
Remove legacy code from LogUsedSegments duty #10287

Merged
jihoonson merged 10 commits into apache:master from capistrant:logUsedSegmentsDuty-toggle
Jan 12, 2021

Conversation

@capistrant (Contributor) commented Aug 14, 2020

Update on 01/07/21

After more discussion, we have decided to scrap the ability to disable the duty and instead remove the legacy code that emits alerts for segments with no size. This code is thought to be old debugging code that is safe to remove. The LogUsedSegments duty now logs a helpful count of active segments no matter what, and if debug logging is enabled, the used segments are logged as well. This duty has no performance concerns for non-debug clusters, so there is no reason to add the complexity of a new config to disable it. A toggle may be beneficial in the future depending on where this duty goes, but for now it seems like unnecessary code.
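As a rough illustration of what the duty reduces to once the legacy no-size alert loop is removed, here is a minimal, self-contained sketch. The `Segment` and `Params` stubs below are illustrative stand-ins, not Druid's real classes; only the shape of the logic (a DEBUG-guarded per-segment loop plus one cheap INFO summary line) reflects the described change.

```java
import java.util.Arrays;
import java.util.List;

public class LogUsedSegmentsSketch
{
  // Illustrative stand-ins for Druid's DataSegment / DruidCoordinatorRuntimeParams.
  static class Segment
  {
    final String id;
    Segment(String id) { this.id = id; }
  }

  static class Params
  {
    final List<Segment> usedSegments;
    Params(List<Segment> usedSegments) { this.usedSegments = usedSegments; }
  }

  static boolean debugEnabled = false;

  static Params run(Params params)
  {
    // Per-segment logging only happens when DEBUG is on, so non-debug
    // clusters pay essentially nothing for this duty.
    if (debugEnabled) {
      for (Segment segment : params.usedSegments) {
        System.out.println("  " + segment.id);
      }
    }
    // The cheap, always-useful summary line that survives the cleanup.
    System.out.printf("Found [%,d] used segments.%n", params.usedSegments.size());
    return params;
  }

  public static void main(String[] args)
  {
    run(new Params(Arrays.asList(new Segment("a"), new Segment("b"))));
  }
}
```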

Update on 12/21/20

After analysis of the utility of this duty, we are now considering disabling it by default and not documenting the config that enables it. The idea is that we will remove the duty altogether in the future if nobody questions it being off by default. Because of this plan, I added a Design Review label to the PR. More input on this plan would be helpful.

Description

Allows a cluster admin to disable the LogUsedSegments coordinator duty if they so choose. This duty has no impact on successive duties, so disabling it does not change the state of the cluster after coordination duties complete.

As an admin for a large enterprise cluster myself, I have been on the hunt for any optimization I can find in the lifecycle of the coordinator. This patch appears to be low-hanging fruit in that effort. This duty takes ~10 seconds to complete on our largest cluster. We have never used debug logging to log our segments, and we have never checked the emitter's alert stream for alerts on segments with size < 0. Therefore, I think it makes sense to give admins the ability to toggle this duty on and off as they see fit.

New Config: druid.coordinator.duties.logUsedSegments.enabled ... Default: true

I added a config to the DruidCoordinatorConfig. The naming scheme of this config can easily be changed as the community sees fit. I made my choice in the hope that it is self-explanatory and allows more future duty-specific configs to be added following the same scheme.

This config defaults to having the LogUsedSegments duty enabled (keeping in line with current functionality). When the coordinator becomes leader and creates its duties, it consults the config object to find out whether the LogUsedSegments duty should be added to the duty lists for historical segment management and indexing.
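The gating described above could be sketched roughly as follows. The method and type names here are hypothetical simplifications for illustration, not the actual DruidCoordinator code; the point is only that the duty list is built once on leadership and the config decides whether LogUsedSegments is included, leaving downstream duties untouched.

```java
import java.util.ArrayList;
import java.util.List;

public class DutyListSketch
{
  // Minimal stand-in for Druid's CoordinatorDuty interface.
  interface CoordinatorDuty
  {
    String name();
  }

  // Hypothetical version of duty-list construction: the boolean plays the
  // role of druid.coordinator.duties.logUsedSegments.enabled.
  static List<CoordinatorDuty> makeHistoricalManagementDuties(boolean logUsedSegmentsEnabled)
  {
    List<CoordinatorDuty> duties = new ArrayList<>();
    if (logUsedSegmentsEnabled) {
      duties.add(() -> "LogUsedSegments");  // only added when the config allows it
    }
    duties.add(() -> "RunRules");           // downstream duties are unaffected
    duties.add(() -> "BalanceSegments");
    return duties;
  }

  public static void main(String[] args)
  {
    System.out.println("enabled:  " + makeHistoricalManagementDuties(true).size() + " duties");
    System.out.println("disabled: " + makeHistoricalManagementDuties(false).size() + " duties");
  }
}
```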


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • DruidCoordinatorConfig
  • DruidCoordinator

@capistrant (Contributor, Author) commented Aug 14, 2020

Edit: This is resolved

I recently became a committer but haven't gotten repo access set up yet (if another committer reads this and is willing to guide me through the process, I'm all ears!), so I cannot complete all of the committer steps. For labels, I would add:

  • Design Review - It adds a config and we may want to discuss naming scheme
  • Area - Segment Balancing/Coordination

Comment thread docs/configuration/index.md Outdated
|`druid.coordinator.loadqueuepeon.repeatDelay`|The start and repeat delay for the loadqueuepeon, which manages the load and drop of segments.|PT0.050S (50 ms)|
|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator process should act like an Overlord as well. This configuration allows users to simplify a druid cluster by not having to deploy any standalone Overlord processes. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also. See next.|false|
|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord processes and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
|`druid.coordinator.duties.logUsedSegments.enabled`|Boolean value for whether or not the coordinator should execute the `LogUsedSegments` Duty|true|
Contributor

We could make the property description friendlier by describing what LogUsedSegments does instead of just mentioning it. Also it would be helpful to add a line or two about when should a cluster operator think about disabling this property.

Contributor

+1 to describe more about the behavior of LogUsedSegments duty. Also I prefer to remove the duties from the config item name - druid.coordinator.logUsedSegments.enabled. duties is a term in the code not in the user facing documentation, and before it was called helpers in the code.

Contributor Author

@a2l007 @ArvinZheng thanks for the thoughts. I agree that if duty isn't a user facing word, it should be removed. I also did my best to improve the documentation for end users to better understand what they are disabling if they choose to do so.

@ArvinZheng (Contributor) left a comment

LGTM

Comment thread docs/configuration/index.md Outdated
|`druid.coordinator.loadqueuepeon.repeatDelay`|The start and repeat delay for the loadqueuepeon, which manages the load and drop of segments.|PT0.050S (50 ms)|
|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator process should act like an Overlord as well. This configuration allows users to simplify a druid cluster by not having to deploy any standalone Overlord processes. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also. See next.|false|
|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord processes and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
|`druid.coordinator.logUsedSegments.enabled`|Boolean value for whether or not the coordinator should execute the `LogUsedSegments` portion of its coordination work. `LogUsedSegments` is an informational job run by the Coordinator every coordination cycle. It gets a snapshot of segments in the cluster and iterates them. While iterating, it will emit an alert if a segment has a size less than 0. If debug logging is enabled, it will also log a string representation of each segment. Lastly, it logs then number of segments in the cluster. An admin can decide that forgoing this work may advantageous if they don't need any of the information provided.|true|
Contributor

typo: "it logs then number of" should be "it logs the number of"

Contributor

not a comment, just wanted to see if the following sounds good?

Boolean value for whether or not the coordinator should execute the LogUsedSegments portion of its coordination work. LogUsedSegments is an informational job run by the Coordinator every coordination cycle which logs every segment at DEBUG level and the total number of used segments at INFO level, in addition, it emits an alert if a segment has a size less than 0. An admin can decide that forgoing this work may advantageous if they don't need any of the information provided

Contributor Author

I like your wording, Arvin. Made the changes.

@himanshug (Contributor) commented:

slightly surprised by ~10 secs taken up by ... (snippet from LogUsedSegments.java)

  public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
  {
    log.debug("Starting coordination. Getting used segments.");

    DataSourcesSnapshot dataSourcesSnapshot = params.getDataSourcesSnapshot();
    for (DataSegment segment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
      if (segment.getSize() < 0) {
        log.makeAlert("No size on a segment")
           .addData("segment", segment)
           .emit();
      }
    }

    // Log info about all used segments
    if (log.isDebugEnabled()) {
      log.debug("Used Segments");
      for (DataSegment dataSegment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
        log.debug("  %s", dataSegment);
      }
    }

    log.info("Found [%,d] used segments.", params.getUsedSegments().size());

    return params;
  }

I am assuming DEBUG is already not enabled, so 10 secs are going in just iterating over the segments and checking the size to print alert.

I wonder if the slowness comes from use of Stream to create a flat list of segments in ...

  public Iterable<DataSegment> iterateAllUsedSegmentsInSnapshot()
  {
    return () -> dataSourcesWithAllUsedSegments
        .values()
        .stream()
        .flatMap(dataSource -> dataSource.getSegments().stream())
        .iterator();
  }

can you try to change the code to iterate over segments using simply nested for loops in LogUsedSegments.java , e.g.

    for (ImmutableDruidDataSource ds : snapshot.getDataSourcesWithAllUsedSegments()) {
      for (DataSegment segment : ds.getSegments()) {
        ......
      }
    }

and see if it still takes ~10 secs ? I am curious because if nested for-loops fix the perf issue then we need to fix that as iterateAllUsedSegmentsInSnapshot() is used in other places as well and could be contributing extra ~10 secs on each of those invocations.

@capistrant (Contributor, Author) commented Nov 24, 2020

slightly surprised by ~10 secs taken up by ... (snippet from LogUsedSegments.java)

  public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
  {
    log.debug("Starting coordination. Getting used segments.");

    DataSourcesSnapshot dataSourcesSnapshot = params.getDataSourcesSnapshot();
    for (DataSegment segment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
      if (segment.getSize() < 0) {
        log.makeAlert("No size on a segment")
           .addData("segment", segment)
           .emit();
      }
    }

    // Log info about all used segments
    if (log.isDebugEnabled()) {
      log.debug("Used Segments");
      for (DataSegment dataSegment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
        log.debug("  %s", dataSegment);
      }
    }

    log.info("Found [%,d] used segments.", params.getUsedSegments().size());

    return params;
  }

I am assuming DEBUG is already not enabled, so 10 secs are going in just iterating over the segments and checking the size to print alert.

I wonder if the slowness comes from use of Stream to create a flat list of segments in ...

  public Iterable<DataSegment> iterateAllUsedSegmentsInSnapshot()
  {
    return () -> dataSourcesWithAllUsedSegments
        .values()
        .stream()
        .flatMap(dataSource -> dataSource.getSegments().stream())
        .iterator();
  }

can you try to change the code to iterate over segments using simply nested for loops in LogUsedSegments.java , e.g.

    for (ImmutableDruidDataSource ds : snapshot.getDataSourcesWithAllUsedSegments()) {
      for (DataSegment segment : ds.getSegments()) {
        ......
      }
    }

and see if it still takes ~10 secs ? I am curious because if nested for-loops fix the perf issue then we need to fix that as iterateAllUsedSegmentsInSnapshot() is used in other places as well and could be contributing extra ~10 secs on each of those invocations.

I'm not sure if I will be able to test at the same scale as our prod cluster that is taking 10 seconds, but I can try in a lower env that has about half the segments of prod to see what that looks like; will try tomorrow. Also, the javadoc calls out that this will be knowingly slower than simple iteration of a list. The use of this method is not that wide, so it is probably not drastically hurting perf across the coordinator, but it is still worth taking a look at. ImmutableDruidServer has a similar method, ImmutableDruidServer#iterateAllSegments.

@capistrant (Contributor, Author) commented Nov 24, 2020

@himanshug so I added a timer in my smaller environment and see that LogUsedSegments as written today runs very fast on a cluster that has ~300 datasources and ~150k used segments.
Timer readings with existing code (millis): 20, 20, 18, 17, 18, 17, 18, 25
Timer readings with updated nested for loop (millis): 17, 5, 12, 3, 2, 2, 2, 2

My prod cluster is quite a bit larger: over 1k datasources and over 1MM segments. But I will not be able to add the timer there or test the nested for loop any time soon because we are in a change freeze until the new year.

There does seem to be evidence of a good speedup in the smaller-scale test I did. Not sure if you think it is worth opening a separate issue/PR to address the usages of the existing stream approach. But the question is how many clusters operate at a scale where the increased performance is worth getting rid of that nifty utility method.
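For anyone who wants to reproduce this kind of comparison locally, here is a crude, self-contained timing sketch. It uses plain wall-clock timing rather than a proper JMH benchmark, and synthetic integer "segments" instead of Druid's DataSegment, so the absolute numbers are only indicative; it contrasts the stream-flattening pattern of iterateAllUsedSegmentsInSnapshot() with the nested-for-loop alternative discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterationTiming
{
  // Returns {streamNanos, loopNanos, count}. count mimics the "size < 0"
  // alert check; with non-negative synthetic data it stays 0.
  static long[] time(Map<String, List<Integer>> dataSources)
  {
    long count = 0;

    long t0 = System.nanoTime();
    // Stream-based flattening, mimicking iterateAllUsedSegmentsInSnapshot().
    for (Integer seg : (Iterable<Integer>) () ->
        dataSources.values().stream().flatMap(List::stream).iterator()) {
      if (seg < 0) {
        count++;
      }
    }
    long streamNanos = System.nanoTime() - t0;

    long t1 = System.nanoTime();
    // Plain nested loops over the same data.
    for (List<Integer> segs : dataSources.values()) {
      for (Integer seg : segs) {
        if (seg < 0) {
          count++;
        }
      }
    }
    long loopNanos = System.nanoTime() - t1;

    return new long[]{streamNanos, loopNanos, count};
  }

  public static void main(String[] args)
  {
    Map<String, List<Integer>> dataSources = new HashMap<>();
    for (int ds = 0; ds < 300; ds++) {          // ~300 datasources,
      List<Integer> segs = new ArrayList<>();
      for (int s = 0; s < 500; s++) {           // ~150k segments total
        segs.add(s);
      }
      dataSources.put("ds" + ds, segs);
    }
    long[] r = time(dataSources);
    System.out.printf("stream: %d ms, loops: %d ms%n",
        r[0] / 1_000_000, r[1] / 1_000_000);
  }
}
```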

@himanshug (Contributor) commented:

@capistrant thanks for the test. This is still surprising. I did a quick benchmark (see #10604) and the iteration looks very fast (relative to ~10 sec) with both streams and for-loops, even for 1000 dataSources with 2000 segments each, i.e. 2 million segments overall.
Where did you get the ~10 sec number from originally?

@capistrant (Contributor, Author) commented:

@capistrant thanks for the test. This is still surprising. I did a quick benchmark (see #10604) and the iteration looks very fast (relative to ~10 sec) with both streams and for-loops, even for 1000 dataSources with 2000 segments each, i.e. 2 million segments overall.
Where did you get the ~10 sec number from originally?

Our estimates were from wall-clock time looking at logs. But I admit it is pretty hand-wavy and glosses over some facts.

EmitClusterStatsAndMetrics logs some stuff at the end of its run. We then have our configured 30-second backoff time. Then we execute the historical management duties runnable again, and the first duty is LogUsedSegments, which logs when it finishes.

So if we have these two wall-clock values:

2020-11-25T18:05:42,18
2020-11-25T18:06:33,42

You can say there were ~11 seconds between the end of the backoff time and the completion of the first duty. But this neglects all of the work in DutiesRunnable#run() before the duties start running, as well as any discrepancy in the amount of time that is actually backed off between the end of one run and the next.

@capistrant (Contributor, Author) commented:

@himanshug

Circling back on this: my cluster with 161k segments takes ~20ms to execute LogUsedSegments. This duty should scale fairly linearly with the number of segments, so you'd have to 50x that segment count to hit 1 second if that holds. I really look forward to deploying #10603 in my larger production environment to confirm; I should be able to do this in mid-Jan.

In the big scheme of things, do you think this PR should be closed with the reasoning that it is not worth adding a config for such small savings? I'd like to tie up the loose end in my WIP tracking by either going forward with this or closing it out. I won't be upset if it is rejected, but I'd probably still enable this toggle in my production environment just because we find zero value in the duty.

@himanshug (Contributor) commented:

Real numbers reported from #10603 might make this more obvious eventually.

That said, on the surface I do agree with your assessment of the utility of LogUsedSegments and understand why you would want to remove it. That check appears to have existed since the beginning of [druid] time :) and I am not sure what the original motivation was. Maybe there were bugs at the time, adding this check uncovered them, and nowadays it doesn't flag anything.

I would set druid.coordinator.logUsedSegments.enabled default value to false and leave it undocumented for now.

If, after a bunch of releases, nobody notices it, then maybe remove that code altogether.

@capistrant (Contributor, Author) commented Dec 21, 2020

Real numbers reported from #10603 might make this more obvious eventually.

That said, on the surface I do agree with your assessment of the utility of LogUsedSegments and understand why you would want to remove it. That check appears to have existed since the beginning of [druid] time :) and I am not sure what the original motivation was. Maybe there were bugs at the time, adding this check uncovered them, and nowadays it doesn't flag anything.

I would set druid.coordinator.logUsedSegments.enabled default value to false and leave it undocumented for now.

If, after a bunch of releases, nobody notices it, then maybe remove that code altogether.

Hmmm. If we are going to go with a disabled-and-undocumented implementation, I will add a design review label to make sure we get a +1 from an extra committer. I do slightly question the removal, because the debug logging may actually be useful for some people.

@jihoonson (Contributor) left a comment

LGTM, but please fix the documentation before merge.

Comment thread docs/configuration/index.md Outdated
|`druid.coordinator.loadqueuepeon.repeatDelay`|The start and repeat delay for the loadqueuepeon, which manages the load and drop of segments.|PT0.050S (50 ms)|
|`druid.coordinator.asOverlord.enabled`|Boolean value for whether this Coordinator process should act like an Overlord as well. This configuration allows users to simplify a druid cluster by not having to deploy any standalone Overlord processes. If set to true, then Overlord console is available at `http://coordinator-host:port/console.html` and be sure to set `druid.coordinator.asOverlord.overlordService` also. See next.|false|
|`druid.coordinator.asOverlord.overlordService`| Required, if `druid.coordinator.asOverlord.enabled` is `true`. This must be same value as `druid.service` on standalone Overlord processes and `druid.selectors.indexing.serviceName` on Middle Managers.|NULL|
|`druid.coordinator.logUsedSegments.enabled`|Boolean value for whether or not the coordinator should execute the `LogUsedSegments` portion of its coordination work. `LogUsedSegments` is an informational job run by the Coordinator every coordination cycle which logs every segment at DEBUG level and the total number of used segments at INFO level. In addition to these logs, it emits an alert if a segment has a size less than 0. An admin can decide that forgoing this work may advantageous if they don't need any of the information provided.|
Contributor

The default value column is missing.

Contributor

Also, did you want to not document it?

Contributor Author

Well, @himanshug had offered up the idea of not documenting it and setting it to false to phase this duty out altogether. I'm receptive to that idea and can implement it, but part of me wants to avoid that since the logging of each segment at debug level could actually be useful to people out in the wild.

For now, I fixed the default value. Let me know what you think is best for the documentation/default value.

If I had to vote, I would keep my implementation because it may actually be useful. The only thing I might change, now or in the future, is defaulting it to off, since it seems like a bit more of a debug duty.

"Done making historical management duties %s",
duties.stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList())
);
return duties;
Contributor

nit: Collections.unmodifiableList(duties)
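A tiny illustration of this nit: wrapping the duty list in `Collections.unmodifiableList` before returning it makes any accidental mutation by callers fail fast. (The list contents below are just illustrative strings, not real duty objects.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableSketch
{
  static List<String> makeDuties()
  {
    List<String> duties = new ArrayList<>(Arrays.asList("LogUsedSegments", "RunRules"));
    // Callers receive a read-only view; the builder keeps the only mutable reference.
    return Collections.unmodifiableList(duties);
  }

  public static void main(String[] args)
  {
    List<String> duties = makeDuties();
    try {
      duties.add("BalanceSegments");
      System.out.println("unexpected: list was mutable");
    } catch (UnsupportedOperationException expected) {
      System.out.println("add rejected, as intended");
    }
  }
}
```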

Contributor Author

reverting to the original way of doing this now that LogUsedSegments is always used again

@gianm (Contributor) left a comment

I like this log message from LogUsedSegments and have found it very useful:

log.info("Found [%,d] used segments.", params.getUsedSegments().size());

It's nearly free, since params.getUsedSegments() is something that's just passed into the method and doesn't need to be computed. Instead of a config to disable the duty, how about removing this block and leaving everything else the same:

    DataSourcesSnapshot dataSourcesSnapshot = params.getDataSourcesSnapshot();
    for (DataSegment segment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
      if (segment.getSize() < 0) {
        log.makeAlert("No size on a segment")
           .addData("segment", segment)
           .emit();
      }
    }

I'm pretty sure this is ancient debugging code and we can safely get rid of it.

The other potentially expensive part is skipped if log level is INFO or higher. (Which is the default, so it shouldn't be a problem.)

@gianm (Contributor) commented Jan 8, 2021

circling back on this. my cluster with 161k segments takes ~20ms to execute LogUsedSegments. This duty should scale fairly linearly with number of segments so you'd have to 50x that number of segments to hit 1 second if that is true. I really look forward to when I can deploy #10603 in my larger production environment to confirm. I should be able to do this in mid-Jan.

You could also use sjk to grab a flame graph of a couple of coordinator run cycles (maybe let the tool run for 5 minutes). It would be illuminating as to exactly where the time is going. Some instructions here: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs

(That's written with profiling queries in mind, but the technique would work just as well for profiling anything.)

@capistrant (Contributor, Author) commented:

I like this log message from LogUsedSegments and have found it very useful:

log.info("Found [%,d] used segments.", params.getUsedSegments().size());

It's nearly free, since params.getUsedSegments() is something that's just passed into the method and doesn't need to be computed. Instead of a config to disable the duty, how about removing this block and leaving everything else the same:

    DataSourcesSnapshot dataSourcesSnapshot = params.getDataSourcesSnapshot();
    for (DataSegment segment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
      if (segment.getSize() < 0) {
        log.makeAlert("No size on a segment")
           .addData("segment", segment)
           .emit();
      }
    }

I'm pretty sure this is ancient debugging code and we can safely get rid of it.

The other potentially expensive part is skipped if log level is INFO or higher. (Which is the default, so it shouldn't be a problem.)

I think I like this idea the most, considering everything we have learned since I created the PR. I am going to amend the PR to remove the configuration to skip the duty. Instead, we will remove this legacy code and leave everything else as is. Maybe in the future this duty will change and become expensive, making the ability to skip it beneficial, but let's not put the cart before the horse and overcomplicate things today.

@capistrant capistrant changed the title allow the LogUsedSegments duty to be skipped Remove legacy code from LogUsedSegments duty Jan 8, 2021
@gianm (Contributor) left a comment

LGTM!

@jihoonson (Contributor) left a comment

LGTM. Thanks @capistrant!

@jihoonson jihoonson merged commit aecc9e5 into apache:master Jan 12, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* allow the LogUsedSegments duty to be skippped

* Fixes for TravisCI coverage checks and documentation spell checking

* prameterize DruidCoordinatorTest in order to achieve coverage

* update config name to remove duty ref and improve documentation

* refine documentation for new config with reviewer advice

* add default column to docs for new config

* remove legacy code in LogUsedSegments and remove config to disbale duty

* fix makeHistoricalMangementDuties now that the returned list is always the same
@jihoonson jihoonson added this to the 0.21.0 milestone Jul 15, 2021