
add config to optionally disable all compression in intermediate segment persists while ingestion #7919

Merged
himanshug merged 8 commits into apache:master from himanshug:updates on Jul 10, 2019

Conversation

@himanshug (Contributor) commented Jun 18, 2019:

Motivation:
All Druid ingestion tasks consume an excessive amount of off-heap memory during the final merge of intermediate persisted segments. A good fraction of that memory goes toward holding many 64K decompression buffers, as recently reported in #7824; I have seen this behavior on many different clusters in the past. Most of these 64K decompression buffers can be avoided by disabling compression on intermediate persisted segments; compression is not really required on those intermediate segments, only on the final segment that gets pushed.

Changes Made:
This patch disables compression on intermediate persisted segments for all ingestion tasks. However, in case this causes problems, a configuration, "indexSpecForIntermediatePersists", can be set inside the task tuningConfig to revert the behavior on specific clusters.
This config is intentionally undocumented so as not to complicate user documentation, as I don't foresee it being used much. Most of the changes in this patch are actually to add this config.

This patch adds another tuning config, "indexSpecForIntermediatePersists", that can be used to disable dimension/metric compression on intermediate persisted segments.

Testing:
Existing unit and integration tests should cover all changes here.

Release Notes:
This PR must be mentioned in the release notes to inform users about this change in behavior, and especially about the "indexSpecForIntermediatePersists" config in case the behavior needs to be reverted in certain cases.
It is now possible to include indexSpecForIntermediatePersists in various task tuningConfigs to disable dimension/metric compression of intermediate persisted segments, reducing the memory required to merge them into the final segment to be published.
This might increase the page cache usage of intermediate segments while they are used for querying, but overall it should be useful in cases where the peak memory utilization of the indexing task process during the final merge is much higher than the average memory utilization during ingestion.
To disable dimension/metric compression on intermediate persisted segments, the following can be added to the task tuningConfig:

        "indexSpecForIntermediatePersists": {
          "dimensionCompression": "uncompressed",
          "metricCompression": "none"
        }
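
For context, here is where that snippet sits in a full tuningConfig. The surrounding fields (the "kafka" type, maxRowsInMemory, and the indexSpec values) are illustrative examples for a Kafka indexing task, not part of this PR:

        "tuningConfig": {
          "type": "kafka",
          "maxRowsInMemory": 100000,
          "indexSpec": {
            "dimensionCompression": "lz4",
            "metricCompression": "lz4"
          },
          "indexSpecForIntermediatePersists": {
            "dimensionCompression": "uncompressed",
            "metricCompression": "none"
          }
        }

The regular indexSpec still controls the final published segment; only the intermediate persists use the uncompressed spec.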

@clintropolis (Member) commented Jun 18, 2019:

Hmm, I really like the idea of being able to separately control this stuff for the intermediary segments, so mega +1 on that. However, I'm not sure how I feel about straight UNCOMPRESSED being the default behavior (if I understand this PR correctly). I think we should consider whether this is the best way to avoid the unbounded usage of the 64k buffers used for decompression, and maybe we should measure some things? My fear is that this default could change the dynamics of realtime indexing tasks quite a lot, namely how much memory impact they have on the page cache, potentially exaggerating issues like those described in #6699 (though I suspect running realtime tasks via YARN is rare-ish).

Experimentation I did related to #6016 showed an often very dramatic size difference between compressed and non-compressed data, particularly with int- and long-typed columns, which I don't think can be ignored. Even the size difference between using the CompressedVSizeByte and VSizeByte versions of int columns could be very large. I will see if I can dig up some measurements where I collected uncompressed and/or vsize-byte-packed sizes and share them here.
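
As a rough stand-in for the kind of measurement mentioned above (not Druid code; Druid's default codec is LZ4, while this sketch uses Python's stdlib zlib only because it requires no extra dependency), a repetitive long-typed column shrinks dramatically under a general-purpose compressor:

```python
# Rough illustration of why compression matters for long-typed columns.
# Druid's default codec is LZ4; zlib is used here only because it ships
# with the Python standard library.
import struct
import zlib

# Simulate a long column: 100k millisecond timestamps spaced 1s apart.
values = [1_560_000_000_000 + i * 1000 for i in range(100_000)]
raw = b"".join(struct.pack("<q", v) for v in values)  # 8 bytes per value

compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes ({ratio:.1f}x)")
```

Exact ratios depend heavily on the data; highly repetitive columns compress far better than high-cardinality ones.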

Some other ideas have been suggested to help mitigate merging issues like this. #5526 suggests adjustments to the merging algorithm that would reduce overall memory usage, such as producing the dictionary out of band and reducing the number of duplicated operations.

#7900 additionally suggests something that would specifically help with the unbounded decompression buffers on merge, in the form of a sort of hierarchical merge. Since we can measure how many intermediary segments we have, with column counts for each, we could calculate how much buffer space is required and divide the merge work up as necessary to keep total usage at a reasonable size. If I understand correctly, it also suggests reworking the merge algorithm along the lines of #5526.
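
A back-of-envelope sketch of the buffer accounting such a hierarchical merge could use. The constant and helper names below are illustrative, not from Druid:

```python
# Direct memory held for decompression buffers during a flat merge, and
# the merge fan-in that fits within a direct-memory budget.
BUFFER_BYTES = 64 * 1024  # one decompression buffer per column per segment

def flat_merge_buffers(num_segments: int, num_columns: int) -> int:
    """Direct memory needed if all intermediary segments merge at once."""
    return num_segments * num_columns * BUFFER_BYTES

def max_fan_in(num_columns: int, budget_bytes: int) -> int:
    """How many segments a single merge pass can take within the budget."""
    return max(2, budget_bytes // (num_columns * BUFFER_BYTES))

# e.g. 500 intermediary persists with 100 columns each:
print(flat_merge_buffers(500, 100) // 1024**2, "MiB for a flat merge")
print(max_fan_in(100, 256 * 1024**2), "segments per merge pass within 256 MiB")
```

With numbers like these, a flat merge needs gigabytes of direct memory for buffers alone, while capping the fan-in bounds it to the budget at the cost of extra merge passes.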

The other thing that makes me think this might not be the best default behavior, at least, is that to simplify things for new users, the documentation for getting started and small-cluster tuning suggests running co-tenant middle-manager and historical processes. I suspect that if the uncompressed column size differences are noticeable, this will greatly increase contention for OS free space between those processes, especially at merge time, when all columns of all intermediary segments will be paged in (#6699 again; some related discussion in the comments there). This is already a thing that bothers me about that setup, and I think it could make this issue a lot worse.

It's also totally possible I'm being overly cautious, but I think we need more data before going with these defaults.

@clintropolis (Member):

I want to add that it's also entirely possible I'm overestimating how effective compression is for these intermediary segments, especially at the very small intermediate segment sizes occurring in the issue referenced by this change, which are likely to offer fewer opportunities for lz4 to take advantage of. My main point, I guess, is just that we should investigate the matter to make sure things are going to be chill if we do make this change.

@himanshug (Contributor, Author):

@clintropolis all the PRs you linked are independent improvements to indexing, index merging, compression, freeing buffers as soon as possible, etc., which are great and will happen at some point.
This PR is an immediate solution to the problem. So, I am glad you agree that having a different IndexSpec for intermediate persisted segments makes sense.

Now the real question that affects this PR is whether to change default IndexSpec for intermediate persisted segments or not.

From a code perspective it is fairly trivial to retain the existing default behavior, and I can make that change if that is what most people desire; however, here is my rationale for changing the default.

My observation on different clusters has been that the indexing task process's peak memory usage is much higher than its average utilization during ingestion, due to the merge process at publish time. Because of that, users plan for the task process to have the "peak" memory available to it throughout its lifetime.
When compression is disabled on intermediate segments, average memory utilization would increase (more page cache used) but peak memory usage would decrease because no decompression buffers are allocated at merge time. Also, queries would run faster because data is stored uncompressed.
All said, you are right that the above assumptions would not hold on some clusters, due to specifics of the datasets these configurations depend upon; there is no single choice that is good for everyone (or else we wouldn't have these configs :) ).
If we keep the existing behavior as the default, then I'm afraid very few cluster operators will ever use the config introduced here to disable compression on intermediate segments. OTOH, with the changed default it would improve things in most cases, and where it doesn't, operators can use the config to get back the older behavior.
Or maybe I am totally wrong, problems will show up on test clusters upgrading to the RCs, and we will reinstate the old behavior as the default in a follow-up PR.

@clintropolis (Member):

Ah, I agree with you on all points about merge being a peak of like everything 😅

> When compression is disabled on intermediate segments, then average memory utilization would increase (more page cache used) but overall peak memory usage would decrease due to no decompression buffers allocated at time of merge.

This is all I really want to investigate a bit deeper before we go with these defaults, because my gut says it would only lower peak usage in cases where there are very large numbers of small segments with lots of columns, since it takes a lot before the 64KB buffers become a significant user of direct memory. If that is not the typical merge, then peak memory usage, in the form of pressure on OS free space, could be a fair bit higher with uncompressed intermediary segments, because merging will need to hit all pages, essentially cycling the entire uncompressed segment through free space. If the middle manager has the free space to spare, or sufficiently fast disks, there is certainly no issue, but this could be pretty unchill if running co-tenant with a historical, or, if free space is tight enough, it could perhaps push disk into becoming the bottleneck due to more faults.
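
To put illustrative numbers on the tradeoff described above (not Druid code; the segment counts, column counts, and compression ratio are made-up examples):

```python
# Direct memory saved by skipping decompression buffers vs. extra bytes
# that must cycle through the page cache when segments are uncompressed.
BUFFER_BYTES = 64 * 1024

def buffer_savings(num_segments: int, num_columns: int) -> int:
    """Direct memory no longer needed for decompression buffers."""
    return num_segments * num_columns * BUFFER_BYTES

def extra_page_cache(compressed_bytes: int, compression_ratio: float) -> int:
    """Additional bytes paged in because segments are stored uncompressed."""
    return int(compressed_bytes * compression_ratio) - compressed_bytes

# e.g. 50 persists x 30 columns, with 2 GiB of compressed data at ~3x ratio:
print(buffer_savings(50, 30) // 1024**2, "MiB of direct memory saved")
print(extra_page_cache(2 * 1024**3, 3.0) // 1024**3, "GiB more through page cache")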

I hopefully should have some time to fully review this PR soon, I will try to measure some things to see if I can make myself feel better about this being default 👍

@himanshug (Contributor, Author):

@clintropolis you are right that there are cases where peak usage is not much different from average usage. I appreciate you trying it out on the realistic datasets you might have from your customers.
That said, I am now strongly considering changing the default behavior to be the same as before, just to be on the safe side and to remove the element of surprise.

@gianm (Contributor) commented Jun 27, 2019:

> This config is intentionally undocumented to not complicate user documentation as I don't foresee it being used really. Most of the changes in patch are actually to add this config.

Now that we are thinking of retaining the existing default, would it make sense to add documentation?

@himanshug (Contributor, Author):

Yeah, I need to update this PR to retain the older behavior as the default... will get to it this week.

@himanshug himanshug changed the title disable all compression in intermediate segment persists while ingestion add config to optionally disable all compression in intermediate segment persists while ingestion Jul 5, 2019
@himanshug (Contributor, Author):

@gianm @clintropolis this now retains the existing behavior as the default, and the newly added property is documented. I still kept the release-notes label (and section in the PR description) so that it's mentioned in the release notes for users who might need it.

@nishantmonu51 (Member) left a review:

Generally looks good to me; it would be great if you could also add some serde tests, e.g. KafkaSupervisorTuningConfigTest, to verify serde with the new property.

@himanshug (Contributor, Author):

@nishantmonu51 updated the serde tests

@himanshug (Contributor, Author):

"packaging build" appears to be failing for unknown/unrelated reason... probably timing out or something.

@clintropolis (Member) left a review:

lgtm 👍

@clintropolis (Member):

"packaging build" appears to be failing for unknown/unrelated reason... probably timing out or something.

I think this is related to #7998, I believe @jihoonson is working on a fix.

@jihoonson (Contributor):

Oops. Yes, it’s because of #7998 and I’m still debugging. Will try to fix by today.

@himanshug (Contributor, Author):

@clintropolis @jihoonson thanks for the build fix.

@himanshug himanshug merged commit 14aec7f into apache:master Jul 10, 2019
@clintropolis clintropolis added this to the 0.16.0 milestone Aug 8, 2019
@jihoonson jihoonson mentioned this pull request Oct 15, 2020