Parallelize storage of incremental segments #13982

abhishekagarwal87 merged 11 commits into apache:master
Conversation
Thank you for this patch, @PramodSSImmaneni. Do you have some numbers on the time it was taking before and after the change?

Hi @abhishekagarwal87, we were using it on a datasource with about 4000 columns; with the original single thread it would take 3 to 10 seconds to create the incremental segment files. Using a parallelism of 4 or 5 with this change helped us handle the incoming throughput.
@PramodSSImmaneni , 3 to 10s seems like a reasonable time for a datasource with 4000 columns. It would be nice if you could share some more details on the following points:
If we do realize that parallelization is in fact needed here, it would be an MM/Indexer runtime property rather than a tuning config that has to be passed through all index specs. Users doing an ingestion need not be exposed to this detail.
Side note:
These steps help us understand your issue and solution faster, and even help other people in the community who might be running into similar problems.
@kfaraz The slowness in creating and saving the incremental index files (because of the large number of columns) was causing ingestion lag to increase continuously, so ingestion kept falling further behind. It would take about 3 to 10 seconds to create the incremental index files after data had been ingested and an incremental segment was ready to be persisted (because configured thresholds were reached). This would cause ingestion to fall behind even though there were available CPUs on the node. Multiple segment intervals are ingested at the same time, so there are multiple index files, and these were being saved serially. From what I could see there were no dependencies between them, and they could be persisted in parallel.

We have a mixture of datasources, and some are small and don't need this higher degree of parallelism. I initially considered an MM property, but that would apply to all datasources, and a larger datasource could be starved while a smaller one uses the extra threads. Having it on a per-datasource basis allows it to be configured appropriately.
@PramodSSImmaneni - how many of these incremental indexes were being created in an hour, in your particular case? |
@abhishekagarwal87 about 400 to 500.
I have also tested with a datasource with a small number of columns. Earlier, ingestion was topping out at around 5k rows per second because the segment time spread was over 24 hours with hourly segment granularity; with 5 persist threads it has been able to ingest at 25k rows per second.
@kfaraz I started a thread on Druid Slack in the general channel. Is there another channel that is better suited, as there haven't been any comments so far?
@PramodSSImmaneni, we typically use the following channels: #troubleshooting - for assistance in debugging issues
```diff
 Execs.newBlockingThreaded(
     "[" + StringUtils.encodeForFormat(myId) + "]-appenderator-persist",
-    maxPendingPersists
+    persistThreads, maxPendingPersists
```
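For context, a bounded executor with blocking submission along these lines can be sketched with plain `java.util.concurrent` primitives. This is an illustrative stand-in (names and behavior are assumptions), not Druid's actual `Execs.newBlockingThreaded` implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BlockingThreadedExec
{
  // Illustrative: a fixed pool of nThreads workers with a bounded queue of
  // `capacity` pending tasks. When the queue is full, the submitting thread
  // blocks instead of failing, which applies backpressure to ingestion.
  public static ExecutorService newBlockingThreaded(String name, int nThreads, int capacity)
  {
    ThreadPoolExecutor exec = new ThreadPoolExecutor(
        nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(capacity),
        r -> new Thread(r, name)
    );
    // On rejection (queue full), block the submitter until space frees up.
    exec.setRejectedExecutionHandler((task, pool) -> {
      try {
        pool.getQueue().put(task);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    });
    return exec;
  }

  public static void main(String[] args) throws Exception
  {
    ExecutorService exec = newBlockingThreaded("persist-demo", 2, 4);
    for (int i = 0; i < 10; i++) {
      final int n = i;
      exec.submit(() -> System.out.println("persisted time chunk " + n));
    }
    exec.shutdown();
    exec.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

With `persistThreads` set to 1 this degenerates to the previous single-threaded behavior; larger values let independent time chunks persist concurrently while `maxPendingPersists` still bounds the queue.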
I'm wary of the persistExecutor being called by callers assuming it is single-threaded. For example, there's nothing I can see that prevents the close() from occurring while a task is still running. Nothing stands out as obviously broken though. But, I do see comments like this:
```java
// use persistExecutor to make sure that all the pending persists completes before
// starting to abandon segments
persistExecutor
```
Comments suggest that - if we have multiple threads - we might start processing an abandon request before or while a persist request is in queue. What scenarios/mitigations are there here?
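To illustrate the concern: with a single worker thread, submission order alone guarantees that an abandon task queued after a persist cannot overtake it; with multiple threads, that guarantee disappears unless an explicit happens-before edge is added. A minimal sketch:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PersistOrderingDemo
{
  public static void main(String[] args) throws Exception
  {
    List<String> events = new CopyOnWriteArrayList<>();

    // With exactly one worker thread, submission order is execution order:
    // an "abandon" task queued after a "persist" task can never overtake it.
    ExecutorService single = Executors.newSingleThreadExecutor();
    single.submit(() -> events.add("persist"));
    single.submit(() -> events.add("abandon"));
    single.shutdown();
    single.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println(events); // always [persist, abandon]

    // With N > 1 threads the two tasks may run concurrently, so the abandon
    // path would need an explicit ordering mechanism (e.g. waiting on the
    // persist's Future, or a CountDownLatch) to preserve the old guarantee.
  }
}
```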
I see, will get back to you on it.
Was looking at this. Wouldn't the shutdown on the executor interrupt any running threads anyway? I mean, even with a single thread the pending persists would not complete. Also, when abandoning, the segments are not going to get pushed to deep storage, nor would the Kafka offsets be committed, right? So any pending persists in the queue wouldn't matter.
ektravel left a comment:
Reviewed the documentation portion of this PR.
ektravel left a comment:
Documentation changes look good.
Merged with latest and fixed conflicts. Kindly let me know what else is needed for the PR to be merged.
Force-pushed the …_incremental_persist branch from 580e7a0 to 73df24f.
@PramodSSImmaneni, thanks a lot for your patience on this. Please let me know if you are still interested in taking this forward. I'm still a little wary of this change due to reasons that @jasonk000 mentioned. In short, we need to be sure that there are no places in the code which assume that the persist executor is single-threaded.

Also, in the first iteration, I would prefer if we add this new config only for the mode of ingestion in question here (i.e. Kafka, if I am not mistaken). Please remove the changes for the other modes of ingestion from this PR. If we realize that this has value for other ingestion modes, we will add the config there when needed.
@kfaraz yes, I am still interested in taking this forward. I will make the changes to restrict it to Kafka.
Thanks, @PramodSSImmaneni, please let me know once you have pushed the latest changes. I will approve and merge the changes after that. Please try to double-check the following point as well.
Force-pushed from b5f3ec2 to 42e8291.
@kfaraz reverted the changes for Kinesis and kept them for Kafka. Merged the latest master into it as well. Please take a look. I also checked again, and it doesn't look like any place relies on the executor being single-threaded; critical sections such as persistHydrant, which saves an incremental segment file, and the creation of commit.json, which holds the list of incremental segments, are already protected with locking.
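As a sketch of the pattern described above (hypothetical names, not the actual StreamAppenderator code): per-chunk serialization can run on several persist threads in parallel, while updates to the shared commit metadata go through a lock:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CommitLockSketch
{
  private final Object commitLock = new Object();
  private final List<String> committedSegments = new ArrayList<>();

  // Hypothetical stand-in for the persist path: serializing a time chunk's
  // incremental index is independent per chunk, but recording it in the
  // shared commit metadata happens under a lock.
  void persistHydrant(String segmentId)
  {
    // ... serialize the segment files to disk (safe to do in parallel) ...
    synchronized (commitLock) {
      committedSegments.add(segmentId); // analogous to updating commit.json
    }
  }

  int committedCount()
  {
    synchronized (commitLock) {
      return committedSegments.size();
    }
  }

  public static void main(String[] args) throws Exception
  {
    CommitLockSketch sketch = new CommitLockSketch();
    ExecutorService persistExec = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 100; i++) {
      final String id = "segment-" + i;
      persistExec.submit(() -> sketch.persistHydrant(id));
    }
    persistExec.shutdown();
    persistExec.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println(sketch.committedCount()); // 100
  }
}
```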
Description
During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period, etc.). In cases with many dimensions and metrics (1000+), it was observed that creating/serializing the incremental segment file format and persisting the files took a while and blocked ingestion of new data, affecting real-time ingestion. This serialization and persistence can be parallelized across the different time chunks, which is what this update does.
The patch adds a simple configuration parameter to the ingestion tuning config to specify the number of persistence threads. The default value is 1 if it is not specified, which keeps behavior the same as today.
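For illustration, the new knob would sit in the Kafka supervisor's tuningConfig alongside the existing persist-related settings. The property name and surrounding values below are assumptions based on the description (default 1, i.e. today's single-threaded behavior):

```json
{
  "type": "kafka",
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 150000,
    "maxPendingPersists": 2,
    "intermediatePersistPeriod": "PT10M",
    "numPersistThreads": 4
  }
}
```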
Release note
Key changed/added classes in this PR
- StreamAppenderator
- AppenderatorImpl
- TheirBaz

This PR has: