Parallelize storage of incremental segments#13982

Merged
abhishekagarwal87 merged 11 commits into apache:master from rivian:feature-parallelize_incremental_persist
Feb 7, 2024
Conversation

@pramodin
Contributor

Description

During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period, etc.). In cases with a large number of dimensions and metrics (1000+), we observed that creating and serializing the incremental segment file format and persisting the file took a while, blocking ingestion of new data and affecting real-time ingestion. This serialization and persistence can be parallelized across the different time chunks, which is what this update does.

The patch adds a simple configuration parameter to the ingestion tuning configuration to specify the number of persistence threads. The default value is 1 if not specified, which preserves today's behavior.
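As a rough illustration, the new tuning option might appear in a Kafka supervisor spec as follows. The field name (shown here as numPersistThreads) and the surrounding values are illustrative of the patch's intent, not an authoritative excerpt from the docs:

```json
{
  "type": "kafka",
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 150000,
    "intermediatePersistPeriod": "PT10M",
    "numPersistThreads": 4
  }
}
```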

Release note


Key changed/added classes in this PR
  • StreamAppenderator
  • AppenderatorImpl

This PR has:

  • [x] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekagarwal87
Contributor

Thank you for this patch, @PramodSSImmaneni. Do you have some numbers on the time it was taking before and after the change?

@cryptoe cryptoe requested a review from kfaraz April 5, 2023 09:13
@pramodin
Contributor Author

Hi @abhishekagarwal87, we were using it on a datasource with about 4000 columns. With the original single thread, it would take 3 to 10 seconds to create the incremental segment files. Using 4 or 5 persist threads with this change helped us handle the incoming throughput.

@kfaraz
Contributor

kfaraz commented May 3, 2023

@PramodSSImmaneni , 3 to 10s seems like a reasonable time for a datasource with 4000 columns.
Did you see this time increase abruptly in any specific scenario? I agree that there is room for parallelization here, but we need to understand the exact benefits that we would get from it.

It would be nice if you could share some more details on the following points:

  • Did real-time ingestion slow down at any point, i.e. was there a lag buildup?
  • How long did it take to persist the segments?
  • Once the segments were persisted, did lag catch up?
  • With parallelization, how did the lag behave?
  • What was your cluster setup (number of task slots, worker node types)?

If we do realize that parallelization is in fact needed here, it would be an MM/Indexer runtime property rather than a tuning config that has to be passed through all index specs. Users doing an ingestion need not be exposed to this detail.

@kfaraz
Contributor

kfaraz commented May 3, 2023

Side note:
When you run into an issue, it's generally a good idea to go through the following steps:

  • Start a thread on Druid Slack so that the community can help you troubleshoot your issue. We have very active community members who are always ready to discuss new and existing problems. :)
  • You may even search the Druid slack and GitHub issues to see if this is a known issue.
  • After discussion, if you feel this is certainly a bug/improvement, create an issue (or a PR if you already have a solution in mind) with all the details, maybe even a link to the Slack discussion.

These steps help us understand your issue and solution faster and also help other people in the community who might be running into similar problems.

@pramodin
Contributor Author

pramodin commented May 3, 2023

@kfaraz The slowness in creating and saving the incremental index files (because of the large number of columns) was causing ingestion lag to increase continuously, and it was falling further and further behind. It would take about 3 to 10 seconds to create the incremental index files after data had been ingested and an incremental segment was ready to be persisted (because configured thresholds were reached). This would cause the ingestion to fall behind even though there were available CPUs on the node. There are multiple segment intervals being ingested at the same time, so there are multiple index files, and these were being saved serially. From what I could see there were no dependencies between them, and they could be persisted in parallel.
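The serial-versus-parallel behavior described above can be sketched with a plain ExecutorService. The names here (persistAll, the time-chunk strings) are hypothetical stand-ins, not Druid's actual API; with one thread the sketch degrades to today's serial behavior:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: "persists" one incremental index per time chunk on a
// fixed-size pool, the way a multi-threaded persistExecutor would.
public class ParallelPersistSketch
{
  public static int persistAll(List<String> timeChunks, int numPersistThreads) throws InterruptedException
  {
    AtomicInteger persisted = new AtomicInteger();
    ExecutorService persistExecutor = Executors.newFixedThreadPool(numPersistThreads);
    for (String chunk : timeChunks) {
      // Stand-in for serializing and writing one time chunk's incremental index.
      persistExecutor.submit(() -> {
        persisted.incrementAndGet();
      });
    }
    persistExecutor.shutdown();                           // accept no new tasks; queued ones still run
    persistExecutor.awaitTermination(1, TimeUnit.MINUTES);
    return persisted.get();
  }

  public static void main(String[] args) throws Exception
  {
    System.out.println(persistAll(List.of("T00", "T01", "T02"), 2) + " time chunks persisted");
  }
}
```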

We have a mixture of datasources, and some are small and don't need this higher degree of parallelism. Initially I considered an MM property, but then it would apply to all datasources, and a larger datasource might be starved while a smaller one is using the extra threads. Having it on a per-datasource basis allows it to be configured appropriately.

@abhishekagarwal87
Contributor

@PramodSSImmaneni - how many of these incremental indexes were being created in an hour, in your particular case?

@pramodin
Contributor Author

pramodin commented May 4, 2023

@abhishekagarwal87 about 400 - 500

@pramodin
Contributor Author

I have also tested with a datasource with a small number of columns. Earlier, ingestion was topping out at around 5k rows per second because the segment time spread was over 24 hours with hourly segment granularity; with 5 persist threads it has been able to ingest at 25k rows per second.

@pramodin
Contributor Author

@kfaraz I started a thread on Druid Slack in the general channel. Is there another channel that is better suited, as there haven't been any comments so far?

@kfaraz
Contributor

kfaraz commented May 13, 2023

@PramodSSImmaneni , We typically use the following channels:

#troubleshooting - for assistance in debugging issues
#dev - for dev proposals and discussions

    Execs.newBlockingThreaded(
        "[" + StringUtils.encodeForFormat(myId) + "]-appenderator-persist",
-       maxPendingPersists
+       persistThreads, maxPendingPersists
Contributor


I'm wary of the persistExecutor being called by callers assuming it is single-threaded. For example, there's nothing I can see that prevents the close() from occurring while a task is still running. Nothing stands out as obviously broken though. But, I do see comments like this:

        // use persistExecutor to make sure that all the pending persists completes before
        // starting to abandon segments
        persistExecutor

The comments suggest that, if we have multiple threads, we might start processing an abandon request before or while a persist request is in the queue. What scenarios/mitigations are there here?

Contributor Author


I see, I will get back to you on it.

Contributor Author


Was looking at this. Wouldn't the shutdown on the executor interrupt any running threads anyway? Even with a single thread, the pending persist would not complete. Also, when abandoning, the segments are not going to get pushed to deep storage, nor would the Kafka offsets be committed. So any pending persists in the queue wouldn't matter.
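For reference, the java.util.concurrent semantics under discussion: shutdown() lets already-queued tasks drain, while shutdownNow() interrupts the running task and returns the tasks that never started. A minimal, self-contained illustration (not Druid code):

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A long-running task occupies the single worker thread, three quick tasks
// queue up behind it, and shutdownNow() interrupts the runner and hands back
// the queued tasks that never started.
public class ShutdownNowSketch
{
  public static int abandonedTaskCount() throws InterruptedException
  {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    pool.submit(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000);                // stands in for a long persist
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // interrupted by shutdownNow()
      }
    });
    for (int i = 0; i < 3; i++) {
      pool.submit(() -> { });                // queued "persists" that never run
    }
    started.await();                         // ensure the long task is running
    List<Runnable> neverStarted = pool.shutdownNow();
    return neverStarted.size();
  }
}
```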

Comment thread docs/development/extensions-core/kafka-supervisor-reference.md Outdated
Contributor

@ektravel ektravel left a comment


Reviewed the documentation portion of this PR.

Contributor

@ektravel ektravel left a comment


Documentation changes look good.

@pramodin
Contributor Author

Merged with latest and fixed conflicts. Kindly let me know what else is needed for the PR to be merged.

@pramodin pramodin force-pushed the feature-parallelize_incremental_persist branch from 580e7a0 to 73df24f on August 30, 2023 22:21
@kfaraz
Contributor

kfaraz commented Jan 12, 2024

@PramodSSImmaneni , thanks a lot for your patience on this. Please let me know if you are still interested in taking this forward.

I'm still a little wary of this change due to reasons that @jasonk000 mentioned. In short, we need to be sure that there are no places in the code which assume persistExecutor to be single-threaded.

Also, in the first iteration, I would prefer if we add this new config only for the mode of ingestion in question here (i.e. Kafka, if I am not mistaken). Please remove the changes for the other modes of ingestion from this PR. If we realize that this has value for other ingestion modes, we will add the config there when needed.

@pramodin
Contributor Author

@kfaraz yes I am still interested in taking this forward. I will make changes to restrict it to Kafka.

@kfaraz
Contributor

kfaraz commented Jan 14, 2024

Thanks, @PramodSSImmaneni , please let me know once you have pushed the latest changes. I will approve and merge the changes after that. Please try to double check the following point as well.

In short, we need to be sure that there are no places in the code which assume persistExecutor to be single-threaded.

@pramodin pramodin force-pushed the feature-parallelize_incremental_persist branch from b5f3ec2 to 42e8291 on February 3, 2024 21:10
@pramodin
Contributor Author

pramodin commented Feb 4, 2024

@kfaraz reverted the changes for Kinesis and kept them for Kafka. Merged the latest master into it as well. Please take a look. I also checked again, and it doesn't look like any place relies on the executor being single-threaded; critical sections like persistHydrant, which saves an incremental segment file, and the creation of commit.json, which has the list of incremental segments, are already protected with locking.
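The locking pattern described here, with multiple persist threads but serialized updates to the shared commit metadata, can be sketched as follows. The names (commitMetadata, "hydrant" entries) are illustrative stand-ins, not Druid's actual fields:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: persist threads run concurrently, but each update to the shared
// commit metadata (the commit.json equivalent) happens inside one lock, so
// the list stays consistent without the executor being single-threaded.
public class CommitLockSketch
{
  public static int concurrentCommits(int numPersistThreads, int persistsPerThread) throws InterruptedException
  {
    List<String> commitMetadata = new ArrayList<>();   // not thread-safe by itself
    ReentrantLock commitLock = new ReentrantLock();
    ExecutorService pool = Executors.newFixedThreadPool(numPersistThreads);
    for (int t = 0; t < numPersistThreads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < persistsPerThread; i++) {
          commitLock.lock();                           // critical section: one writer at a time
          try {
            commitMetadata.add("hydrant");
          } finally {
            commitLock.unlock();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return commitMetadata.size();
  }
}
```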

Contributor

@kfaraz kfaraz left a comment


Changes look good to me.

@abhishekagarwal87 abhishekagarwal87 merged commit 59bca09 into apache:master Feb 7, 2024
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024