Skip to content

IncrementalIndex#add is no longer thread-safe.#15697

Merged
gianm merged 4 commits intoapache:masterfrom
gianm:incremental-index-add-not-thread-safe
Jan 18, 2024
Merged

IncrementalIndex#add is no longer thread-safe.#15697
gianm merged 4 commits intoapache:masterfrom
gianm:incremental-index-add-not-thread-safe

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Jan 16, 2024

Following #14866, there is no longer a reason for IncrementalIndex#add to be thread-safe.

It turns out it already was not using its selectors in a thread-safe way, as exposed by #15615 making testMultithreadAddFactsUsingExpressionAndJavaScript in IncrementalIndexIngestionTest flaky. Note that this problem isn't new: Strings have been stored in the dimension selectors for some time, but we didn't have a test that checked for that case; we only have this test that checks for concurrent adds involving numeric selectors.

At any rate, this patch changes OnheapIncrementalIndex to no longer try to offer a thread-safe "add" method. It also improves performance a bit by adding a row ID supplier to the selectors it uses to read InputRows, meaning that it can get the benefit of caching values inside the selectors.

This patch also:

  1. Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
    which the similar datasketches versions already have. This is done to
    help them adhere to the contract of Aggregator: concurrent calls to
    "aggregate" and "get" must be thread-safe.

  2. Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
    druid-benchmarks module.

Benchmarks below. I think the speedup is mainly due to the additional caching enabled by the row ID supplier. The benchmark involves two primitive aggregators reading the same input fields, which would now be able to make use of in-selector caching.

patch

Benchmark                                          Mode  Cnt     Score    Error  Units
OnheapIncrementalIndexBenchmark.concurrentAddRead  avgt    5  1805.434 ± 51.638  ms/op

master

Benchmark                                          Mode  Cnt     Score    Error  Units
OnheapIncrementalIndexBenchmark.concurrentAddRead  avgt    5  2049.733 ± 21.237  ms/op

Following apache#14866, there is no longer a reason for IncrementalIndex#add
to be thread-safe.

It turns out it already was not using its selectors in a thread-safe way,
as exposed by apache#15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript`
in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't
new: Strings have been stored in the dimension selectors for some time,
but we didn't have a test that checked for that case; we only have
this test that checks for concurrent adds involving numeric selectors.

At any rate, this patch changes OnheapIncrementalIndex to no longer try
to offer a thread-safe "add" method. It also improves performance a bit
by adding a row ID supplier to the selectors it uses to read InputRows,
meaning that it can get the benefit of caching values inside the selectors.

This patch also:

1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
   which the similar datasketches versions already have. This is done to
   help them adhere to the contract of Aggregator: concurrent calls to
   "aggregate" and "get" must be thread-safe.

2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
   druid-benchmarks module.
@gianm gianm merged commit 792e5c5 into apache:master Jan 18, 2024
@gianm gianm deleted the incremental-index-add-not-thread-safe branch January 18, 2024 11:45
@abhishekagarwal87 abhishekagarwal87 added this to the 29.0.0 milestone Jan 24, 2024
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Jan 30, 2024
* IncrementalIndex#add is no longer thread-safe.

Following apache#14866, there is no longer a reason for IncrementalIndex#add
to be thread-safe.

It turns out it already was not using its selectors in a thread-safe way,
as exposed by apache#15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript`
in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't
new: Strings have been stored in the dimension selectors for some time,
but we didn't have a test that checked for that case; we only have
this test that checks for concurrent adds involving numeric selectors.

At any rate, this patch changes OnheapIncrementalIndex to no longer try
to offer a thread-safe "add" method. It also improves performance a bit
by adding a row ID supplier to the selectors it uses to read InputRows,
meaning that it can get the benefit of caching values inside the selectors.

This patch also:

1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
   which the similar datasketches versions already have. This is done to
   help them adhere to the contract of Aggregator: concurrent calls to
   "aggregate" and "get" must be thread-safe.

2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
   druid-benchmarks module.

* Spelling.

* Changes from static analysis.

* Fix javadoc.
abhishekagarwal87 pushed a commit that referenced this pull request Jan 30, 2024
* IncrementalIndex#add is no longer thread-safe.

Following #14866, there is no longer a reason for IncrementalIndex#add
to be thread-safe.

It turns out it already was not using its selectors in a thread-safe way,
as exposed by #15615 making `testMultithreadAddFactsUsingExpressionAndJavaScript`
in `IncrementalIndexIngestionTest` flaky. Note that this problem isn't
new: Strings have been stored in the dimension selectors for some time,
but we didn't have a test that checked for that case; we only have
this test that checks for concurrent adds involving numeric selectors.

At any rate, this patch changes OnheapIncrementalIndex to no longer try
to offer a thread-safe "add" method. It also improves performance a bit
by adding a row ID supplier to the selectors it uses to read InputRows,
meaning that it can get the benefit of caching values inside the selectors.

This patch also:

1) Adds synchronization to HyperUniquesAggregator and CardinalityAggregator,
   which the similar datasketches versions already have. This is done to
   help them adhere to the contract of Aggregator: concurrent calls to
   "aggregate" and "get" must be thread-safe.

2) Updates OnHeapIncrementalIndexBenchmark to use JMH and moves it to the
   druid-benchmarks module.

* Spelling.

* Changes from static analysis.

* Fix javadoc.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants