Skip to content

Configurable Index Type#10335

Merged
jihoonson merged 20 commits intoapache:masterfrom
liran-funaro:config-index
Oct 24, 2020
Merged

Configurable Index Type#10335
jihoonson merged 20 commits intoapache:masterfrom
liran-funaro:config-index

Conversation

@liran-funaro
Copy link
Copy Markdown
Contributor

@liran-funaro liran-funaro commented Aug 31, 2020

Fixes #10321.

See #10321 for details.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • AppendableIndexBuilder
  • AppendableIndexSpec
  • TuningConfig
  • AppenderatorConfig

@liran-funaro liran-funaro force-pushed the config-index branch 3 times, most recently from 4cfcbed to b552f94 Compare September 1, 2020 15:05
@liran-funaro liran-funaro changed the title [WIP] Design for a Configurable Index Type WIP Design for a Configurable Index Type Sep 13, 2020
@liran-funaro liran-funaro changed the title WIP Design for a Configurable Index Type [WIP] Design for a Configurable Index Type Sep 13, 2020
@liran-funaro liran-funaro changed the title [WIP] Design for a Configurable Index Type Design for a Configurable Index Type Sep 21, 2020
@liran-funaro liran-funaro changed the title Design for a Configurable Index Type Configurable Index Type Sep 21, 2020
@liran-funaro liran-funaro force-pushed the config-index branch 3 times, most recently from 85a422c to 7e62549 Compare October 4, 2020 14:31
Copy link
Copy Markdown

@Eshcar Eshcar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is bloated with similar changes due to refactoring
specifically the change from buildOnHeap() to build() - it affects more than 100 files
maybe there is a way to avoid it or have it scoped in a separate PR so that it would be easier to see and review exactly the changes that add the configuration capability

I added some comments suggesting how to further reduce the number of unnecessary changes (like null params, etc.)

Other than that looks good to me

Copy link
Copy Markdown
Contributor

@a2l007 a2l007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the extensible design that makes it easier to use new incremental index types. It would be useful to have integration tests for this which can test the appendableIndexSpec property within tuningConfig.

Comment thread processing/src/main/java/org/apache/druid/query/groupby/GroupByQueryHelper.java Outdated
Comment thread server/src/main/java/org/apache/druid/segment/indexing/TuningConfig.java Outdated
Comment thread server/src/main/java/org/apache/druid/segment/indexing/TuningConfig.java Outdated
@liran-funaro
Copy link
Copy Markdown
Contributor Author

Thanks, @a2l007
I addressed all your comments in the codes, or with follow-up questions.
Regarding the integration tests, I added tests to all the TuningConfig tests that validate the AppendableIndexSpec within TuningConfig.
I think it is sufficient. Do you agree?

@liran-funaro liran-funaro requested a review from a2l007 October 8, 2020 13:24
Copy link
Copy Markdown
Contributor

@a2l007 a2l007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've left a couple more comments.
Since we're introducing a new property within tuningConfig, I feel it would be useful to have a couple of integration tests with this property in the ingestion spec. However, I would be okay without adding them if the remaining community reviewers are fine without it.

Could you please fix the CI failures as well? Looks like there is coverage failure.

Comment thread server/src/main/java/org/apache/druid/segment/indexing/TuningConfig.java Outdated
@liran-funaro
Copy link
Copy Markdown
Contributor Author

Could you please fix the CI failures as well? Looks like there is coverage failure.

The coverage gap comes from ParallelIndexTuningConfig.equals() and it was there before this PR. I just added another comparison.
It covers 10/20 branches, but it will require writing 10 additional unique tests just to cover it.

@liran-funaro
Copy link
Copy Markdown
Contributor Author

@a2l007 I addressed all your comments. Is there anything else I need to do to move this PR forward?

Copy link
Copy Markdown
Contributor

@a2l007 a2l007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Just one additional comment, which should help with fixing the travis failures.

Copy link
Copy Markdown
Contributor

@a2l007 a2l007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@liran-funaro
Copy link
Copy Markdown
Contributor Author

@a2l007 Thanks!

@liran-funaro
Copy link
Copy Markdown
Contributor Author

@jihoonson Do you have any more comments or can we proceed to merge this PR?

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @liran-funaro, I haven't reviewed line by line, but left some comments especially for documentation.

Comment thread docs/ingestion/index.md Outdated
|-----|-----------|-------|
|type|Each in-memory index has its own tuning type code. You must specify the type code that matches your in-memory index. Common options are `onheap`, and `offheap`.|`onheap`|

Beyond these properties, each in-memory index has its own specific tuning properties.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think if users want to change this config ever for now? I remember @gianm mentioned that the offheap incremental index has some performance issue, which is why we don't use it for indexing. If you think this knob is useful for users, please add more details how each index type is different and when it's recommended to use what. Otherwise, I would suggest not documenting this know at least for now.

Copy link
Copy Markdown
Contributor Author

@liran-funaro liran-funaro Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, by @a2l007 suggestion, I deferred the support for offheap ingestion to a different PR.
So this documentation should not include the offheap option anyway.
I'll remove this documentation for now.

@liran-funaro
Copy link
Copy Markdown
Contributor Author

liran-funaro commented Oct 21, 2020

@jihoonson Thanks. I addressed all your comments.
Please let me know if you see any other required modifications, or that we could proceed to merge this PR.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1 after CI. Thanks @liran-funaro!

@liran-funaro
Copy link
Copy Markdown
Contributor Author

Passed CI.
Thanks @jihoonson and @a2l007

@liran-funaro
Copy link
Copy Markdown
Contributor Author

@jihoonson Do we need more approval before merge?

@jihoonson
Copy link
Copy Markdown
Contributor

@liran-funaro we usually wait for some time before merging a PR for other committers if there is anyone who wants to review as well. I will merge this PR today unless there is someone else who has more comments.

@jihoonson jihoonson merged commit f3a2903 into apache:master Oct 24, 2020
@jihoonson
Copy link
Copy Markdown
Contributor

@liran-funaro merged. Thanks!

@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
@exherb
Copy link
Copy Markdown
Contributor

exherb commented Jun 13, 2022

Looks this feature has been removed? I try to index kafka stream with off heap but got ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Proposal] Design for a Configurable Index Type

5 participants