merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything by clintropolis · Pull Request #13698 · apache/druid

clintropolis · 2023-01-20T13:03:41Z

Description

This PR picks up where #6443 left off and merges druid-core, extendedset, and druid-hll into druid-processing.

My hope is that this is minimally disruptive to all open PRs, since its largely a straight move, the only expected conflicts coming from files which were newly added to the removed modules (which should be a very simple fix to move to the corresponding processing/src/ path of the same package) as well as poms which have added dependency references.

I chose druid-processing instead of druid-core because the latter was smaller, but I guess I don't feel super strongly either way about the final name, but have a slight affinity for this since it does contain basically all you need to make and query segments. Regardless of destination, all of these packages needed combined for this to work, since extendedset and hll depended on core, and processing depended on all 3.

I tested that using pre-built extensions from older versions with these libraries split up continue to work when used with newer builds with consolidated druid-processing.

Release note

druid-core, extendedset, and druid-hll modules have been consolidated into druid-processing to simplify dependencies. Any extensions referencing these should be updated to use druid-processing instead. Existing extension binaries should continue to function normally when used with newer versions of Druid.

This PR has:

been self-reviewed.
a release note entry in the PR description.
been tested in a test Druid cluster.

… simplify everything

…ymore

This reverts commit 0cac846.

clintropolis · 2023-01-25T06:15:09Z

Made an interesting find in this PR due to the travis consistently timing out during TopNQueryRunnerTest, which wasn't happening locally. Looking a bit closer though, I noticed that the test ran like 6 times slower when run as part of the entire druid-processing suite on my laptop:

$ mvn package -pl processing
...
[INFO] Running org.apache.druid.query.topn.TopNQueryRunnerTest
[INFO] Tests run: 82944, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 307.971 s - in org.apache.druid.query.topn.TopNQueryRunnerTest
...

compared to just running the specific test:

$ mvn package -pl processing -Dtest=org.apache.druid.query.topn.TopNQueryRunnerTest
...
[INFO] Running org.apache.druid.query.topn.TopNQueryRunnerTest
[INFO] Tests run: 82944, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.598 s - in org.apache.druid.query.topn.TopNQueryRunnerTest
...

Breaking out the profiler, and profiling both the individual test and the entire suite made me notice an interesting difference in the flame graph of all TopNQueryRunnerTest tests:

"What was Mockito doing in there? This test isn't using Mockito at all..." were my thoughts. Looking to usages of Mockito and ByteBuffer, the only thing using it was NullColumnPartSerdeTest. So, it turns out that a handful of Mockito.mock(ByteBuffer.class) calls in NullColumnPartSerdeTest was basically infecting all ByteBuffer usage in all tests in druid-processing because of Mockito reasons.

I added these tests - they weren't actually mocking a buffer, and the code shouldn't actually use a buffer, so it seemed harmless to use a mock, but my bad, i realize this is certainly not the case at least with Mockito. Modifying this test to use an empty but very regular heap ByteBuffer resulted in a dramatic speedup when run as part of the whole suite:

[INFO] Tests run: 82944, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.184 s - in org.apache.druid.query.topn.TopNQueryRunnerTest

So, moral of the story, it might be worth going over the tests and ensuring we limit usage of Mockito to uses that truly require it to avoid cross test strangeness like this.

clintropolis · 2023-01-25T08:26:09Z

hmm, except Travis is still hanging on TopNQueryRunnerTest ....

paul-rogers

Thanks for the refactoring. Did a quick survey: looks right. This is the kind of change which is mechanical (if quite tedious). If the build passes, we're good.

Looks like a few additional minor improvements were slipped in. Those seem OK also.

This change will impact any extension which depended on the now-defunct druid-core. Is that a concern? Such extensions will need two distinct versions to compile against releases before and after this change. How do we usually handle such cases?

This kind of change sometimes looses the commit history of files. However, the diff seems to indicate that Git has worked out that you moved files. Will this cause the commit history to persist?

Otherwise, LGTM.

clintropolis · 2023-01-27T07:07:03Z

This change will impact any extension which depended on the now-defunct druid-core. Is that a concern? Such extensions will need two distinct versions to compile against releases before and after this change. How do we usually handle such cases?

I did some testing and was able to use older extensions which were built with split druid-core and druid-processing dependencies on a newer druid build and everything worked ok. I didn't try newer extensions on older druid, but it might be ok too. I think because none of the actual packages changed on any of the classes everything ends up working out at runtime.

cheddar · 2023-02-09T05:24:01Z

The change set looks good. I agree with Paul, need green checks from build and then we are good.

…ymore

… override

clintropolis · 2023-02-10T20:34:09Z

I made another minor adjustment to add a method to JvmUtils, resetTestsToDefaultRuntimeInfo which allows resetting the RuntimeInfo in the event that a test needed to static inject some overridden version for testing, such as https://github.com/apache/druid/blob/master/processing/src/test/java/org/apache/druid/query/DruidProcessingConfigTest.java#L67 which I think is the reason i initially had to switch some tests from using Runtime.getRuntime().maxMemory() to using JvmUtils.getRuntimeInfo().getMaxHeapSizeBytes() for them to pass.

clintropolis · 2023-02-10T20:35:51Z

the two failing CI (CodeQL and Static Checks) are expected to fail since they consider all of druid-core fair game for analysis. We should definitely have a look at all of the things that have been flagged and consider making changes in future PRs, but do not think we should make any additional changes in this PR since it is doing enough.

…ymore

paul-rogers

Thanks for doing the grunt work to make this happen. This should be a nice improvement to the development experience. LGTM.

merge druid-core, extendedset, and druid-hll into druid-processing to…

6dc70af

… simplify everything

clintropolis added Release Notes Design Review Area - Dev For items related to the project itself, like dev docs and checklists, but not CI labels Jan 20, 2023

clintropolis added 8 commits January 21, 2023 14:19

fix poms and license stuff

a8ec2e0

everything passes locally, why not on ci? :(

98bdd19

more heap?

23571f2

Merge remote-tracking branch 'upstream/master' into i-cant-take-it-an…

b2b22fa

…ymore

fix merge

bfeecf4

lighter buffers

0cac846

Revert "lighter buffers"

768e8ce

This reverts commit 0cac846.

mockito is evil

74242c1

paul-rogers approved these changes Jan 26, 2023

View reviewed changes

cheddar approved these changes Feb 9, 2023

View reviewed changes

clintropolis added 2 commits February 10, 2023 02:05

Merge remote-tracking branch 'upstream/master' into i-cant-take-it-an…

710afa0

…ymore

allow reset of JvmUtils RuntimeInfo if tests used static injection to…

f7580d1

… override

clintropolis added 2 commits February 17, 2023 00:53

Merge remote-tracking branch 'upstream/master' into i-cant-take-it-an…

4b0e1cb

…ymore

more juice

2b23de7

paul-rogers approved these changes Feb 17, 2023

View reviewed changes

clintropolis merged commit 08b5951 into apache:master Feb 17, 2023

clintropolis deleted the i-cant-take-it-anymore branch February 17, 2023 22:27

clintropolis mentioned this pull request Feb 20, 2023

fix some intellij inspections in druid-processing #13823

Merged

10 tasks

clintropolis added this to the 26.0 milestone Apr 10, 2023

techdocsmith mentioned this pull request Apr 17, 2023

[DRAFT] 26.0.0 release notes #14064

Closed

abhishekagarwal87 mentioned this pull request Jun 13, 2023

There is no Druid-Core 26.0.0 on Maven Central #14415

Closed

clintropolis mentioned this pull request Aug 31, 2023

refactor CompactionState since core and processing have been merged #14932

Closed

pagrawal10 added a commit to confluentinc/druid that referenced this pull request Nov 29, 2023

apache#13698 extended to confluent patch extensions

0a6f72d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything#13698

merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything#13698
clintropolis merged 13 commits intoapache:masterfrom
clintropolis:i-cant-take-it-anymore

clintropolis commented Jan 20, 2023 •

edited

Loading

Uh oh!

clintropolis commented Jan 25, 2023 •

edited

Loading

Uh oh!

clintropolis commented Jan 25, 2023 •

edited

Loading

Uh oh!

paul-rogers left a comment

Uh oh!

clintropolis commented Jan 27, 2023

Uh oh!

cheddar commented Feb 9, 2023

Uh oh!

clintropolis commented Feb 10, 2023

Uh oh!

clintropolis commented Feb 10, 2023

Uh oh!

paul-rogers left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

clintropolis commented Jan 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Release note

Uh oh!

clintropolis commented Jan 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clintropolis commented Jan 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paul-rogers left a comment

Choose a reason for hiding this comment

Uh oh!

clintropolis commented Jan 27, 2023

Uh oh!

cheddar commented Feb 9, 2023

Uh oh!

clintropolis commented Feb 10, 2023

Uh oh!

clintropolis commented Feb 10, 2023

Uh oh!

paul-rogers left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clintropolis commented Jan 20, 2023 •

edited

Loading

clintropolis commented Jan 25, 2023 •

edited

Loading

clintropolis commented Jan 25, 2023 •

edited

Loading