remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations by clintropolis · Pull Request #14500 · apache/druid

clintropolis · 2023-06-29T01:29:52Z

Description

Follow up to #14142 to clean up some additional stuff.

changes:

generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
remove the CachingIndexed implementation, which I think is largely no longer needed now that many things use ByteBuffer directly and avoid the cost of creating Strings
remove ColumnConfig.columnCacheSizeBytes() since CachingIndexed was the only user
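
To illustrate why a cache of decoded values matters less once lookups work on bytes, here is a minimal sketch of a dictionary lookup that compares UTF-8 bytes directly and never creates a String during the search. The class `Utf8DictionarySketch` and its methods are hypothetical and are not Druid's implementation. The sketch assumes the input list is already sorted in UTF-8 byte order (which holds for the ASCII example values).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8DictionarySketch
{
  // Hypothetical sorted dictionary of UTF-8 encoded values (not a Druid class).
  private final List<ByteBuffer> sortedValues = new ArrayList<>();

  // Assumption: sortedStrings is already sorted in UTF-8 byte order.
  public Utf8DictionarySketch(List<String> sortedStrings)
  {
    for (String s : sortedStrings) {
      sortedValues.add(ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
    }
  }

  // Compares two buffers as unsigned bytes without decoding them.
  public static int compareUtf8(ByteBuffer a, ByteBuffer b)
  {
    final int n = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < n; i++) {
      final int cmp = Integer.compare(
          a.get(a.position() + i) & 0xFF,
          b.get(b.position() + i) & 0xFF
      );
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.remaining(), b.remaining());
  }

  // Binary search; returns the index, or -(insertionPoint + 1) if the key is absent.
  public int indexOf(ByteBuffer key)
  {
    int lo = 0;
    int hi = sortedValues.size() - 1;
    while (lo <= hi) {
      final int mid = (lo + hi) >>> 1;
      final int cmp = compareUtf8(sortedValues.get(mid), key);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

Because the search touches only bytes, there is no per-lookup String allocation for a cache to save, which is the intuition behind dropping the cache.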

Release note

druid.processing.columnCache.sizeBytes has been removed since it provided limited utility after a number of internal changes. Leaving this config in place is harmless, but it no longer has any effect.

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
a release note entry in the PR description.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

clintropolis · 2023-06-29T11:55:17Z

intellij inspection failure:

Error:  processing/src/main/java/org/apache/druid/segment/serde/ColumnPartSerde.java:57 -- Parameter <code>columnConfig</code> is not used in either this method or any of its derived methods

This is incorrect, but maybe the inspection doesn't recognize the lambdas in NestedCommonFormatColumnPartSerde. I will try making private classes for the Deserializer implementations instead to see if it recognizes those...
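
For readers unfamiliar with the workaround being tried, here is a rough sketch of the two forms. All names (`DeserializerSketch`, `Config`, `maxBytes`) are hypothetical stand-ins, and the interface is simplified; this is not the actual Druid code. Both forms use the `columnConfig` parameter, and the only difference is whether the implementation is a lambda or a named private class.

```java
import java.nio.ByteBuffer;

public class DeserializerSketch
{
  // Hypothetical stand-in for a config object passed to deserializers.
  public static class Config
  {
    private final int maxBytes;

    public Config(int maxBytes)
    {
      this.maxBytes = maxBytes;
    }

    public int maxBytes()
    {
      return maxBytes;
    }
  }

  // Simplified, hypothetical deserializer interface.
  public interface Deserializer
  {
    int read(ByteBuffer buffer, Config columnConfig);
  }

  // Lambda form: columnConfig is used, but only inside the lambda body.
  public static final Deserializer AS_LAMBDA =
      (buffer, columnConfig) -> Math.min(buffer.remaining(), columnConfig.maxBytes());

  // Named private class form of the same logic.
  private static class NamedDeserializer implements Deserializer
  {
    @Override
    public int read(ByteBuffer buffer, Config columnConfig)
    {
      return Math.min(buffer.remaining(), columnConfig.maxBytes());
    }
  }

  public static final Deserializer AS_CLASS = new NamedDeserializer();
}
```

The two forms behave identically at runtime; the named class only changes what a static analysis tool sees as an overriding method.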

gianm

LGTM.

I am ok with removing druid.processing.columnCache.sizeBytes. I searched a few places for that property, both public and private, and did not see evidence that it is widely used. I also agree that with the various efforts to do more ops directly on UTF-8, it isn't as useful as it used to be.

Btw, this reminded me of #11201, a PR of mine that is a couple years old and is still open. I just merged master into it, and would appreciate a review of that PR. It's some work towards HLL sketches working directly on UTF-8.

gianm · 2023-06-29T21:38:06Z

-    this.columnConfig = columnConfig;
-    this.numRows = numRows;
+    if (frontCodedStringDictionarySupplier != null) {
+      this.stringIndexSupplier = new StringUtf8ColumnIndexSupplier<>(


Might be clearer if the conditional is only about declaring the dictionary.

cleaned up both this and the deserializer in DictionaryEncodedColumnPartSerde a bit more.
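
The review suggestion above can be sketched roughly as follows: the conditional only picks the dictionary, and the index supplier is constructed once afterwards. The types here (`DictionaryChoiceSketch`, `IndexSupplier`, and a `Supplier<List<String>>` standing in for a dictionary supplier) are hypothetical and much simpler than the real classes.

```java
import java.util.List;
import java.util.function.Supplier;

public class DictionaryChoiceSketch
{
  // Hypothetical stand-in for a shared index supplier built from one dictionary.
  public static class IndexSupplier
  {
    public final Supplier<List<String>> dictionary;

    public IndexSupplier(Supplier<List<String>> dictionary)
    {
      this.dictionary = dictionary;
    }
  }

  public static IndexSupplier build(
      Supplier<List<String>> frontCodedDictionary, // may be null
      Supplier<List<String>> genericDictionary
  )
  {
    // The conditional is only about which dictionary to use...
    final Supplier<List<String>> dictionary;
    if (frontCodedDictionary != null) {
      dictionary = frontCodedDictionary;
    } else {
      dictionary = genericDictionary;
    }
    // ...and the index supplier is constructed in exactly one place.
    return new IndexSupplier(dictionary);
  }
}
```

Keeping construction outside the branch avoids duplicating the constructor call and its argument list in each arm.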

clintropolis · 2023-06-30T03:00:40Z

This PR has bad luck with static checks being completely wrong, here is another one:

Error:  /home/runner/work/druid/druid/processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedColumnPartSerde.java:368: 'method call rparen' has incorrect indentation level 15, expected level should be 10. [Indentation]

which is saying that

        builder.setHasMultipleValues(hasMultipleValues)
               .setHasNulls(hasNulls)
               .setDictionaryEncodedColumnSupplier(
                   new StringUtf8DictionaryEncodedColumnSupplier<>(
                       dictionarySupplier,
                       rSingleValuedColumn,
                       rMultiValuedColumn
                   )
               );

should instead be:

        builder.setHasMultipleValues(hasMultipleValues)
               .setHasNulls(hasNulls)
               .setDictionaryEncodedColumnSupplier(
                   new StringUtf8DictionaryEncodedColumnSupplier<>(
                       dictionarySupplier,
                       rSingleValuedColumn,
                       rMultiValuedColumn
                   )
          );

which .. no.

clintropolis · 2023-07-03T02:36:38Z

failing ci check

[standard-its / (Compile=openjdk8, Run=openjdk8, Cluster Build On K8s) ITNestedQueryPushDownTest integration test](https://github.com/apache/druid/actions/runs/5418854126/jobs/9852080327?pr=14500#logs)

seems to be failing on a few (maybe all?) recent PRs and is unrelated to the changes here

clintropolis added the Area - Querying, Design Review, and Area - Segment Format and Ser/De labels Jun 29, 2023

github-actions Bot added the Area - Documentation label Jun 29, 2023

clintropolis added 5 commits June 28, 2023 19:06

fix style

a33dab6

remove unused

d1b0185

Merge remote-tracking branch 'upstream/master' into there-can-be-only-one

c37da40

more unused

9e85ba0

maybe this will pass inspections

0a3d7c3

clintropolis added 2 commits June 29, 2023 04:55

adjust

8716027

revert

2759bd4

gianm approved these changes Jun 30, 2023

clintropolis added 2 commits June 29, 2023 18:36

additional cleanup

126e77c

fix style

f01622f

why do static checks hate me

779f6c1

clintropolis changed the title ~~combine string column implementations~~ remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations Jun 30, 2023

abhishekagarwal87 approved these changes Jul 1, 2023

clintropolis merged commit 277aaa5 into apache:master Jul 3, 2023

clintropolis deleted the there-can-be-only-one branch July 3, 2023 02:37

abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023

AmatyaAvadhanula mentioned this pull request Aug 6, 2023

[DRAFT] 27.0.0 release notes #14761

Closed

clintropolis merged 11 commits into apache:master from clintropolis:there-can-be-only-one

3 participants
