Skip to content

remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations#14500

Merged
clintropolis merged 11 commits intoapache:masterfrom
clintropolis:there-can-be-only-one
Jul 3, 2023
Merged

remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations#14500
clintropolis merged 11 commits intoapache:masterfrom
clintropolis:there-can-be-only-one

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Jun 29, 2023

Description

Follow up to #14142 to clean up some additional stuff.

changes:

  • generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
  • remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating String
  • remove ColumnConfig.columnCacheSizeBytes() since CachingIndexed was the only user

Release note

druid.processing.columnCache.sizeBytes has been removed since it provided limited utility after a number of internal changes. Leaving this config is harmless, but it does nothing.


This PR has:
  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
@clintropolis
Copy link
Copy Markdown
Member Author

intellij inspection failure:

Error:  processing/src/main/java/org/apache/druid/segment/serde/ColumnPartSerde.java:57 -- Parameter <code>columnConfig</code> is not used in either this method or any of its derived methods

is incorrect, but maybe it doesn't recognize the anonymous lambda classes of NestedCommonFormatColumnPartSerde. Will try to make private classes for Deserializer instead to see if it recognizes that...

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I am ok with removing druid.processing.columnCache.sizeBytes. I searched a few places for that property, both public and private, and did not see evidence that it is widely used. I also agree that with the various efforts to do more ops directly on UTF-8, it isn't as useful as it used to be.

Btw, this reminded me of #11201, a PR of mine that is a couple years old and is still open. I just merged master into it, and would appreciate a review of that PR. It's some work towards HLL sketches working directly on UTF-8.

this.columnConfig = columnConfig;
this.numRows = numRows;
if (frontCodedStringDictionarySupplier != null) {
this.stringIndexSupplier = new StringUtf8ColumnIndexSupplier<>(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be clearer if the conditional is only about declaring the dictionary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up both this and the deserializer in DictionaryEncodedColumnPartSerde a bit more.

@clintropolis
Copy link
Copy Markdown
Member Author

clintropolis commented Jun 30, 2023

This PR has bad luck with static checks being completely wrong, here is another one:

Error:  /home/runner/work/druid/druid/processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedColumnPartSerde.java:368: 'method call rparen' has incorrect indentation level 15, expected level should be 10. [Indentation]

which is saying that

        builder.setHasMultipleValues(hasMultipleValues)
               .setHasNulls(hasNulls)
               .setDictionaryEncodedColumnSupplier(dictionaryEncodedColumnSupplier);
               .setDictionaryEncodedColumnSupplier(
                   new StringUtf8DictionaryEncodedColumnSupplier<>(
                       dictionarySupplier,
                       rSingleValuedColumn,
                       rMultiValuedColumn
                   )
               );

should instead be:

        builder.setHasMultipleValues(hasMultipleValues)
               .setHasNulls(hasNulls)
               .setDictionaryEncodedColumnSupplier(dictionaryEncodedColumnSupplier);
               .setDictionaryEncodedColumnSupplier(
                   new StringUtf8DictionaryEncodedColumnSupplier<>(
                       dictionarySupplier,
                       rSingleValuedColumn,
                       rMultiValuedColumn
                   )
          );

which .. no.

@clintropolis clintropolis changed the title combine string column implementations remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations Jun 30, 2023
@clintropolis
Copy link
Copy Markdown
Member Author

clintropolis commented Jul 3, 2023

failing ci check

[standard-its / (Compile=openjdk8, Run=openjdk8, Cluster Build On K8s) ITNestedQueryPushDownTest integration test](https://github.com/apache/druid/actions/runs/5418854126/jobs/9852080327?pr=14500#logs)

seems to be failing on a few (maybe all?) recent PRs and is unrelated to the changes here

@clintropolis clintropolis merged commit 277aaa5 into apache:master Jul 3, 2023
@clintropolis clintropolis deleted the there-can-be-only-one branch July 3, 2023 02:37
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
…bine string column implementations (apache#14500)

* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants