fix bugs with auto encoded long vector deserializers by clintropolis · Pull Request #14186 · apache/druid

clintropolis · 2023-04-29T03:07:27Z

Description

This PR fixes an issue when using 'auto' encoded LONG typed columns with the 'vectorized' query engine. These columns use a delta-based bit-packing mechanism, and errors in the vectorized reader caused it to read column values incorrectly for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so it impacts Druid versions 0.22.0+.
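For readers unfamiliar with the encoding, here is a minimal sketch of delta-based bit-packing, with a scalar reader and a bulk ("vectorized"-style) reader that must agree with it. The class name, method names, and bit layout are hypothetical and chosen for illustration; this is not the layout or the code in VSizeLongSerde.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: a toy delta + bit-packing codec, not Druid's VSizeLongSerde.
class DeltaBitPackingSketch {
    // Store (value - base) for each value using exactly numBits bits, most significant bit first.
    static ByteBuffer pack(long[] values, long base, int numBits) {
        int totalBits = values.length * numBits;
        byte[] out = new byte[(totalBits + 7) / 8];
        int bitPos = 0;
        for (long v : values) {
            long delta = v - base;
            for (int b = numBits - 1; b >= 0; b--) {
                if (((delta >>> b) & 1L) != 0) {
                    out[bitPos >>> 3] |= 0x80 >>> (bitPos & 7);
                }
                bitPos++;
            }
        }
        return ByteBuffer.wrap(out);
    }

    // Scalar read: reassemble the delta at the given row index, then add the base back.
    static long get(ByteBuffer buf, long base, int numBits, int index) {
        long bitPos = (long) index * numBits;
        long delta = 0;
        for (int b = 0; b < numBits; b++, bitPos++) {
            int bit = (buf.get((int) (bitPos >>> 3)) >>> (7 - (int) (bitPos & 7))) & 1;
            delta = (delta << 1) | bit;
        }
        return base + delta;
    }

    // Bulk read of a contiguous run of rows; it must return exactly what get() returns.
    static void getBulk(ByteBuffer buf, long base, int numBits, long[] out, int start, int length) {
        for (int i = 0; i < length; i++) {
            out[i] = get(buf, base, numBits, start + i);
        }
    }

    public static void main(String[] args) {
        long[] values = {1000L, 1003L, 1001L, 1007L}; // deltas 0, 3, 1, 7 fit in 3 bits
        ByteBuffer buf = pack(values, 1000L, 3);
        long[] out = new long[values.length];
        getBulk(buf, 1000L, 3, out, 0, values.length);
        if (!Arrays.equals(values, out)) {
            throw new AssertionError("bulk read disagrees with the written values");
        }
    }
}
```

An optimized bulk reader would unpack several values per memory access instead of delegating to the scalar path, which is where a mistake for a particular bit size can creep in. Comparing bulk output against scalar output for every supported bit size is one way to catch that class of bug.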

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.

Release note

'auto' encoded LONG typed columns can read values incorrectly when using the vectorized query engine. The data itself in the segment is fine; it is just translated incorrectly for the vectorized query engine. If you use 'auto' encoded LONG columns in versions 0.22.0 through 25.0.0, we recommend disabling vectorization until you upgrade to Druid 26.0.0.
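As a sketch of the workaround, vectorization can be turned off per query through the `vectorize` query context key. The helper below is hypothetical (the class and method names are not part of Druid); it only shows the context entry a client might add before submitting a query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; only the "vectorize" context key comes from Druid.
class DisableVectorizationSketch {
    // Return a copy of the given query context with vectorization disabled.
    static Map<String, Object> withVectorizationDisabled(Map<String, Object> context) {
        Map<String, Object> copy = new LinkedHashMap<>(context);
        copy.put("vectorize", "false");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("timeout", 30000);
        // The resulting map would be sent as the "context" field of the query.
        System.out.println(withVectorizationDisabled(context));
    }
}
```

Copying the map rather than mutating it keeps the caller's original context intact, so the override can be dropped once the cluster is upgraded.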

Key changed/added classes in this PR

VSizeLongSerde

gianm

Nice find.

fix bugs with auto encoded long vector deserializers

7bad218

clintropolis added Bug Release Notes labels Apr 29, 2023

gianm approved these changes Apr 29, 2023

clintropolis added 3 commits April 28, 2023 21:02

fix equals test

736b7f5

fix tests

cb6f21c

style

22812af

clintropolis added this to the 26.0 milestone Apr 29, 2023

clintropolis added 5 commits April 28, 2023 23:44

more tests

48d15d3

better

2365a7f

oops

e5bd134

style

89b1a3f

test

41066bf

clintropolis added the Area - Querying label May 1, 2023

abhishekagarwal87 merged commit 90ea192 into apache:master May 1, 2023

clintropolis deleted the fix-vector-auto-long-readers branch May 1, 2023 06:25

clintropolis mentioned this pull request May 1, 2023

[Backport] fix bugs with auto encoded long vector deserializers #14190

Merged

abhishekagarwal87 merged 9 commits into apache:master from clintropolis:fix-vector-auto-long-readers
