Skip to content

fix endianness related issue with complex metric compression#17391

Closed
clintropolis wants to merge 1 commit intoapache:masterfrom
clintropolis:compressed-complex-metrics-endians
Closed

fix endianness related issue with complex metric compression#17391
clintropolis wants to merge 1 commit intoapache:masterfrom
clintropolis:compressed-complex-metrics-endians

Conversation

@clintropolis
Copy link
Copy Markdown
Member

This PR is a follow-up to #16863 to fix a problem for any complex metrics which do not specify the byte order of the underlying buffer and were counting on the java big endian default (such as approxHistogram).

The underlying CompressedBlockReader already had the ability to specify an order on the buffer for values it reads, but it was just specifying the same order that the compression was using, which is native order by default for performance reasons.

To push this into the block reader, I have added ObjectStrategy.getByteOrder() with a default implementation returning big endian, though it probably would have been safe to hard-code to use big endian, since the things that need little endian were already explicitly ordering stuff on value read. However, thinking forward it seems like it would be nice to not have to re-order the values again in the object strategy if we can know ahead of time.

I have added implementations for ObjectStrategy which i knew were little endian, even though they are already explicitly ordering values on read due to the behavior of the uncompressed complex columns.

}

/**
* Preferred byte order to read values from a buffer with {@link #fromByteBuffer(ByteBuffer, int)} and similar methods
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also update fromByteBuffer contract to say that the caller must pass in a buffer whose order matches getByteOrder(). Please double-check all callers of fromByteBuffer to make sure they do that properly.

@@ -34,17 +34,29 @@ public class CompressedVariableSizedBlobColumnSupplier implements Supplier<Compr
public static CompressedVariableSizedBlobColumnSupplier fromByteBuffer(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems like a bug waiting to happen, where someone should provide both compressionOrder and valueOrder but only provides compressionOrder because this method exists. Remove it?

@clintropolis
Copy link
Copy Markdown
Member Author

closing in favor of #17422, though i will likely resurrect some of the ideas in this PR later

@clintropolis clintropolis deleted the compressed-complex-metrics-endians branch October 30, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants