Add mechanism for 'safe' memory reads for complex types #13361
clintropolis merged 5 commits into apache:master
Thanks for the changes @clintropolis! 👍 I had a doubt regarding the usage of
Regarding compatibility, since the method has not been released yet (is it released in 24.0.1?), I think the default implementation could also throw an UnsupportedOperationException (UOE), but that would mean all complex types would have to implement it. Further, the function could be put behind a cluster-level feature flag (not ideal either) so that users and admins are aware of the scope. I was also trying to see whether the strong encapsulation of JDK internals in newer Java versions could solve this problem in the future, but I don't have any concrete update or thoughts on it yet.
This isn't really up to me, but we have offered to contribute if there is wider use beyond Druid's needs; relevant Slack thread: https://the-asf.slack.com/archives/CP0930GKG/p1666290716186309.
I guess I just used a
My personal feeling is that, as of this PR, it is at least safe to have a default implementation for all core and contrib extensions. Extension implementors have to go out of their way to use
I'd rather avoid such a flag if possible; the PR is intended to prevent having to have something like that. And, as mentioned, it isn't strictly a problem with
Thanks a lot for the link to the Slack thread; it was relevant to the assertion/exception doubt that I had. Regarding the safe implementation, what I meant was to have a function like
Regarding compatibility and safety, I agree with your points that the current modules are safe, and we could put an advisory in the docs asking extension implementors to consider implementing this method and to ensure safety when ingesting intermediate state objects. My reasoning for suggesting a flag was to avoid affecting users who don't use these functionalities but do have extensions, but I agree that a default throwing implementation would be a better option than a flag.
rohangarg left a comment
LGTM, since all the modules in the Apache distribution are now safe when deserializing complex objects.
For extensions, it was decided to add a release note with this change asking implementors to verify the safety behavior of their extensions.
* we can read where we want to we can leave your bounds behind 'cause if the memory is not there we really don't care and we'll crash this process of mine
Description
This PR fixes an accidental kill switch introduced by #13332: by exposing the ability to construct sketches via expressions, we also exposed the ability to crash any JVM process evaluating them by feeding them bad input.
For example, the query
SELECT COMPLEX_DECODE_BASE64('HLLSketch', 'AgEH')
where 'AgEH' is a truncated base64-encoded sketch object, will immediately crash the broker that evaluates it while planning the SQL query, with an error of the form:
After the changes in this PR, we just get a regular out-of-bounds exception:
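The bounds checking that a plain ByteBuffer provides can be observed with nothing but the JDK. The sketch below is illustrative only: the class and method names are hypothetical, and it does not use Druid's SafeWritableMemory or any DataSketches class. It decodes the same 'AgEH' input, which yields only three bytes, and then attempts an 8-byte read at offset 0. Because ByteBuffer checks every absolute read against its limit, the attempt fails with an IndexOutOfBoundsException instead of touching memory outside the array. An Unsafe-based read of the same offset would not be checked.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class TruncatedInputDemo {
  // Attempts an absolute 8-byte read; ByteBuffer checks the offset against its limit first.
  static String tryReadLong(ByteBuffer buf, int offset) {
    try {
      return "read " + buf.getLong(offset);
    } catch (IndexOutOfBoundsException e) {
      return "out of bounds";
    }
  }

  public static void main(String[] args) {
    // 'AgEH' is four base64 characters, which decode to three bytes: 0x02 0x01 0x07.
    byte[] bytes = Base64.getDecoder().decode("AgEH");
    ByteBuffer buf = ByteBuffer.wrap(bytes);

    System.out.println(bytes.length);        // 3
    System.out.println(buf.get(2));          // 7 (an in-bounds single-byte read)
    System.out.println(tryReadLong(buf, 0)); // out of bounds
  }
}
```

The failure mode is the same exception a caller would see for any other malformed input, so it can be caught and reported instead of taking the process down.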
In fairness, this problem existed prior to #13332, since the underlying problem affects anything that uses Memory on untrusted input bytes whenever those bytes determine where to read or write. The reason is that Memory, for performance reasons, uses Unsafe for all read and write operations, even when wrapping a ByteBuffer, so if the code tries to read from out-of-bounds locations it can crash the JVM process and do who knows what else.
I think this behavior is absolutely correct for Memory in most uses, but to solve our problem in a somewhat unconventional way, this PR introduces SafeWritableMemory and SafeWritableBuffer, which delegate all operations to an underlying plain Java ByteBuffer so that they get the bounds checking that comes with it. These can be used anywhere Memory is used to load data from an untrusted source. When reading from segments we still use the standard Memory calls.
Fortunately, due to a bug that wasn't fixed until #13332, I wasn't able to trigger this exact failure in older versions of Druid with the native complex_decode_base64 expression, because the complex type mapping wasn't correctly wired up to the function, but I believe it is still possible when ingesting pre-aggregated sketches that use Memory.
To wire this up to the complex_decode_base64 function, I've added a new method to ObjectStrategy,
which should be implemented by anything that uses Unsafe or other potentially dangerous operations. I debated having a default implementation but ultimately decided to add one, since this interface is marked as an @ExtensionPoint. This errs on the side of danger in favor of compatibility, so I welcome discussion here.
Besides the added tests, I also ran an experiment replacing all calls to Memory.wrap with the "safe" version, and all of the tests passed, so I think the "safe" implementation of Memory should be correct.
Release note
A new method has been added to the ObjectStrategy extension point
to allow extension writers who implement complex types that typically use "unsafe" memory operations (such as Java Unsafe or DataSketches Memory) to optionally provide a "safe" read method. This method is used when processing untrusted inputs, such as those decoded from base64 string inputs at ingest time, so that bad inputs do not crash the process. A default implementation is provided for compatibility, but if your extension fits this description and uses unsafe memory operations, you should implement this method. For users of DataSketches Memory, Druid provides a SafeWritableMemory.wrap method, which can create a 'safe' Memory by wrapping a ByteBuffer.
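For extension writers weighing whether to override the default, here is a minimal sketch of the general pattern: an extension-point interface with a compatibility-focused default "safe" method that simply delegates to the regular read path, and an implementation that overrides it. All names (ComplexReader, read, readUntrusted, LongArrayReader) are hypothetical. They are not Druid's ObjectStrategy method, whose signature is not reproduced here. The override also uses a swapped-in technique, explicit validation of an untrusted length header, rather than the ByteBuffer-backed SafeWritableMemory approach this PR takes.

```java
import java.nio.ByteBuffer;

public class SafeReadSketch {
  /** Hypothetical extension-point interface (not Druid's ObjectStrategy). */
  interface ComplexReader<T> {
    /** Fast path for trusted bytes, e.g. data the process wrote itself. */
    T read(ByteBuffer buffer, int numBytes);

    /**
     * Path for untrusted bytes (e.g. decoded from base64 user input). The default only
     * delegates to read(): existing implementations keep compiling but gain no protection.
     */
    default T readUntrusted(ByteBuffer buffer, int numBytes) {
      return read(buffer, numBytes);
    }
  }

  /** Toy complex type: a 4-byte element count followed by that many 8-byte longs. */
  static final class LongArrayReader implements ComplexReader<long[]> {
    @Override
    public long[] read(ByteBuffer buffer, int numBytes) {
      // Trusts the embedded count. This sketch still reads through ByteBuffer, but an
      // Unsafe-backed reader doing the same would have no bounds check at all.
      int start = buffer.position();
      int count = buffer.getInt(start);
      long[] out = new long[count];
      for (int i = 0; i < count; i++) {
        out[i] = buffer.getLong(start + Integer.BYTES + i * Long.BYTES);
      }
      return out;
    }

    @Override
    public long[] readUntrusted(ByteBuffer buffer, int numBytes) {
      // Validate the header against the declared size before trusting it.
      if (numBytes < Integer.BYTES || buffer.remaining() < numBytes) {
        throw new IllegalArgumentException("input shorter than header");
      }
      int count = buffer.getInt(buffer.position());
      long needed = Integer.BYTES + (long) count * Long.BYTES;
      if (count < 0 || needed > numBytes) {
        throw new IllegalArgumentException("element count " + count + " exceeds " + numBytes + " bytes");
      }
      return read(buffer, numBytes);
    }
  }

  public static void main(String[] args) {
    ComplexReader<long[]> reader = new LongArrayReader();

    // Well-formed input: count = 2, followed by two longs.
    ByteBuffer good = ByteBuffer.allocate(Integer.BYTES + 2 * Long.BYTES);
    good.putInt(0, 2).putLong(4, 10L).putLong(12, 20L);
    System.out.println(reader.readUntrusted(good, good.capacity()).length); // 2

    // Truncated input: the header claims 1,000,000 elements but only 4 bytes exist.
    ByteBuffer bad = ByteBuffer.allocate(Integer.BYTES);
    bad.putInt(0, 1_000_000);
    try {
      reader.readUntrusted(bad, bad.capacity());
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A default that throws UnsupportedOperationException instead, as suggested in the review thread, would force every implementation to opt in explicitly, at the cost of breaking extensions that do not override it.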