reset keySerde when closing groupers to clear out heap dictionaries #16114
LakshSingla merged 1 commit into apache:master
Conversation
Nice find @clintropolis. Wouldn't the JVM reclaim those weak references more aggressively under memory pressure?
So I think it often does get reclaimed, or else this would probably be a much bigger problem, but as I understand it, reclamation can still take multiple GC cycles, so I suspect that in really tight cases the OOM can still occur. It seems... complicated: https://stackoverflow.com/questions/17104452/threadlocal-garbage-collection
Can something like this be done:
In this case, I suspect the
That's part of the problem: it isn't the qtp thread, it's the processing pool thread that has the
Thanks for the PR and the explanation!
Description
`ConcurrentGrouper` kind of misuses `ThreadLocal` to hold a `SpillingGrouper`, and never calls `remove()` on it, which can result in large amounts of heap being retained as weak references even after grouping is finished.

It's kind of difficult to rework this such that we use `ThreadLocal` and actually call `remove()`, since the `ConcurrentGrouper` is created on the 'qtp' threads, but the `ThreadLocal` is set from the processing threads, so removing them from the `ThreadLocalMap` of the processing threads is a bit tricksy.

Poking around in the debugger sort of shows the problem, with several 'closed' `SpillingGrouper` present in the `ThreadLocalMap` of processing threads.

The `ThreadLocalMap` stores its entries as `WeakReference`s, so they should eventually be reclaimed, but it might take several GC cycles, so OOM can still occur.

Rather than fixing usage of `ThreadLocal` to call `remove()`, a much lower budget way to make this less painful is for `Grouper.close()` to more aggressively free stuff up, since `ConcurrentGrouper` does call `close()` on all of the associated `SpillingGrouper`, including those stored in the `ThreadLocalMap`. The main thing using heap as far as I can tell is the dictionaries built by the `KeySerde`, so calling `keySerde.reset()` in all of the `Grouper.close()` implementations which have a `KeySerde` should free up a bunch of space that is no longer needed.

The `RowBasedKeySerde` implementation of `reset` was missing clearing out the array dictionaries, so I also added that.

I couldn't think of an easy way to write tests for this because it's kind of stuffed down pretty deep, but if anyone has any ideas that aren't a ton of work I'm happy to add them.
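To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the Druid code: `SketchKeySerde`, `SketchGrouper`, and `ThreadLocalRetentionSketch` are hypothetical stand-ins, and a single-thread executor plays the role of the processing pool. It only illustrates that a grouper set in a `ThreadLocal` from a pool thread stays reachable from that thread after being closed elsewhere, and that clearing the dictionaries in `close()` lets the retained object shed most of its heap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a key serde that builds on-heap dictionaries.
class SketchKeySerde {
  private final Map<String, Integer> dictionary = new HashMap<>();
  private final List<String> reverseDictionary = new ArrayList<>();

  int lookupOrAdd(String value) {
    Integer id = dictionary.get(value);
    if (id == null) {
      id = reverseDictionary.size();
      dictionary.put(value, id);
      reverseDictionary.add(value);
    }
    return id;
  }

  int size() {
    return reverseDictionary.size();
  }

  // Drop every dictionary so the entries become garbage immediately.
  void reset() {
    dictionary.clear();
    reverseDictionary.clear();
  }
}

// Hypothetical stand-in for a grouper that owns a key serde.
class SketchGrouper implements AutoCloseable {
  private final SketchKeySerde keySerde = new SketchKeySerde();
  private boolean closed = false;

  void aggregate(String key) {
    keySerde.lookupOrAdd(key);
  }

  int dictionarySize() {
    return keySerde.size();
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
    keySerde.reset(); // the point of the sketch: free dictionaries on close
  }
}

class ThreadLocalRetentionSketch {
  // Returns {dictionary size before close, size seen by the pool thread after close}.
  static int[] run() {
    ThreadLocal<SketchGrouper> threadLocalGrouper = new ThreadLocal<>();
    List<SketchGrouper> groupers = new ArrayList<>();
    ExecutorService processingPool = Executors.newSingleThreadExecutor();
    try {
      // The pool thread creates its grouper and stores it in the ThreadLocal.
      processingPool.submit(() -> {
        SketchGrouper grouper = new SketchGrouper();
        threadLocalGrouper.set(grouper);
        groupers.add(grouper);
        for (int i = 0; i < 1000; i++) {
          grouper.aggregate("key-" + i);
        }
      }).get();
      int before = groupers.get(0).dictionarySize();

      // The calling thread closes the groupers; it cannot call remove() on
      // behalf of the pool thread, so the ThreadLocal entry stays in place.
      for (SketchGrouper grouper : groupers) {
        grouper.close();
      }

      // The pool thread still reaches the closed grouper through its ThreadLocal.
      int after = processingPool.submit(() -> {
        SketchGrouper stale = threadLocalGrouper.get();
        return stale != null && stale.isClosed() ? stale.dictionarySize() : -1;
      }).get();
      return new int[]{before, after};
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      processingPool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] sizes = run();
    System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
  }
}
```

In this sketch the closed grouper is still visible from the pool thread, but its dictionaries are empty, so whatever lingers until the thread-local entry is eventually cleaned up is small.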
This PR has: