CURATOR-487 Make GzipCompressionProvider recycle Deflaters and Inflaters in pools #282
Conversation
|
Thanks for the PR @alexbrasetvik, I will merge this shortly.
|
I came to this issue late. Are we certain merging this was the right thing to do? Is there any documentation referencing other projects doing something similar? I'm concerned about replacing a JDK library method. If the JDK authors had a better implementation, surely they would have updated the JDK, no?
|
The JDK authors did the right thing... in OpenJDK 12 (see JDK-8212129). When Curator's minimum requirement is bumped to at least JDK 12 (realistically JDK 17, the next LTS version), this specialized code could be removed.
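For context, JDK-8212129 replaced `finalize()` in `Deflater`/`Inflater` with the `java.lang.ref.Cleaner` mechanism. A minimal sketch of that pattern (the `CleanedDeflater` name is invented for illustration; this is not the actual OpenJDK code):

```java
import java.lang.ref.Cleaner;
import java.util.zip.Deflater;

// Illustrative sketch of the Cleaner pattern adopted in JDK 12:
// native resources are released via a phantom-reference-backed
// cleanup action instead of a finalize() method.
public class CleanedDeflater implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    final Deflater deflater = new Deflater();
    private final Cleaner.Cleanable cleanable;

    public CleanedDeflater() {
        Deflater d = this.deflater;
        // The cleanup action must not capture 'this', otherwise the
        // object could never become phantom-reachable.
        this.cleanable = CLEANER.register(this, d::end);
    }

    @Override
    public void close() {
        // Eager release on the happy path; the Cleaner is only a
        // safety net for objects that are never closed explicitly.
        cleanable.clean();
    }
}
```

Phantom references are cheaper than finalization, but as noted below they are still not free for the garbage collector.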
|
However, the specialized code will still be more efficient:

```java
// Even when Curator's minimum supported Java version becomes
// no less than Java 12, where the finalize() methods are removed
// from the Deflater and Inflater classes and those objects are instead
// phantom-referenced via Cleaner, it still makes sense to avoid
// GZIPInputStream and GZIPOutputStream, because phantom references are
// also not entirely free for GC algorithms, and also to allocate less
// garbage and make fewer unnecessary data copies.
```
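To make the "avoid GZIPOutputStream" idea concrete, here is a hedged sketch (not Curator's actual code; method names are illustrative) of producing GZIP-format output with a raw `Deflater` (`nowrap = true`) plus a hand-written header and trailer, which skips the extra stream object and internal buffer that `GZIPOutputStream` allocates per use:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class GzipSketch {
    // Compress 'input' into the GZIP container format (RFC 1952)
    // using a raw deflate stream plus a manual header/trailer.
    public static byte[] gzipCompress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length / 2 + 32);
        // Fixed 10-byte GZIP header: magic bytes, deflate method,
        // no flags, zero mtime, no extra flags, "unknown" OS.
        out.write(0x1f); out.write(0x8b); out.write(8); out.write(0);
        out.write(0); out.write(0); out.write(0); out.write(0);
        out.write(0); out.write(0xff);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end(); // a pooling version would return it to a pool instead
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        writeIntLE(out, (int) crc.getValue()); // trailer: CRC32 of uncompressed data
        writeIntLE(out, input.length);         // trailer: uncompressed size mod 2^32
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }
}
```

The output is readable by a standard `GZIPInputStream`, while the compressing side allocates only the output buffer.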
I don't see how it makes sense to avoid JDK library code. If what you say is true, why wouldn't they update the JDK? |
|
Because
|
Can you point to other libraries that have taken the approach of rewriting these APIs? I see you opened https://issues.apache.org/jira/browse/COMPRESS-473. Are they taking this change as well?
|
There is no peer evidence here, because we are on the optimization forefront. See apache/druid#6677 (comment) and https://lists.apache.org/thread.html/1aff123193cec5c385821b2d745a4e846a8a5786146c047acbdf8ea3@%3Cdev.druid.apache.org%3E. I've seen a Druid heap with more than 10k finalizable Deflater objects, about 8k of which were already dead, waiting in the finalization queue. Historically, Druid has used ZooKeeper somewhat incorrectly (not for what ZooKeeper was designed): it announces data segment placement via ZooKeeper, which leads to the creation of many new ZooKeeper nodes every second. This means that, by accident, Druid is a good stress test for ZooKeeper (and consequently for Curator), and we run probably the largest Druid cluster.
|
OK - interesting. It might make sense to develop a general-purpose project just for this. Larger projects like Curator could pull this new lib in via shading to avoid the dependency.
This PR addresses https://issues.apache.org/jira/browse/CURATOR-487 by recycling Deflaters and Inflaters in static concurrent pools. Since Deflaters and Inflaters are acquired from and returned to the pools in try-finally blocks that are themselves free of blocking calls, the number of objects in the pools is not expected to much exceed the number of hardware threads on the machine. It is therefore acceptable to use simple pools of strongly referenced objects.
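The pooling scheme described above can be sketched roughly like this (a minimal illustration under the stated assumptions; `DeflaterPool` and its method names are invented here, not Curator's actual internals):

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.zip.Deflater;

// Sketch of a simple, unbounded pool of strongly referenced Deflaters.
// Because acquire/release happen in a try-finally with no blocking calls
// in between, the pool size naturally tracks the number of threads
// compressing concurrently, i.e. at most roughly the hardware thread count.
final class DeflaterPool {
    private static final ConcurrentLinkedQueue<Deflater> POOL = new ConcurrentLinkedQueue<>();

    static Deflater acquire() {
        Deflater d = POOL.poll();
        return d != null ? d : new Deflater();
    }

    static void release(Deflater d) {
        d.reset(); // clear per-use state before recycling
        POOL.add(d);
    }

    static byte[] compress(byte[] input) {
        Deflater d = acquire();
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            release(d); // never call d.end(): the native resource is reused
        }
    }
}
```

Recycling avoids both the native allocation in the `Deflater` constructor and the finalization (or Cleaner) cost of discarding one per compression call.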
Just an interesting cross-project reference: a similar task in Jetty: jetty/jetty.project#300