
CURATOR-487 Make GzipCompressionProvider to recycle Deflaters and Inflaters in pools#282

Merged
asfgit merged 2 commits into apache:master from leventov:GzipCompressionProvider-references on Dec 10, 2018
Conversation

@leventov
Member

This PR addresses https://issues.apache.org/jira/browse/CURATOR-487 by recycling Deflaters and Inflaters in static concurrent pools. Since Deflaters and Inflaters are acquired and returned to the pools in try-finally blocks that contain no blocking calls themselves, the number of objects in a pool is not expected to exceed the number of hardware threads on the machine by much. It is therefore acceptable to use simple pools of strongly referenced objects.
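A minimal sketch of the pooling pattern described above; the class and method names (`DeflaterPool`, `acquire`, `release`) are illustrative, not Curator's actual API:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.zip.Deflater;

// Hypothetical sketch: a simple static concurrent pool of strongly
// referenced Deflaters, as the description above outlines.
final class DeflaterPool {
    private static final ConcurrentLinkedQueue<Deflater> POOL =
            new ConcurrentLinkedQueue<>();

    static Deflater acquire() {
        Deflater deflater = POOL.poll();
        // Pool empty: create a fresh instance rather than block.
        return deflater != null ? deflater : new Deflater();
    }

    static void release(Deflater deflater) {
        deflater.reset(); // clear internal state for the next caller
        POOL.add(deflater);
    }
}
```

A caller acquires and releases in a try-finally block (`Deflater d = DeflaterPool.acquire(); try { ... } finally { DeflaterPool.release(d); }`); because nothing blocks between acquire and release, the pool size stays bounded by the number of concurrently running threads.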

Just an interesting cross project reference, similar task in Jetty: jetty/jetty.project#300

@cammckenzie
Contributor

Thanks for the PR @leventov. I will merge this shortly.

@asfgit merged commit 7467e59 into apache:master Dec 10, 2018
@Randgalt
Member

I came to this issue late. Are we certain that merging this was the right thing to do? Is there any additional reference or documentation of other projects doing something similar? I'm concerned about replacing a JDK library method. If the JDK authors had a better implementation, surely they would have updated the JDK, no?

@leventov
Member Author

leventov commented Dec 10, 2018

The JDK authors did the right thing... in OpenJDK 12 (see JDK-8212129). When Curator's minimum requirement is bumped to at least JDK 12 (realistically JDK 17, the next LTS version), this specialized code can be removed.

@leventov deleted the GzipCompressionProvider-references branch Dec 10, 2018, 14:05
@leventov
Member Author

However, the specialized code will still be more efficient:

    // Even when Curator's minimum supported Java version becomes
    // no less than Java 12, where finalize() methods are removed
    // in Deflater and Inflater classes and instead they are phantom-referenced
    // via Cleaner, it still makes sense to avoid GZIPInputStream
    // and GZIPOutputStream because phantom references are also not
    // entirely free for GC algorithms, and also to allocate less garbage
    // and make less unnecessary data copies.

@Randgalt
Member

// it still makes sense to avoid GZIPInputStream
// and GZIPOutputStream

I don't see how it makes sense to avoid JDK library code. If what you say is true, why wouldn't they update the JDK?

@leventov
Member Author

Because GZIPInputStream and GZIPOutputStream are more generic mechanisms: they can compress and decompress arbitrary streams of data, of unknown length, available only byte-by-byte, etc. GZIPInputStream can also decompress a series of GZIP sequences concatenated one after another. In GzipCompressionProvider, the problem statement is much narrower: compress and decompress a byte[], where the array being decompressed is expected to contain exactly one GZIP sequence.
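The narrow byte[]-in, byte[]-out shape can be sketched with Deflater/Inflater used directly. Note this is illustrative only: it uses the default zlib framing, whereas a full GZIP implementation like Curator's additionally writes the GZIP header and CRC32 trailer around a raw deflate stream; the class name `ByteArrayCodec` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative one-shot byte[] compression/decompression (zlib framing,
// not full GZIP). In the pooled version, end() would be replaced by
// returning the reset object to a pool.
final class ByteArrayCodec {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // whole input is known up front
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources eagerly
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input); // exactly one compressed sequence
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }
}
```

Because the input length is known and the data arrives as a single array, none of GZIPInputStream's stream-oriented machinery (byte-by-byte availability, concatenated members) is needed.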

@Randgalt
Member

Can you point to other libraries that have taken the approach of re-writing these APIs? I see you opened https://issues.apache.org/jira/browse/COMPRESS-473. Are they taking this change as well?

@leventov
Member Author

There is no peer evidence here, because we are on the optimization forefront. See apache/druid#6677 (comment) and https://lists.apache.org/thread.html/1aff123193cec5c385821b2d745a4e846a8a5786146c047acbdf8ea3@%3Cdev.druid.apache.org%3E.

I've seen a Druid heap with more than 10k finalizable Deflater objects, about 8k of which were already dead, awaiting in the finalization queue. They come from GzipCompressionProvider.

Historically, Druid uses ZooKeeper somewhat wrongly (not for what ZooKeeper was designed): it announces data segment placement via ZooKeeper, which leads to the creation of many new nodes in ZooKeeper every second. By accident, this makes Druid a good stress test for ZooKeeper (and consequently for Curator), and we run what is probably the largest Druid cluster.

@Randgalt
Member

OK - interesting. It might make sense to develop a general-purpose library just for this. Larger projects like Curator could pull the new lib in via shading to avoid the dependency.
