FileUtils: Sync directory entry too on writeAtomically. #6677
leventov merged 5 commits into apache:master
Conversation
See the fsync(2) man page for why this is important: https://linux.die.net/man/2/fsync This also plumbs CompressionUtils's "zip" function through writeAtomically, so the code for handling atomic local filesystem writes is all done in the same place.
    if (fsync) {
      return FileUtils.writeAtomically(outputZipFile, out -> zip(directory, out));
    } else {
      try (final FileOutputStream out = new FileOutputStream(outputZipFile)) {
Please don't use FileOutputStream and FileInputStream in any new code (and probably they should be eliminated from old code too), because they are finalizable. Please use a static factory method in Files instead.
Oh, interesting, I didn't know that. I'll change it.
Do you have a sense of roughly how expensive a finalizer is? Is it bad enough that we should care a lot about adjusting old code, or not so bad, so we should only care about adjusting old code that creates a lot of objects (maybe code that creates lots of short-lived tmp files)?
From looking around online it seems that the JDK will probably remove finalization from these classes eventually (https://bugs.openjdk.java.net/browse/JDK-8187325).
Well, now that I look, it seems like there aren't that many anyway, so maybe not a big deal to adjust them all.
Personally I've only heard that all types of references from java.lang.ref are somehow bad for GC and could lead to effective memory leaks, both especially true for finalizable objects. But I cannot prove that.
Anecdote:
We are constantly suffering GC and apparent memory leak problems on pretty much all Druid node types, and they only get worse with Druid updates, but we cannot understand the reasons. We only try to make incremental improvements on this front, such as #6370 and other PRs.
On the suffering nodes, we cannot take full heap dumps to analyze them because the heaps are too big, but we can see heap stats (a la jmap -histo), and they show, for example, on one Historical node: more than 10k finalizable objects on the heap (determined by the number of java.lang.ref.Finalizer objects on the heap), tens of thousands of java.lang.ref.PhantomReferences, and several thousand each of WeakReferences and SoftReferences. I don't know where the hell all this comes from. (Except for the PhantomReferences, which are apparently connected to direct ByteBuffers; #4773 could help here.)
finalize() in FileInputStream and FileOutputStream is removed in OpenJDK 12: https://bugs.openjdk.java.net/browse/JDK-8192939. But Druid probably won't benefit from that until OpenJDK 17, because versions 12 through 16 are the "unsupported" ones that nobody is likely to risk using: https://blog.joda.org/2018/10/adopt-java-12-or-stick-on-11.html
Another minor reason to prefer the Files static factory methods is that they provide explicit control over how you open the file, via OpenOptions. FileOutputStream always creates the file if it doesn't exist, or truncates an existing file, and this cannot be configured.
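To illustrate the point, here is a minimal sketch (hypothetical class name, not from the PR): Files.newOutputStream accepts CREATE_NEW, which refuses to clobber an existing file, something FileOutputStream cannot be configured to do.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OpenOptionsDemo
{
  /**
   * Creates "file" exclusively, then tries again. Returns true if the second
   * attempt is refused, i.e. CREATE_NEW prevented clobbering an existing file.
   * A FileOutputStream would have silently truncated it instead.
   */
  public static boolean secondExclusiveCreateFails(final Path file) throws IOException
  {
    try (OutputStream out = Files.newOutputStream(file, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
      out.write("first".getBytes(StandardCharsets.UTF_8));
    }
    try (OutputStream out = Files.newOutputStream(file, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
      out.write("second".getBytes(StandardCharsets.UTF_8));
      return false; // existing file was silently replaced
    }
    catch (FileAlreadyExistsException e) {
      return true; // CREATE_NEW refused to overwrite
    }
  }
}
```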
> We are constantly suffering GC and apparent memory leak problems with pretty much all Druid node types, that only gets worse with Druid updates, but we cannot understand the reasons. We only try to make incremental improvements on this front, such as #6370 and other PRs.
Hm interesting. I did a jmap -histo on a historical of a fairly active Druid cluster (~150 queries/sec, ~5000 segments/server) and I see about 900 WeakReferences, 200 SoftReferences, no PhantomReferences, and 1100 Finalizers. It is typically doing 0.5-2sec of GC time per minute (based on the jvm/gc/time metric). It has been running for a month without being restarted, and avg heap occupancy / gc time hasn't changed much in that time; so in that sense, I don't see signs of memory leaks. GC and heap options are -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -Xms24g -Xmx24g. Is your experience different from this?
I don't understand how you don't see any PhantomReferences. Aren't there any segments loaded? Each loaded segment means one (or several) direct ByteBuffers (memory mapping). Each direct ByteBuffer does have a Cleaner, that means a PhantomReference.
I see 5183 io.druid.segment.QueryableIndexSegment objects (loaded segments), 2.6 million java.nio.DirectByteBufferR, 140k java.nio.DirectByteBuffer, 7422 sun.misc.Cleaner, but no PhantomReferences. This is on OpenJDK 1.8.0_181 and Druid 0.12.3-iap8 (like 0.12.3, but with some extra patches, see https://docs.imply.io/2.7.9/on-prem/misc/release).
In my JDK, Cleaner extends PhantomReference, so maybe that's why there are no explicit PhantomReference objects. The Cleaners are PhantomReferences. It doesn't explain why you do see them though.
Right, sorry for the confusion. I have Cleaners too; I just associated them with PhantomReferences.
Got it. So, sure, I do see a few thousand of those, on a box that has 5000 segments loaded. And that box does about 0.5s-2s of GC activity per minute.
    try {
      final T retVal;

      try (final FileOutputStream out = new FileOutputStream(tmpFile)) {
Please use FileChannel.open(...) and then Channels.newOutputStream()
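The suggested pattern, sketched as a standalone helper (hypothetical class name): open a FileChannel, which carries no finalizer, and wrap it with Channels.newOutputStream when an OutputStream is needed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelWrite
{
  public static void write(final Path file, final String content) throws IOException
  {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
      // Channels.newOutputStream is unbuffered; writes go straight to the channel,
      // so closing the channel is enough and no flush() is required.
      final OutputStream out = Channels.newOutputStream(channel);
      out.write(content.getBytes(StandardCharsets.UTF_8));
    }
  }
}
```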
    finally {
      tmpFile.delete();
      if (tmpFile.exists()) {
        if (!tmpFile.delete()) {
Files.delete() will let you see the reason for the failure (IO error, file not found, permissions, etc.).
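For example (hypothetical helper, just to show the difference): Files.delete throws a typed IOException describing why it failed, whereas File.delete() only returns false with no reason.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteDemo
{
  /** Deletes "path" and reports what happened, using the typed exceptions from Files.delete. */
  public static String deleteReason(final Path path)
  {
    try {
      Files.delete(path);
      return "deleted";
    }
    catch (NoSuchFileException e) {
      return "no such file: " + e.getFile();
    }
    catch (IOException e) {
      // Permissions problems, non-empty directories, etc. all arrive with details.
      return e.toString();
    }
  }
}
```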
Pushed an update.
    try {
      try (final FileOutputStream out = new FileOutputStream(tmpFile)) {
      //noinspection unused
      try (final Closeable deleter = () -> java.nio.file.Files.deleteIfExists(tmpFile.toPath())) {
Now IOExceptions that could be thrown from deleteIfExists() are propagated
I think that's fine. It would be a shock to have it fail, and maybe it's best to propagate in that case.
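The deleter idiom being discussed, extracted into a minimal standalone sketch (hypothetical class name): a Closeable lambda in try-with-resources removes the temp file whether the body succeeds or throws, and any IOException from deleteIfExists propagates.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleterDemo
{
  public static void withTempFile(final Path tmpFile) throws IOException
  {
    //noinspection unused
    try (Closeable deleter = () -> Files.deleteIfExists(tmpFile)) {
      Files.write(tmpFile, "scratch".getBytes(StandardCharsets.UTF_8));
      // ... work with tmpFile ...
    }
    // tmpFile is gone here, even if the body threw; a failed delete
    // surfaces as an IOException rather than being silently ignored.
  }
}
```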
        StandardOpenOption.WRITE,
        StandardOpenOption.CREATE_NEW
      );
      final OutputStream out = Channels.newOutputStream(fileChannel)
Maybe a bit clearer to factor this as a nested try-with-resources block and omit out.flush().
The fileChannel.force (fsync) has to be done while the stream is still open, so we can't close the OutputStream before that happens. So that's why the flush was first. I just pushed a change with some more comments describing what's going on.
But, I also removed the out.flush completely in this commit -- Channels.newOutputStream is specced as unbuffered, so a flush isn't necessary.
Right, thanks for researching that.
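Putting the thread together, the atomic-write pattern under discussion can be sketched roughly like this (a hypothetical class, not Druid's actual FileUtils; cleanup of the temp file on failure is omitted for brevity, and fsync-ing a directory by opening it with READ is a Unix-oriented trick that may not work on every platform):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicWriteSketch
{
  public static void writeAtomically(final Path target, final byte[] content) throws IOException
  {
    final Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
      // Channels.newOutputStream is unbuffered, so no flush() is needed before force().
      final OutputStream out = Channels.newOutputStream(channel);
      out.write(content);
      // fsync the data while the channel is still open.
      channel.force(true);
    }
    // Atomically move into place, then fsync the directory entry too,
    // as the fsync(2) man page advises, so the rename itself is durable.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    try (FileChannel dir = FileChannel.open(target.getParent(), StandardOpenOption.READ)) {
      dir.force(true);
    }
  }
}
```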
* FileUtils: Sync directory entry too on writeAtomically. See the fsync(2) man page for why this is important: https://linux.die.net/man/2/fsync This also plumbs CompressionUtils's "zip" function through writeAtomically, so the code for handling atomic local filesystem writes is all done in the same place.
* Remove unused import.
* Avoid FileOutputStream.
* Allow non-atomic writes to overwrite.
* Add some comments. And no need to flush an unbuffered stream.