FrameFile: Java 17 compatibility. #12987
DataSketches Memory.map is not Java 17 compatible, and from discussions with the team, it is challenging to make it compatible with 17 while also retaining compatibility with 8 and 11. So, in this patch, we switch away from Memory.map and instead use the built-in JDK mmap functionality. Since it only supports maps up to Integer.MAX_VALUE bytes, we also implement windowing in FrameFile, so that we can still handle large files. Other changes:
1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames already have checksums, but the footer was missing one.
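A minimal sketch of the windowing arithmetic, under the assumption that each frame must fit inside a single window that begins at the frame's own offset (the description does not say how FrameFile handles a frame larger than the window). `MmapWindows` and `windowFor` are hypothetical names, not Druid code.

```java
/**
 * Hypothetical helper (not Druid's API): decides which window of a large file to map so that
 * the region [startOffset, endOffset) is fully readable from a single MappedByteBuffer.
 */
final class MmapWindows {
  private MmapWindows() {}

  /**
   * Returns {windowStart, windowLength}. The window starts at startOffset and extends as far as
   * maxMmapSize allows, clipped to the end of the file. maxMmapSize is an int, so a window can
   * never exceed the Integer.MAX_VALUE limit of a single FileChannel.map call.
   */
  static long[] windowFor(long startOffset, long endOffset, long fileLength, int maxMmapSize) {
    if (maxMmapSize <= 0) {
      throw new IllegalArgumentException("maxMmapSize must be positive");
    }
    if (startOffset < 0 || endOffset < startOffset || endOffset > fileLength) {
      throw new IllegalArgumentException("Region [" + startOffset + ", " + endOffset + ") is out of bounds");
    }
    // Assumption: a region (e.g. one frame) must fit entirely inside one window.
    if (endOffset - startOffset > maxMmapSize) {
      throw new IllegalArgumentException("Region is larger than maxMmapSize: " + (endOffset - startOffset));
    }
    // Map as much as allowed from startOffset so that later reads may reuse the same window.
    long windowLength = Math.min((long) maxMmapSize, fileLength - startOffset);
    return new long[] {startOffset, windowLength};
  }
}
```

File offsets stay `long` throughout, so regions beyond the 2 GB mark are addressable even though each individual mapping is capped at an `int` length.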
This pull request introduces 6 alerts and fixes 1 when merging 0724033 into 7e2371b (view on LGTM.com).
The issues raised by LGTM are false alarms: the first two are about potential overflows that can't happen, and the last is about using task IDs in file paths, which is OK since they are path-safe. I pushed a commit to try to satisfy the alerts anyway.
This pull request introduces 1 alert and fixes 1 when merging 024ddad into 7e2371b (view on LGTM.com).
This pull request introduces 1 alert and fixes 1 when merging 850e7fb into 0460d8a (view on LGTM.com).
This pull request introduces 1 alert and fixes 1 when merging da29eb5 into 9eb20e5 (view on LGTM.com).
cryptoe left a comment:
Quite cool. LGTM +1 (non-binding).
try (final RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
     final FileChannel channel = randomAccessFile.getChannel()) {
  final MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
  // (rest of the method not shown in the review excerpt)
}
Nit: should we use the method at line 220 here?
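The PR description mentions two new "map" functions in FileUtils. A minimal sketch of what such a helper could look like, wrapping the try-with-resources pattern from the snippet above; `MapUtil` and `mapReadOnly` are hypothetical names, not Druid's actual FileUtils API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper in the spirit of the FileUtils "map" functions; name and signature are illustrative.
final class MapUtil {
  private MapUtil() {}

  /** Maps [offset, offset + length) of the file read-only. */
  static MappedByteBuffer mapReadOnly(File file, long offset, long length) throws IOException {
    // FileChannel.map rejects sizes above Integer.MAX_VALUE; fail early with a clear message.
    if (length > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("Cannot map more than Integer.MAX_VALUE bytes: " + length);
    }
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
         FileChannel channel = randomAccessFile.getChannel()) {
      // The mapping stays valid after the channel is closed.
      return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
    }
  }
}
```

Returning the buffer after the channel closes is safe because, per the `FileChannel.map` javadoc, a mapping does not depend on the channel that created it.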
/**
 * Mapped memory, starting from {@link #bufferOffset} in {@link #file}, up to max of {@link #maxMmapSize}. Acts as
 * a window on the underlying file. Remapped using {@link #remapBuffer(long)}, freed using {@link #releaseBuffer()}.
 */
Do we require benchmarks for this?
My thinking is we don't need to block this patch on it, but yes, we should definitely add performance tests / benchmarks for the frame stuff generally.
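A sketch of the window described in the javadoc above, reusing its names (`bufferOffset`, `maxMmapSize`, `remapBuffer`, `releaseBuffer`) for readability. The class `WindowedFile` and its `readByte` method are hypothetical, and this is not FrameFile's implementation. Here `releaseBuffer` only drops the reference and leaves unmapping to the garbage collector; a real implementation may unmap eagerly.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch of a remappable window over a file; not Druid's FrameFile.
final class WindowedFile {
  private final File file;
  private final long fileLength;
  private final int maxMmapSize;

  private MappedByteBuffer buffer; // current window, or null if none is mapped
  private long bufferOffset;       // file position corresponding to buffer index 0

  WindowedFile(File file, int maxMmapSize) {
    if (maxMmapSize <= 0) {
      throw new IllegalArgumentException("maxMmapSize must be positive");
    }
    this.file = file;
    this.fileLength = file.length();
    this.maxMmapSize = maxMmapSize;
  }

  /** Reads one byte at an absolute file position, remapping the window if it is not covered. */
  byte readByte(long position) throws IOException {
    if (position < 0 || position >= fileLength) {
      throw new IndexOutOfBoundsException("position " + position);
    }
    if (buffer == null || position < bufferOffset || position >= bufferOffset + buffer.capacity()) {
      remapBuffer(position);
    }
    return buffer.get((int) (position - bufferOffset));
  }

  /** Maps a new window starting at newOffset, at most maxMmapSize bytes long. */
  void remapBuffer(long newOffset) throws IOException {
    releaseBuffer();
    long length = Math.min((long) maxMmapSize, fileLength - newOffset);
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
         FileChannel channel = randomAccessFile.getChannel()) {
      buffer = channel.map(FileChannel.MapMode.READ_ONLY, newOffset, length);
    }
    bufferOffset = newOffset;
  }

  /** Drops the current window; the JDK unmaps it once the buffer is garbage-collected. */
  void releaseBuffer() {
    buffer = null;
  }

  long bufferOffset() {
    return bufferOffset;
  }
}
```

With a tiny `maxMmapSize` (4 bytes in the check below), a handful of reads exercises both forward and backward remaps, in line with the PR's approach of using an artificially low maxMmapSize in unit tests.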
 *
 * - 2 bytes: {@link FrameFileWriter#MAGIC}
 * - NNN bytes: sequence of {@link FrameFileWriter#MARKER_FRAME} followed by one compressed frame (see {@link Frame})
 * - 1 byte: {@link FrameFileWriter#MARKER_NO_MORE_FRAMES}
Should we also add a frameVersion byte?
We might add more things to this format in the future, and then things like snapshotting for fault tolerance become tricky.
We've got this covered: there are version bytes for each frame, and if we need to version the entire frame file format, we can increment the MAGIC.
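A toy reader for a layout shaped like the format javadoc above, showing how whole-format versioning through MAGIC could work: the reader rejects any MAGIC it does not recognize. The byte values of `MAGIC_V1`, `MARKER_FRAME`, and `MARKER_NO_MORE_FRAMES`, and the 4-byte length prefix before each frame, are hypothetical; the javadoc excerpt does not specify them.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy reader; all byte values and the length prefix are hypothetical, not Druid's actual format.
final class ToyFrameFileReader {
  static final byte[] MAGIC_V1 = {(byte) 0xFF, 0x01}; // hypothetical
  static final byte MARKER_FRAME = 0x01;               // hypothetical
  static final byte MARKER_NO_MORE_FRAMES = 0x02;      // hypothetical

  private ToyFrameFileReader() {}

  /** Returns the raw (still compressed) frame payloads, in file order. */
  static List<byte[]> readFrames(ByteBuffer buf) {
    // Whole-format versioning: a new layout would get a new MAGIC, which this reader rejects.
    if (buf.get() != MAGIC_V1[0] || buf.get() != MAGIC_V1[1]) {
      throw new IllegalStateException("Unknown MAGIC; unsupported frame file version");
    }
    List<byte[]> frames = new ArrayList<>();
    while (true) {
      byte marker = buf.get();
      if (marker == MARKER_NO_MORE_FRAMES) {
        return frames; // a footer (not modeled here) would follow
      }
      if (marker != MARKER_FRAME) {
        throw new IllegalStateException("Unexpected marker: " + marker);
      }
      byte[] frame = new byte[buf.getInt()]; // hypothetical 4-byte length prefix
      buf.get(frame);
      frames.add(frame);
    }
  }
}
```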
vogievetsky left a comment:
Yay for more Java support!
* FrameFile: Java 17 compatibility. DataSketches Memory.map is not Java 17 compatible, and from discussions with the team, is challenging to make compatible with 17 while also retaining compatibility with 8 and 11. So, in this patch, we switch away from Memory.map and instead use the builtin JDK mmap functionality. Since it only supports maps up to Integer.MAX_VALUE, we also implement windowing in FrameFile, such that we can still handle large files. Other changes: 1) Add two new "map" functions to FileUtils, which we use in this patch. 2) Add a footer checksum to the FrameFile format. Individual frames already have checksums, but the footer was missing one.
* Changes for static analysis.
* wip
* Fixes.
DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, it is challenging to make it compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the built-in JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE bytes, we also implement windowed
mmaps in FrameFile, so that we can still handle large files.
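Other changes:
1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames
already have checksums, but the footer was missing one. Not directly
related, but I thought of it since I was modifying the code that reads
footers.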
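A minimal sketch of a footer checksum, assuming the footer is stored as a body followed by an 8-byte checksum of that body. `java.util.zip.CRC32` is a stand-in chosen for the example; the PR does not say which checksum FrameFile uses, and `FooterChecksum` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustrative only: CRC32 and the "body + trailing 8-byte checksum" layout are assumptions.
final class FooterChecksum {
  private FooterChecksum() {}

  /** Returns footerBody followed by an 8-byte checksum of footerBody. */
  static byte[] seal(byte[] footerBody) {
    ByteBuffer out = ByteBuffer.allocate(footerBody.length + Long.BYTES);
    out.put(footerBody).putLong(crc(footerBody));
    return out.array();
  }

  /** Verifies the trailing checksum and returns the footer body, or throws if it does not match. */
  static byte[] open(byte[] sealed) {
    if (sealed.length < Long.BYTES) {
      throw new IllegalArgumentException("Footer too short");
    }
    byte[] body = Arrays.copyOf(sealed, sealed.length - Long.BYTES);
    long expected = ByteBuffer.wrap(sealed, body.length, Long.BYTES).getLong();
    if (crc(body) != expected) {
      throw new IllegalStateException("Footer checksum mismatch");
    }
    return body;
  }

  private static long crc(byte[] bytes) {
    CRC32 crc = new CRC32();
    crc.update(bytes, 0, bytes.length);
    return crc.getValue();
  }
}
```

Checking the footer before trusting it matters because the footer is what a reader consults to locate frames; a corrupt footer would otherwise surface as confusing errors later.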
Unit tests use an artificially low
maxMmapSize, since I didn't think it was going to be feasible to test with real 2GB+ files in unit tests.
This work is towards #12838.