Skip to content

FrameFile: Java 17 compatibility.#12987

Merged
vogievetsky merged 5 commits intoapache:masterfrom
gianm:dsmem-17
Aug 30, 2022
Merged

FrameFile: Java 17 compatibility.#12987
vogievetsky merged 5 commits intoapache:masterfrom
gianm:dsmem-17

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Aug 27, 2022

DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, is challenging to make compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the builtin JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE, we also implement windowed
mmaps in FrameFile, such that we can still handle large files.

Other changes:

  1. Add two new "map" functions to FileUtils, which we use in this patch.
  2. Add a footer checksum to the FrameFile format. Individual frames
    already have checksums, but the footer was missing one. Not directly
    related, but I thought of it since I was modifying the code that reads
    footers.

Unit tests use an artificially lower maxMmapSize, since I didn't think it was
going to be feasible to test with real 2GB+ files in unit tests.

This work is towards #12838.

DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, is challenging to make compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the builtin JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE, we also implement windowing
in FrameFile, such that we can still handle large files.

Other changes:

1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames
   already have checksums, but the footer was missing one.
@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Aug 27, 2022

This pull request introduces 6 alerts and fixes 1 when merging 0724033 into 7e2371b - view on LGTM.com

new alerts:

  • 3 for Result of multiplication cast to wider type
  • 2 for Implicit narrowing conversion in compound assignment
  • 1 for Uncontrolled data used in path expression

fixed alerts:

  • 1 for Uncontrolled data used in path expression

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Aug 27, 2022

The issues raised by LGTM are false alarms — the first two are about potential overflows that can't happen, and the last is about using task IDs in file paths, which is OK since they are path-safe. I pushed a commit to try to satisfy the alerts anyway.

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Aug 27, 2022

This pull request introduces 1 alert and fixes 1 when merging 024ddad into 7e2371b - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

fixed alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Aug 29, 2022

This pull request introduces 1 alert and fixes 1 when merging 850e7fb into 0460d8a - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

fixed alerts:

  • 1 for Uncontrolled data used in path expression

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Aug 30, 2022

This pull request introduces 1 alert and fixes 1 when merging da29eb5 into 9eb20e5 - view on LGTM.com

new alerts:

  • 1 for Uncontrolled data used in path expression

fixed alerts:

  • 1 for Uncontrolled data used in path expression

Copy link
Copy Markdown
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite cool LGTM +1 non binding


try (final RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
final FileChannel channel = randomAccessFile.getChannel()) {
final MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit should we use method at line 220 here ?


/**
* Mapped memory, starting from {@link #bufferOffset} in {@link #file}, up to max of {@link #maxMmapSize}. Acts as
* a window on the underlying file. Remapped using {@link #remapBuffer(long)}, freed using {@link #releaseBuffer()}.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we require benchmarks for this ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is we don't need to block this patch on it, but yes, we should definitely add performance tests / benchmarks for the frame stuff generally.

*
* - 2 bytes: {@link FrameFileWriter#MAGIC}
* - NNN bytes: sequence of {@link FrameFileWriter#MARKER_FRAME} followed by one compressed frame (see {@link Frame})
* - 1 byte: {@link FrameFileWriter#MARKER_NO_MORE_FRAMES}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a frameVersion byte?
We might add more things to this in the future then things like snapshotting for fault tolerance becomes tricky

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got this covered: there's version bytes for each frame, & if we need to version the entire frame file format, we can increment the MAGIC.

@gianm gianm added this to the 24.0.0 milestone Aug 30, 2022
Copy link
Copy Markdown
Contributor

@vogievetsky vogievetsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for more Java support

@vogievetsky vogievetsky merged commit 2450b96 into apache:master Aug 30, 2022
599166320 pushed a commit to 599166320/apache-druid that referenced this pull request Aug 30, 2022
* FrameFile: Java 17 compatibility.

DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, is challenging to make compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the builtin JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE, we also implement windowing
in FrameFile, such that we can still handle large files.

Other changes:

1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames
   already have checksums, but the footer was missing one.

* Changes for static analysis.

* wip

* Fixes.
gianm added a commit to gianm/druid that referenced this pull request Aug 30, 2022
* FrameFile: Java 17 compatibility.

DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, is challenging to make compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the builtin JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE, we also implement windowing
in FrameFile, such that we can still handle large files.

Other changes:

1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames
   already have checksums, but the footer was missing one.

* Changes for static analysis.

* wip

* Fixes.
gianm added a commit that referenced this pull request Aug 31, 2022
* FrameFile: Java 17 compatibility.

DataSketches Memory.map is not Java 17 compatible, and from discussions
with the team, is challenging to make compatible with 17 while also
retaining compatibility with 8 and 11. So, in this patch, we switch away
from Memory.map and instead use the builtin JDK mmap functionality. Since
it only supports maps up to Integer.MAX_VALUE, we also implement windowing
in FrameFile, such that we can still handle large files.

Other changes:

1) Add two new "map" functions to FileUtils, which we use in this patch.
2) Add a footer checksum to the FrameFile format. Individual frames
   already have checksums, but the footer was missing one.

* Changes for static analysis.

* wip

* Fixes.
@gianm gianm deleted the dsmem-17 branch September 23, 2022 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants