use the latest release of datasketches #8647
Conversation
👍 thanks for updating this
Could you update the sketches-core and memory entries in license.yaml and add
<dependency>
  <groupId>org.apache.datasketches</groupId>
  <artifactId>datasketches-memory</artifactId>
  <version>1.1.0-incubating</version>
</dependency>
to the datasketches extension pom.xml, per this travis failure?
After doing both of those things it looks like travis should pass.
clintropolis left a comment
It looks like some of the JDK 11 unit tests are failing:
[ERROR] roundMergeSketch[v1](org.apache.druid.query.aggregation.datasketches.hll.HllSketchAggregatorTest) Time elapsed: 1.092 s <<< ERROR!
java.lang.ExceptionInInitializerError: Unsupported JDK Major Version, must be 1.8: 11.0.2
at org.apache.datasketches.memory.UnsafeUtil.checkJavaVersion(UnsafeUtil.java:147)
at org.apache.datasketches.memory.UnsafeUtil.parseJavaVersion(UnsafeUtil.java:141)
...
[ERROR] roundBuildSketch[v1](org.apache.druid.query.aggregation.datasketches.hll.HllSketchAggregatorTest) Time elapsed: 0.176 s <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.datasketches.memory.UnsafeUtil
at org.apache.datasketches.memory.WritableMemory.wrap(WritableMemory.java:322)
at org.apache.datasketches.memory.WritableMemory.wrap(WritableMemory.java:289)
...
Is the updated version more picky about which Java versions are supported?
Our memory library, as it is now, works with JDK 8 only. It has been like this for a long time, but it was not enforced before.
Our Memory and core Java components use Unsafe, and these components have not been recoded to accommodate the new Java modules. I am not familiar with how these issues are addressed with OpenJDK. So if anyone on the Druid side has experience with how to get around these issues, we would appreciate clues as to how to address them.
By the way, for anyone looking at this from the Druid side, there's been a related conversation on the DataSketches dev list: https://lists.apache.org/thread.html/31aadc6418bd52867676ac02930d9f78eef1e239f9ef6024f2dc38d0@%3Cdev.datasketches.apache.org%3E
@gianm @leventov
Druid is still compiling and targeting Java 1.8. Yet for PRs apparently, you are also requiring compatibility with JDK 9+. The previous (prior to Apache) DataSketches releases 0.X.Y and the newest Apache Release 1.1.0-incubating will not run in JDK 9+ because of our use of Unsafe, Cleaner, Bits, and internal access to ByteBuffer. I have verified this fact by my own testing against OpenJDK11. You can read my study on our ASF slack channel "datasketches-dev", which also reveals some performance consequences of moving to JDK11 as well. We have started to investigate this issue as mentioned by Gian above, but to fix this will require some major rewrites and will likely take us quite awhile to do. This newest Apache release has some performance improvements and bug fixes that were specifically targeted for Druid. So if you want those improvements please allow this PR to go through. As long as Druid is running on JDK 8, there is no problem. Thanks, Lee. |
|
I think it would be reasonable to modify the tests to be skipped for now on newer java versions and we add this to list of known issues with running newer versions of java, though I'm not sure the best way to do that. There is a function in |
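As a sketch of what "skip on newer Java versions" could look like: parse the major version out of the `java.specification.version` system property and gate on it. The class and method names here are illustrative, not Druid's actual helper, and the JUnit wiring shown in the comment is an assumption.

```java
public class JavaVersionGuard
{
  // Parse the major version out of java.specification.version,
  // which is "1.8" on JDK 8 and "9", "10", "11", ... afterwards.
  public static int majorJavaVersion(String specVersion)
  {
    String[] parts = specVersion.split("\\.");
    int first = Integer.parseInt(parts[0]);
    return first == 1 ? Integer.parseInt(parts[1]) : first;
  }

  public static void main(String[] args)
  {
    // In a JUnit 4 test this could back something like:
    //   Assume.assumeTrue(majorJavaVersion(System.getProperty("java.specification.version")) == 8);
    // so the datasketches tests would be skipped (not failed) on JDK 9+.
    System.out.println(majorJavaVersion(System.getProperty("java.specification.version")));
  }
}
```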
Hey Druid team, can we get some traction on this issue? Folks are asking questions about sketches almost every day on issues or in the druid Google user group. |
I guess the quickest way would be to modify travis config for 'other modules test' and 'other modules test (SQL compatibility)' to skip datasketches extension for jdk11 only? Anyone else have any better ideas? |
And this makes total sense, since the project currently only supports Java 8, but we are actively trying to move to supporting both Java 8 and Java 11 (see #5589). So the 'right direction' for any new patches is towards increasing Java 11 compatibility.
I think since datasketches is known to not work properly on Java 11, @clintropolis's suggestion here is reasonable. We should definitely keep running the Java 11 tests on other modules. Actually, one weird thing is that currently, on the older datasketches version, the extension tests are passing! Based on @leerho's comment, they should be broken. So before implementing @clintropolis's suggestion to disable them on Java 11, it'd be good to figure out why they work today (we are probably missing some coverage of direct-buffer methods) and shore that up so the tests fail on the older datasketches version too. Then, commit that test, disable the Java 11 tests in CI, upgrade to the latest datasketches, and raise a follow-up issue linked to #5589.
I would recommend skipping JDK 9+. If the same tests are currently running against the older DS version, then most likely you are not testing in true direct (off-heap) mode. The latest DS version (with the not-JDK8 test disabled) operating on-heap works fine on JDK 11. But it is not a solution for Druid if it only works on-heap. This is why the JDK 8 test is there.
As a heads up, I'm considering moving DataSketches from extensions to core as part of #8769. I've started a thread on the dev mailing list to discuss the pros/cons.
@gianm
removed the JDK version check (back to how it was before)
@AlexanderSaydakov Thanks, must be annoying but appreciated. It looks like CI validations are choking on licenses.yaml and this now-unused dependency:
<dependency>
  <groupId>com.yahoo.datasketches</groupId>
  <artifactId>sketches-core</artifactId>
  <version>0.13.4</version>
  <scope>provided</scope>
</dependency>
If you remove that and update licenses.yaml, the build should pass and we could merge it.
Btw, @ccaominh, are you still thinking about moving the DataSketches dependency to core Druid? (For reference, here's the thread he was talking about: https://lists.apache.org/thread.html/9188e70c0ea4239e50a2b097c8c044e72d4a1cd4bb7693917609cb0f@%3Cdev.druid.apache.org%3E) Maybe we can help out with Java 11-ifying DataSketches for real? @leerho @AlexanderSaydakov, how far has that project gotten, and is there anything you could use help with? (I can't commit to anything right now, but I'd like to at least consider helping out!)
@gianm Nonetheless, I looked at the errors generated in this last run and came up with the following:

LICENSE CHECKS FAILING:
It appears that the AND As well as the master

DEPENDENCY CHECKS FAILING:
Part of these failures are due to the above:
However, I also ran this test:
I got 2 warnings (which the test reports as errors):

<prerequisites> specification
This is already documented in our POM file: this warning wants us to remove the <prerequisites … /> specification. We already use the enforcer plugin. But this causes another failure with the "mvn versions" command.

Unused Dependencies found
The reason we added org.slf4j:slf4j-simple:jar:1.7.27 to the dependencies was to eliminate a warning every time we run the test suite that the "slf4j-api" was not being supplied with an implementation, thus slf4j becomes a no-op. Again, this is another "catch-22" where trying to clean up one warning now causes another warning.

OPENJDK8 OTHER MODULES TEST:
OPENJDK11 OTHER MODULES TEST:
OPENJDK8 OTHER MODULES TEST (SQL Compatibility):
OPENJDK11 OTHER MODULES TEST (SQL Compatibility):
All of these are caused by the fact that the old com.yahoo.sketches.* code is still in the system and not

Conclusion:
It appears that by updating the
Your help in expediting this would be greatly appreciated! Lee.
As to your earlier question of why the old datasketches code seems to work under JDK 11, I have a pretty good guess. It turns out that Druid does not actually use the advanced capability of Memory to allocate memory off-heap, e.g.
Meanwhile, we do use Unsafe functions for reading and writing primitives, which, for the time being, are deprecated but still exist in JDK 11. Yes, eventually we will have to deal with this issue.
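For context, the Unsafe usage being described boils down to reading and writing primitives at raw byte offsets inside heap byte arrays. Here is a minimal sketch of that pattern, assuming the usual reflective grab of `theUnsafe`; it is not the actual Memory implementation, and the class and method names are made up for illustration.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafePrimitives
{
  private static final Unsafe UNSAFE;
  private static final long BYTE_BASE;

  static {
    try {
      // sun.misc.Unsafe is not public API; this reflective grab still works
      // on JDK 8 and JDK 11 (with an "illegal access" warning on 11).
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (Unsafe) f.get(null);
      BYTE_BASE = Unsafe.ARRAY_BYTE_BASE_OFFSET;
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  // Write/read a long at an arbitrary byte offset inside a byte[] "blob",
  // similar to overlaying a C struct on a region of bytes.
  public static void putLong(byte[] blob, long byteOffset, long value)
  {
    UNSAFE.putLong(blob, BYTE_BASE + byteOffset, value);
  }

  public static long getLong(byte[] blob, long byteOffset)
  {
    return UNSAFE.getLong(blob, BYTE_BASE + byteOffset);
  }

  public static void main(String[] args)
  {
    byte[] blob = new byte[16];
    putLong(blob, 8, 0x1122334455667788L);
    System.out.println(Long.toHexString(getLong(blob, 8))); // prints 1122334455667788
  }
}
```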
<artifactId>sketches-core</artifactId>
<groupId>org.apache.datasketches</groupId>
<artifactId>datasketches-java</artifactId>
<version>1.1.0-incubating</version>
How about defining the version in the root POM (to ensure the value here is consistent with the one in extensions-core/datasketches/pom.xml)?
Yes, ideally, we can move datasketches to core. Currently, the implementation of superbatch range partitioning requires adding the datasketches extension to the classpath, which is not ideal.
I updated the dependencies, but there are still some build failures. I do not understand why this happens: there is some problem with the SQL tests. I am not familiar with this part of the code. It seems to me that someone set up very rigid tests asserting an exact binary match of the resulting sketches:
I ran
It is, but we exclude it:
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <exclusions>
    <!--
      ~ Calcite 1.15.0 uses sketches-core 0.9.0 for a profiling feature
      ~ (https://issues.apache.org/jira/browse/CALCITE-1616).
      ~ This conflicts with druid-datasketches which uses a newer version of sketches-core.
      ~ Druid does not use this Calcite profiling feature, so we exclude sketches-core here.
    -->
    <exclusion>
      <groupId>com.yahoo.datasketches</groupId>
      <artifactId>sketches-core</artifactId>
    </exclusion>
    <!--
      ~ We don't use Calcite's YAML features.
    -->
    <exclusion>
      <groupId>com.fasterxml.jackson.dataformat</groupId>
      <artifactId>jackson-dataformat-yaml</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.yaml</groupId>
      <artifactId>snakeyaml</artifactId>
    </exclusion>
  </exclusions>
</dependency>
For this, I'd just update the tests to whatever the datasketches library is outputting now.
Thanks all for sticking through this one; it looks good now. We will need to revisit the DataSketches-on-Java-11 issue when we move forward with officially supporting Java 11.
@leerho I can help look into the JDK 11 compatibility issues. I've spent a fair amount of time working on dual JDK 8 and JDK 11 support for Druid's use of internal JDK APIs, and – while a bit tedious – it's not too hard once you spot the patterns. Hopefully I'll have some time in the coming weeks.
@xvrl I did some poking around and found that an updated Unsafe still exists in JDK 11, although for how long I don't know. I did some timing measurements and discovered that Unsafe calls are noticeably slower in JDK 11 than in JDK 8. Possibly some redirect has been inserted. Do you have any explanation? It is my understanding that the intention is for var handles and method handles to replace Unsafe calls. The way we use Unsafe is mostly to read and write primitives and arrays into blobs of bytes, similar to what one can do with a C union/struct. If we can replicate that with var handles, then we will probably be OK. We will also likely need to set up multi-release JARs (MRJARs) to accommodate both JDK 8 and JDK 11 code sets. I have no experience in doing this. Thanks again for your offer of help!! Lee.
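For what it's worth, a byte-array view VarHandle (JDK 9+) can replicate that read/write-primitives-into-a-blob-of-bytes pattern. This is a sketch only; the class and method names are illustrative, not a proposal for the Memory API, and in an MRJAR layout a class like this would sit under META-INF/versions with a JDK 8 Unsafe fallback at the root.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandlePrimitives
{
  // A VarHandle that views a heap byte[] as longs, standing in for
  // Unsafe.getLong/putLong(byte[], ARRAY_BYTE_BASE_OFFSET + offset).
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

  public static void putLong(byte[] blob, int byteOffset, long value)
  {
    LONG_VIEW.set(blob, byteOffset, value);
  }

  public static long getLong(byte[] blob, int byteOffset)
  {
    return (long) LONG_VIEW.get(blob, byteOffset);
  }

  public static void main(String[] args)
  {
    byte[] blob = new byte[16];
    putLong(blob, 8, 0x1122334455667788L);
    System.out.println(Long.toHexString(getLong(blob, 8))); // prints 1122334455667788
  }
}
```

Plain `get`/`set` through a byte-array view VarHandle permits unaligned offsets, which matters for overlaying struct-like layouts on arbitrary byte blobs.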
Description
This updates the datasketches core extension to the latest release, apache-datasketches-1.1.0-incubating.
The package name has changed from com.yahoo.sketches to org.apache.datasketches.