Describe the bug
Although the Comet/Iceberg integration in #1920 and apache/iceberg#13378 appears to work, it requires both the Comet and Iceberg jars to be on the classpath, even though the Iceberg jar contains a copy of the Comet classes.
One reason for this requirement is a shading issue with the Arrow Java library. Arrow contains JNI classes that cannot be relocated, because the fully qualified Java class and method names must match the names that the native C code expects. Comet does not relocate these classes, but Iceberg does. If the Comet jar is not on the classpath, queries fail at runtime because Arrow cannot load the JNI classes:
java.lang.RuntimeException: Could not find class Lorg/apache/arrow/c/jni/PrivateData;
    at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
    at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2450)
    at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2506)
    at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2705)
    at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2635)
    at java.base/java.lang.Runtime.load0(Runtime.java:768)
    at java.base/java.lang.System.load(System.java:1854)
    at org.apache.comet.shaded.arrow.c.jni.JniLoader.load(JniLoader.java:90)
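To illustrate why relocation breaks JNI binding, here is a minimal sketch of how the JVM derives the native symbol name it looks up from a fully qualified class name and method name. The class `JniSymbolNames` and its helpers are hypothetical and written only for this illustration; the `JniWrapper.releaseSchema` class/method pair is used as an example name, and the relocated prefix is the one visible in the stack trace above. The failure in the trace is a class lookup rather than a method lookup, but presumably for the same underlying reason: the native library refers to Java names as they were at compile time.

```java
public final class JniSymbolNames {
    private JniSymbolNames() {}

    /** Escapes a class or method name using the JNI short-name mangling rules. */
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');          // package separator
            } else if (c == '_') {
                sb.append("_1");         // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                sb.append(c);            // ASCII letters and digits pass through
            } else {
                sb.append(String.format("_0%04x", (int) c)); // everything else as _0xxxx
            }
        }
        return sb.toString();
    }

    /** Builds the symbol the JVM searches for when binding a non-overloaded native method. */
    static String symbolFor(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // Name the native library was compiled against (illustrative method name).
        System.out.println(symbolFor("org.apache.arrow.c.jni.JniWrapper", "releaseSchema"));
        // prints Java_org_apache_arrow_c_jni_JniWrapper_releaseSchema

        // Name the JVM would look for after the class is relocated by shading.
        System.out.println(symbolFor("org.apache.comet.shaded.arrow.c.jni.JniWrapper", "releaseSchema"));
        // prints Java_org_apache_comet_shaded_arrow_c_jni_JniWrapper_releaseSchema
    }
}
```

The two printed names differ, so a native library built against the original package cannot bind to a relocated copy of the class unless the native code is rebuilt to match.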
Another issue I ran into, which may only occur with a specific class loading order, is that Comet did not recognize the SupportsComet interface implemented by Iceberg's SparkBatchQueryScan. I suspect this was caused by two different versions of SupportsComet being on the classpath (one in each jar), but I have not been able to confirm this.
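One way to test the duplicate-class suspicion is to list every copy of a class file visible to a class loader and compare that with the location the loaded class actually came from. The following is a rough diagnostic sketch, not part of Comet or Iceberg: `DuplicateClassCheck` and its methods are hypothetical, and the default class name `org.apache.comet.parquet.SupportsComet` is an assumption about the interface's package that should be replaced with the real fully qualified name.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public final class DuplicateClassCheck {
    private DuplicateClassCheck() {}

    /** Converts a fully qualified class name to its class-file resource path. */
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location where the loader can see a class file with this name. */
    static List<URL> copiesOf(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(resourcePath(className)));
    }

    /** Returns the jar or directory an already-loaded class was defined from. */
    static String definingLocation(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws Exception {
        // Assumed default; pass the real fully qualified name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.comet.parquet.SupportsComet";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();

        List<URL> copies = copiesOf(name, loader);
        System.out.println(copies.size() + " copies of " + name + " visible:");
        for (URL url : copies) {
            System.out.println("  " + url);
        }

        try {
            Class<?> cls = Class.forName(name, false, loader);
            System.out.println("Loaded from: " + definingLocation(cls));
        } catch (ClassNotFoundException e) {
            System.out.println("Class not found by this loader: " + name);
        }
    }
}
```

Running this with the same classpath as the failing Spark job would show whether both jars expose the interface. More than one copy is not by itself proof of the problem, since the copies may be identical; it would matter mainly if they differ or are defined by different class loaders.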
The goal of this issue is to review these problems and, if possible, remove the requirement to have the Comet jar on the classpath so that the Iceberg jar works on its own.
It may also be beneficial to reduce the number of Arrow versions in use:
Spark 3.4.3 uses Arrow 11.0.0
Comet uses Arrow 18.3.0
Iceberg uses Arrow 15.0.2
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response