
Fix shading issues with Iceberg integration #1934

@andygrove

Describe the bug

Although the Comet/Iceberg integration in #1920 and apache/iceberg#13378 appears to work, it currently requires both the Comet jar and the Iceberg jar to be on the classpath, even though the Iceberg jar already contains a copy of the Comet classes.

One reason for this requirement is a shading issue with the Arrow Java library. Arrow contains JNI classes that cannot be shaded, because their fully qualified Java names must match the names used in the native C code. Comet does not relocate these classes, but Iceberg does. If the Comet jar is not on the classpath, queries fail at runtime because Arrow cannot load the JNI classes:

java.lang.RuntimeException: Could not find class Lorg/apache/arrow/c/jni/PrivateData;
	at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
	at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2450)
	at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2506)
	at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2705)
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2635)
	at java.base/java.lang.Runtime.load0(Runtime.java:768)
	at java.base/java.lang.System.load(System.java:1854)
	at org.apache.comet.shaded.arrow.c.jni.JniLoader.load(JniLoader.java:90)
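To see which side of the relocation is missing on a given classpath, a small diagnostic along the following lines may help. This is only a sketch: `JniClassCheck` and `resolvable` are hypothetical names, and the two class names are the ones that appear in the stack trace above. It checks whether a class loader can resolve each name without initializing the class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JniClassCheck {

    // Hypothetical helper: for each fully qualified class name, report whether
    // the given loader can resolve it. The class is not initialized.
    static Map<String, Boolean> resolvable(ClassLoader loader, String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            boolean found;
            try {
                Class.forName(name, false, loader);
                found = true;
            } catch (ClassNotFoundException | LinkageError e) {
                found = false;
            }
            result.put(name, found);
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        resolvable(loader,
                // Unrelocated name that the native code asks for (from the trace).
                "org.apache.arrow.c.jni.PrivateData",
                // Relocated loader class that triggers the native load (from the trace).
                "org.apache.comet.shaded.arrow.c.jni.JniLoader")
            .forEach((name, ok) ->
                System.out.println((ok ? "found   " : "MISSING ") + name));
    }
}
```

Running this with only the Iceberg jar on the classpath, and again with both jars, should show whether the unrelocated `org.apache.arrow.c.jni.PrivateData` is the class that disappears.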

Another issue I ran into, which may only have occurred because of a specific class loading order, is that Comet did not recognize that Iceberg's SparkBatchQueryScan implements the SupportsComet interface. I suspect this was caused by having two different versions of SupportsComet on the classpath (one in each jar), but I have not been able to confirm it.
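One way to test the duplicate-class suspicion is to list every location on the classpath that provides the interface. The sketch below is an illustration only: `DuplicateClassFinder` and `locate` are hypothetical names, and the default `org.apache.comet.parquet.SupportsComet` is an assumed package for the interface, so pass the real fully qualified name as the first argument if it differs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {

    // Hypothetical helper: return every URL from which the loader (and its
    // parents) could serve the .class file for the given class name.
    static List<URL> locate(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resource)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: the package of SupportsComet is a guess; override via args[0].
        String className = args.length > 0
                ? args[0]
                : "org.apache.comet.parquet.SupportsComet";
        List<URL> copies =
                locate(className, Thread.currentThread().getContextClassLoader());
        System.out.println(copies.size() + " location(s) provide " + className);
        copies.forEach(url -> System.out.println("  " + url));
    }
}
```

More than one location means the copy that gets loaded depends on classpath order. If the two copies end up defined by different class loaders, the JVM treats them as distinct types, so an `instanceof` check against one copy fails for an object that implements the other.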

The goal of this issue is to review these problems and, if possible, remove the requirement to have the Comet jar on the classpath so that the Iceberg jar works on its own.

It may also be beneficial to reduce the number of Arrow versions in use:

Spark 3.4.3 uses Arrow 11.0.0
Comet uses Arrow 18.3.0
Iceberg uses Arrow 15.0.2

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug (Something isn't working)
