fix(java): resolve JNI classloader bug on dispatcher thread in Spark#6549

Open
summaryzb wants to merge 3 commits into lance-format:main from summaryzb:fix/jni-classloader-bug

@summaryzb commented Apr 16, 2026

Summary

Fix a JNI classloader bug where the AsyncScanner class could not be found on the lance-jni-dispatcher thread in Spark executor environments. Spark loads user JARs through an isolated MutableURLClassLoader/ChildFirstURLClassLoader chain that is installed only on Spark's managed task threads. JNI native threads attached via attach_current_thread_permanently() never receive this classloader and fall back to the JVM system classloader, which cannot see user classes.

Problem

Error log

thread 'lance-jni-dispatcher' panicked at src/dispatcher.rs:44:22:
AsyncScanner class not found: JavaException
java.lang.NoClassDefFoundError: org/lance/ipc/AsyncScanner
Caused by: java.lang.ClassNotFoundException: org.lance.ipc.AsyncScanner

In Spark executor environments, user JARs (including lance) are not on the JVM system classpath. Instead, Spark constructs a custom classloader chain during Executor initialization (Executor.scala:newSessionState()):

systemLoader (JVM default)
  └── MutableURLClassLoader / ChildFirstURLClassLoader (user JARs, --jars, dynamic deps)
        └── ExecutorClassLoader (optional, for REPL classes)

This classloader is installed per-thread inside TaskRunner.run() (Executor.scala:580):

Thread.currentThread.setContextClassLoader(isolatedSession.replClassLoader)

Only Spark's managed task-pool threads receive this call. When lance's Rust dispatcher thread calls jvm.attach_current_thread_permanently(), the JVM creates a new Java thread that:

  • Was not created by Spark's cached thread pool
  • Never goes through TaskRunner.run()
  • Never has setContextClassLoader() called with Spark's custom classloader
  • Therefore retains the JVM default system classloader, which lacks user JARs

When the dispatcher thread then called env.find_class("org/lance/ipc/AsyncScanner"), the JVM resolved the class against this system classloader and failed with the NoClassDefFoundError shown in the error log above.

This is a well-known JNI pitfall: when a native thread attached to the JVM via AttachCurrentThread / attach_current_thread_permanently calls FindClass with no Java frame on its stack, the lookup goes through the system classloader, not the application classloader that originally loaded the library.
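The per-thread nature of the context classloader can be illustrated in plain Java. This is only a sketch, not Spark's or lance's code: the class name `ContextLoaderDemo`, the thread names, and the empty `URLClassLoader` chain are hypothetical stand-ins for Spark's loader chain, and an ordinary Java thread stands in for the natively attached dispatcher thread.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

class ContextLoaderDemo {
    /** Returns { loader seen on the task thread, loader seen on the dispatcher thread }. */
    static ClassLoader[] observe(ClassLoader isolated) {
        AtomicReference<ClassLoader> onTask = new AtomicReference<>();
        AtomicReference<ClassLoader> onDispatcher = new AtomicReference<>();

        // Stand-in for a Spark task thread: it installs the isolated loader on itself.
        Thread task = new Thread(() -> {
            Thread.currentThread().setContextClassLoader(isolated);
            onTask.set(Thread.currentThread().getContextClassLoader());
        }, "task-thread");

        // Stand-in for the dispatcher thread: nothing ever installs the isolated loader here.
        Thread dispatcher = new Thread(
            () -> onDispatcher.set(Thread.currentThread().getContextClassLoader()),
            "dispatcher-thread");

        try {
            task.start();
            task.join();
            dispatcher.start();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ClassLoader[] { onTask.get(), onDispatcher.get() };
    }

    public static void main(String[] args) {
        // Hypothetical two-level chain on top of the system loader (no real JARs attached).
        ClassLoader userJars = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        ClassLoader repl = new URLClassLoader(new URL[0], userJars);

        ClassLoader[] seen = observe(repl);
        System.out.println("task thread sees isolated loader:       " + (seen[0] == repl));
        System.out.println("dispatcher thread sees isolated loader: " + (seen[1] == repl));
    }
}
```

Setting the context classloader on one thread has no effect on any other thread, which is why a thread that never passes through the installing code path cannot rely on it.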

Approach

The fix relies on the fact that JNI_OnLoad is called on the thread that triggered System.loadLibrary(), which has access to the correct application classloader. Class resolution is moved to JNI_OnLoad, where a GlobalRef to the AsyncScanner class is created. This GlobalRef is then passed into Dispatcher::initialize, so the dispatcher thread never needs to call find_class at all; it uses the pre-resolved class reference to look up method IDs.

This approach is the standard JNI pattern for solving classloader issues: resolve classes eagerly on a thread with the right classloader and cache GlobalRefs for later use on native threads.
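A pure-Java analogue of this resolve-eagerly-and-cache pattern is sketched below. It is not the PR's Rust code. Everything named here is hypothetical: `EagerResolveDemo`, `UserJarLoader`, and the synthetic class `demo.HiddenScanner` (an empty class generated in memory so that only the custom loader can see it). A `Class<?>` reference plays the role of the GlobalRef, and a single-thread executor plays the role of the dispatcher thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class EagerResolveDemo {
    static final String NAME = "demo.HiddenScanner"; // hypothetical class name

    /** Stand-in for the user-JAR loader: defines one class the system loader cannot see. */
    static final class UserJarLoader extends ClassLoader {
        UserJarLoader() { super(ClassLoader.getSystemClassLoader()); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!NAME.equals(name)) throw new ClassNotFoundException(name);
            byte[] b = emptyClassBytes(name.replace('.', '/'));
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Emits a minimal class file: a public class extending Object with no members. */
    static byte[] emptyClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                        // magic
            out.writeShort(0); out.writeShort(52);           // minor, major (Java 8)
            out.writeShort(5);                               // constant pool count (4 entries + 1)
            out.writeByte(1); out.writeUTF(internalName);    // #1 Utf8: this class name
            out.writeByte(7); out.writeShort(1);             // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);             // #4 Class -> #3
            out.writeShort(0x0021);                          // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2); out.writeShort(4);            // this_class, super_class
            out.writeShort(0); out.writeShort(0);            // interfaces, fields
            out.writeShort(0); out.writeShort(0);            // methods, attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs the body on a separate worker thread and returns its result. */
    static <T> T onWorker(Callable<T> body) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(body).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    /** Eager resolution on a thread that has the right loader (analogous to JNI_OnLoad). */
    static Class<?> resolveEagerly(ClassLoader loader) {
        try {
            return Class.forName(NAME, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lookup by name on the worker against the system loader (the failing path). */
    static boolean resolvableByNameOnWorker() {
        Callable<Boolean> body = () -> {
            try {
                Class.forName(NAME, false, ClassLoader.getSystemClassLoader());
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        };
        return onWorker(body);
    }

    /** The worker uses only the cached reference to look up a method; no name lookup. */
    static String useCachedOnWorker(Class<?> cached) {
        Callable<String> body =
            () -> cached.getName() + "#" + cached.getMethod("toString").getName();
        return onWorker(body);
    }

    public static void main(String[] args) {
        Class<?> cached = resolveEagerly(new UserJarLoader());
        System.out.println("by name on worker: " + resolvableByNameOnWorker());
        System.out.println("via cached ref:    " + useCachedOnWorker(cached));
    }
}
```

The worker never needs a loader that can see the class: holding the already-resolved reference is enough for method lookup, which mirrors passing a GlobalRef into the dispatcher.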

Move AsyncScanner class resolution from the JNI dispatcher thread to
JNI_OnLoad, which runs on the thread with the correct application
classloader. Pass a GlobalRef to the dispatcher so it never calls
FindClass on a native thread that only has the system classloader.

In Spark executors, user JARs are loaded by an isolated
MutableURLClassLoader/ChildFirstURLClassLoader chain installed
per-thread inside TaskRunner.run(). JNI native threads attached via
attach_current_thread_permanently() never receive this classloader,
causing ClassNotFoundException for org.lance.ipc.AsyncScanner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change-Id: I166bec22f7373e6140cdc20c573723bcaeff48e7
@github-actions (bot) added the bug and java labels on Apr 16, 2026
Change-Id: Ic2a8e094a5fd600d8148853bdf930fe70261897b
@summaryzb
Author

Fix #6577

@summaryzb
Author

@beinan PTAL, this issue was introduced in #6102
