Explicitly set the Pulsar function classloader #10922
pulsar-client-api/src/main/java/org/apache/pulsar/client/internal/ReflectionUtils.java
Overall I like this approach. Please take a look at my comment.
eolivelli
left a comment
Sorry I wasn't clear regarding the system property.
It may be 'pulsar.allow.override.system.classloader', defaulting to false.
If you set it to true, then you can set a new value.
The value of the system property should be read in a static block and stored in a private static final field.
BTW, LGTM in this form.
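A minimal sketch of the suggestion above, for illustration only. The property name comes from the comment; the class `ClassLoaderOverride` and its methods are hypothetical and are not part of this PR or of the Pulsar API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder class; only the property name is taken from the review comment.
final class ClassLoaderOverride {
    static final String ALLOW_OVERRIDE_PROPERTY = "pulsar.allow.override.system.classloader";

    // Read once, when the class is initialized; false when the property is unset.
    private static final boolean ALLOW_OVERRIDE;

    static {
        ALLOW_OVERRIDE = Boolean.parseBoolean(System.getProperty(ALLOW_OVERRIDE_PROPERTY, "false"));
    }

    private static final AtomicReference<ClassLoader> OVERRIDE = new AtomicReference<>();

    private ClassLoaderOverride() {}

    /** Returns true if the override was accepted, false if the property forbids it. */
    static boolean setClassLoader(ClassLoader classLoader) {
        if (!ALLOW_OVERRIDE) {
            return false;
        }
        OVERRIDE.set(classLoader);
        return true;
    }

    /** The accepted override if there is one, otherwise the loader that loaded this class. */
    static ClassLoader effectiveClassLoader() {
        ClassLoader override = OVERRIDE.get();
        return override != null ? override : ClassLoaderOverride.class.getClassLoader();
    }
}
```

Because the flag is captured during class initialization, changing the system property afterwards has no effect in this sketch; it would have to be passed at JVM startup, for example with `-Dpulsar.allow.override.system.classloader=true`.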
eolivelli
left a comment
LGTM
@merlimat thanks for your contribution. For this PR, do we need to update docs?

Tests are failing
eolivelli
left a comment
I believe the better way to deal with the classpath is to implement a simpler ClassLoader hierarchy, in which the Function is loaded in a classloader (NarClassLoader) that is a child of the CL that loads the Pulsar Client Implementation and follows the standard CL delegation model (parent-first).
This way the function is forced to use the same versions of the classes that are running within Pulsar core.
The downside is that if you want to use a library that is already inside the Pulsar runtime, like Netty, you have to relocate it inside the function NAR.
When we chose to let GenericRecord.getNativeObject return the "native" object, we committed to this path.
Otherwise it is not possible for a Function to use the returned object.
For instance, in a Sink:
void write(Record<GenericRecord> record) {
    Object nativeAvroRecord = record.getValue().getNativeObject();
    // The cast only succeeds if both sides resolve the same Avro class.
    org.apache.avro.generic.GenericRecord avroRecord = (org.apache.avro.generic.GenericRecord) nativeAvroRecord;
}
The org.apache.avro.generic.GenericRecord class of "nativeAvroRecord" must be the same class that the function loads for "avroRecord"; otherwise you will see a ClassCastException.
If we implement the "parent-first" strategy, when the JVM loads the class for "avroRecord" (using the same classloader that loaded the Sink class), it will return the same o.a.avro.generic.GenericRecord class that built the "nativeAvroRecord" object.
I believe that there is no other way to do it.
@lhotari gave me a link to how Gradle solved a similar problem (if we want to filter out some classes/resources and keep only a subset).
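The parent-first argument in this comment can be illustrated with plain JDK classes. This is a sketch under stated assumptions, not Pulsar code: `SharedType` is a hypothetical stand-in for a framework-owned class such as the Avro record type, and an empty `URLClassLoader` stands in for the function's NarClassLoader.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

final class ParentFirstDemo {
    /** Hypothetical stand-in for a class owned by the framework. */
    static final class SharedType {}

    private ParentFirstDemo() {}

    /** Loads SharedType through a child loader that delegates to the framework loader first. */
    static Class<?> loadThroughChild() {
        ClassLoader frameworkLoader = SharedType.class.getClassLoader();
        // The child has no URLs of its own, so every lookup is answered by the parent.
        try (URLClassLoader functionLoader = new URLClassLoader(new URL[0], frameworkLoader)) {
            return functionLoader.loadClass(SharedType.class.getName());
        } catch (ClassNotFoundException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if a loader without the framework loader as parent cannot see SharedType. */
    static boolean isolatedLoaderFails() {
        // A null parent means only the bootstrap loader is consulted.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            isolated.loadClass(SharedType.class.getName());
            return false;
        } catch (ClassNotFoundException expected) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With parent-first delegation the child hands back the very same `Class` object the framework uses, so a cast between the two sides cannot fail on class identity. The isolated loader shows the opposite extreme, where the framework class is not visible at all.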
static <T> Class<T> newClassInstance(String className) {
    try {
public static <T> Class<T> newClassInstance(String className) {
    return (Class<T>) loadedClasses.computeIfAbsent(className, name -> {
Why do we need this cache?
The JVM is very good at dealing with Class.forName.
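As a small illustration of the point above: repeated `Class.forName` calls against the same loader resolve to the same already-loaded `Class` object. The helper name `ForNameIdentity` is hypothetical, and the sketch says nothing about relative cost, which would need to be measured.

```java
final class ForNameIdentity {
    private ForNameIdentity() {}

    /** Looks the class up twice through the same loader and compares the results by identity. */
    static boolean sameClassReturned(String className, ClassLoader loader) {
        try {
            // initialize=false: we only care about resolution, not static initializers.
            Class<?> first = Class.forName(className, false, loader);
            Class<?> second = Class.forName(className, false, loader);
            return first == second;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }
}
```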
this.collectorRegistry = collectorRegistry;

this.instanceClassLoader = Thread.currentThread().getContextClassLoader();
ReflectionUtils.setClassLoader(instanceClassLoader);
I believe that this will not fix the problems we have.
I also discovered another bad problem with using AVRO (and JsonNode...):
#11274
Basically, as of 2.8.0 you cannot use the org.apache.avro...GenericRecord returned by GenericRecord.getNativeObject().
An alternative approach:
Motivation
When a function tries to create objects through the client API that involve loading an implementation class, it fails because it attempts to use the function's own class loader, which does not include the Pulsar client implementation classes.
Instead, we should recognize that we are in "Pulsar Function" mode and directly use the framework class loader for all the implementation classes.
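A rough sketch of the idea described in the motivation, assuming a hypothetical helper named `ImplLoader`. It is not the actual `ReflectionUtils` code from this PR; it only shows an explicitly set framework loader taking precedence over the thread context loader.

```java
final class ImplLoader {
    // Hypothetical: set by the framework when running in "Pulsar Function" mode.
    private static volatile ClassLoader frameworkLoader;

    private ImplLoader() {}

    static void setClassLoader(ClassLoader loader) {
        frameworkLoader = loader;
    }

    /** Loads an implementation class, preferring the explicit framework loader when one is set. */
    @SuppressWarnings("unchecked")
    static <T> Class<T> loadImplClass(String className) {
        ClassLoader explicit = frameworkLoader;
        ClassLoader loader = explicit != null
                ? explicit
                : Thread.currentThread().getContextClassLoader();
        try {
            return (Class<T>) Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

In this sketch the function's own loader (the thread context loader) is consulted only when no framework loader has been set, so implementation classes stay resolvable even when the function's loader cannot see them.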