Lazy initialization for JavaScript functions#4871
Conversation
|
If the javascript in such cases is programmatically generated, would it make more sense to intern/cache the compiled form or similar? Moving the failure feedback for malformed javascript so late in the code path seems risky. |
@drcrallen good point. I think that is the best solution, but it will require a bunch of work. Maybe we can file an issue as a future solution.
Would you elaborate more on the risks? I think malformed JavaScripts will emit errors eventually in the middle of processing. Generally it's good to return an error as soon as possible, but, in this case, avoiding OOM will be better. |
| ) | ||
| { | ||
| super(timestampSpec, dimensionsSpec); | ||
| Preconditions.checkState(config.isEnabled(), "JavaScript is disabled"); |
There was a problem hiding this comment.
The original idea behind delaying these checks until actual usage is that when you disable JavaScript, you might still want your Druid nodes to be able to deserialize JavaScript-based objects. For example, maybe you had used JavaScript aggregators in your ingest tasks, and after disabling JavaScript you still want to be able to view old completed tasks from the task table in the metadata store.
So I think it's worth being able to create these objects even if JavaScript is disabled, just make sure the JavaScript is never actually compiled.
There was a problem hiding this comment.
Thanks. Changed to postpone the JavaScript configuration check.
| } | ||
|
|
||
| return new JavaScriptParser(function); | ||
| parser = parser == null ? new JavaScriptParser(function) : parser; |
There was a problem hiding this comment.
Is this thread-safe? I guess, maybe, although it might do extra work since not all threads will see the update at the same time. I'm also not sure if ParseSpecs need to be thread-safe; if it's an issue could you please double-check it?
There was a problem hiding this comment.
It looks that one ParseSpec (and one InputRowParser) is created per task and there's no thread-safety issue. One exception is DataSchema.getParser() which might create a new InputRowParser, but it's always called by a single thread. I additionally modified DataSchema to cache the created InputRowParser.
I recognize ThriftDeserialization (added in #3418) is designed for thread-safety, but am not sure how different threads can call it at the same time.
| } | ||
|
|
||
| return compiledScript; | ||
| compiledScript = compiledScript == null ? compileScript(fnAggregate, fnReset, fnCombine) : compiledScript; |
There was a problem hiding this comment.
Is this going to be thread-safe (albeit with some possible extra work if multiple threads don't see the compiledScript variable update)? AggregatorFactorys definitely do need to be thread-safe as they are used from multiple threads during query processing.
|
|
||
| private final Function fn; | ||
| // This variable is lazily initialized to avoid unnecessary JavaScript compilation during JSON serde | ||
| private Function fn; |
There was a problem hiding this comment.
Similar thread-safety questions to the other files.
|
@drcrallen In general I think the benefit from delaying compilation exceeds the risk (especially since the user will see the error soon after query submission, when the historicals fail to compile the script and all bomb out). Agree with both you and @jihoonson that a compile cache is probably the better overall approach, but I'm not convinced it's necessary to take the time to implement it over this simpler solution. Especially since JS is not meant to be super optimized for production, and with this patch, we're just trying to make it slightly more robust for a specific case we ran into. |
|
👍 |
|
👍 |
Currently, most JavaScript functions are compiled in constructors, which means they are compiled whenever they are deserialized from JSON. This can be critical if JS functions prevails especially in metricsSpec because they are stored in Metadata of each segment and deserialized whenever a segment is loaded which generates classes for JS functions each time, thereby causing
java.lang.OutOfMemoryError: Compressed class space.This patch postpones the compilation of JS functions until they are really used.
This change is