Skip to content

Lazy initialization for JavaScript functions#4871

Merged
jon-wei merged 7 commits intoapache:masterfrom
jihoonson:js-lazy-init
Oct 11, 2017
Merged

Lazy initialization for JavaScript functions#4871
jon-wei merged 7 commits intoapache:masterfrom
jihoonson:js-lazy-init

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented Sep 28, 2017

Currently, most JavaScript functions are compiled in constructors, which means they are compiled whenever they are deserialized from JSON. This can be critical if JS functions prevails especially in metricsSpec because they are stored in Metadata of each segment and deserialized whenever a segment is loaded which generates classes for JS functions each time, thereby causing java.lang.OutOfMemoryError: Compressed class space.
This patch postpones the compilation of JS functions until they are really used.


This change is Reviewable

@drcrallen
Copy link
Copy Markdown
Contributor

If the javascript in such cases is programmatically generated, would it make more sense to intern/cache the compiled form or similar?

Moving the failure feedback for malformed javascript so late in the code path seems risky.

@jihoonson
Copy link
Copy Markdown
Contributor Author

If the javascript in such cases is programmatically generated, would it make more sense to intern/cache the compiled form or similar?

@drcrallen good point. I think that is the best solution, but it will require a bunch of work. Maybe we can file an issue as a future solution.

Moving the failure feedback for malformed javascript so late in the code path seems risky.

Would you elaborate more on the risks? I think malformed JavaScripts will emit errors eventually in the middle of processing. Generally it's good to return an error as soon as possible, but, in this case, avoiding OOM will be better.

)
{
super(timestampSpec, dimensionsSpec);
Preconditions.checkState(config.isEnabled(), "JavaScript is disabled");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea behind delaying these checks until actual usage is that when you disable JavaScript, you might still want your Druid nodes to be able to deserialize JavaScript-based objects. For example, maybe you had used JavaScript aggregators in your ingest tasks, and after disabling JavaScript you still want to be able to view old completed tasks from the task table in the metadata store.

So I think it's worth being able to create these objects even if JavaScript is disabled, just make sure the JavaScript is never actually compiled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Changed to postpone the JavaScript configuration check.

}

return new JavaScriptParser(function);
parser = parser == null ? new JavaScriptParser(function) : parser;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this thread-safe? I guess, maybe, although it might do extra work since not all threads will see the update at the same time. I'm also not sure if ParseSpecs need to be thread-safe; if it's an issue could you please double-check it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks that one ParseSpec (and one InputRowParser) is created per task and there's no thread-safety issue. One exception is DataSchema.getParser() which might create a new InputRowParser, but it's always called by a single thread. I additionally modified DataSchema to cache the created InputRowParser.

I recognize ThriftDeserialization (added in #3418) is designed for thread-safety, but am not sure how different threads can call it at the same time.

}

return compiledScript;
compiledScript = compiledScript == null ? compileScript(fnAggregate, fnReset, fnCombine) : compiledScript;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be thread-safe (albeit with some possible extra work if multiple threads don't see the compiledScript variable update)? AggregatorFactorys definitely do need to be thread-safe as they are used from multiple threads during query processing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.


private final Function fn;
// This variable is lazily initialized to avoid unnecessary JavaScript compilation during JSON serde
private Function fn;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thread-safety questions to the other files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 2, 2017

@drcrallen In general I think the benefit from delaying compilation exceeds the risk (especially since the user will see the error soon after query submission, when the historicals fail to compile the script and all bomb out).

Agree with both you and @jihoonson that a compile cache is probably the better overall approach, but I'm not convinced it's necessary to take the time to implement it over this simpler solution. Especially since JS is not meant to be super optimized for production, and with this patch, we're just trying to make it slightly more robust for a specific case we ran into.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Oct 6, 2017

👍

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Oct 11, 2017

👍

@jon-wei jon-wei merged commit 56fb11c into apache:master Oct 11, 2017
@jon-wei jon-wei added this to the 0.12.0 milestone Jan 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants