JS expressions for trigger filter#3771
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: slinkydeveloper The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
There is also this newer js engine https://github.com/dop251/goja we should give a try. I obviously avoided all bindings to v8 because they require cgo in place and that's a huge pain for our infra |
|
/retest |
This is some fantastic work, sire Francesco <3 |
| for name, fn := range map[string]func() interface{}{ | ||
| "specversion": func() interface{} { | ||
| return event.SpecVersion() | ||
| }, | ||
| "type": func() interface{} { | ||
| return event.Type() | ||
| }, | ||
| "source": func() interface{} { | ||
| return event.Source() | ||
| }, | ||
| "subject": func() interface{} { | ||
| return event.Subject() | ||
| }, | ||
| "id": func() interface{} { | ||
| return event.ID() | ||
| }, | ||
| "time": func() interface{} { | ||
| return event.Time() | ||
| }, | ||
| "dataschema": func() interface{} { | ||
| return event.DataSchema() | ||
| }, | ||
| "datacontenttype": func() interface{} { | ||
| return event.DataContentType() | ||
| }, | ||
| } { |
There was a problem hiding this comment.
Splitting the map declaration and usage would be cleaner.
Does sdk-go have constants for keys?
There was a problem hiding this comment.
Does sdk-go have constants for keys?
Nope, but in any case these getters are "spec version" agnostic (look at dataschema vs schemaurl) and i intentionally wrote that code to decouple sdk-go apis and filtering apis, which i think should be something stable (probably a change to that requires bumping the trigger crd version)
| func bindValue(vm *goja.Runtime, propName string, eventVal interface{}, obj *goja.Object) error { | ||
| return obj.DefineAccessorProperty(propName, vm.ToValue(func(fc goja.FunctionCall) goja.Value { | ||
| return coerceToJsTypes(vm, eventVal) | ||
| }), nil, goja.FLAG_FALSE, goja.FLAG_TRUE) | ||
| } | ||
|
|
||
| func bindValueExtractor(vm *goja.Runtime, propName string, extractFn func() interface{}, obj *goja.Object) error { | ||
| return obj.DefineAccessorProperty(propName, vm.ToValue(func(fc goja.FunctionCall) goja.Value { | ||
| return coerceToJsTypes(vm, extractFn()) | ||
| }), nil, goja.FLAG_FALSE, goja.FLAG_TRUE) | ||
| } |
There was a problem hiding this comment.
They are similar, can we do
func bindValueExtractor(vm *goja.Runtime, propName string, extractFn func() interface{}, obj *goja.Object) error {
return bindValue(vm, propName, extractFn(), obj)
}Or get rid of bindValueExtractor completely?
There was a problem hiding this comment.
nope because i want to keep the extraction lazy, that's the reason why we have these two methods. in one case the bindValueExtractor enclose in the getter closure the function pointer, in the other case it enclose the value itself
There was a problem hiding this comment.
nope because i want to keep the extraction lazy
What is the goal? Speed?
There was a problem hiding this comment.
What is the goal? Speed?
Mostly avoid unnecessary memory allocations & type conversions to copy all the stuff inside the engine, when you just access a bunch of fields of the event
| client.CreateTriggerOrFailV1Beta1(triggerNameOk, | ||
| resources.WithSubscriberServiceRefForTriggerV1Beta1(subscriberNamePass), | ||
| resources.WithExpressionTriggerFilterV1Beta1("event.id != null"), | ||
| resources.WithBrokerV1Beta1(brokerName), | ||
| ) | ||
|
|
||
| subscriberNameFail := "recordevents-expression-fail" | ||
| eventTrackerFail, _ := recordevents.StartEventRecordOrFail(client, subscriberNameFail) | ||
| defer eventTrackerFail.Cleanup() | ||
|
|
||
| triggerNameFail := "trigger-expression-fail" | ||
| client.CreateTriggerOrFailV1Beta1(triggerNameFail, | ||
| resources.WithSubscriberServiceRefForTriggerV1Beta1(subscriberNameFail), | ||
| resources.WithExpressionTriggerFilterV1Beta1("event.id == null"), | ||
| resources.WithBrokerV1Beta1(brokerName), | ||
| ) |
There was a problem hiding this comment.
I would test Trigger v1 as well.
There was a problem hiding this comment.
In theory it's already tested b/c the v1beta1 get converted to v1
|
Open question for people here: should this pr add |
|
It seems that with this code only expression language supported will be JS? |
Yes that's the idea |
|
@aslom I renamed the field |
|
/retest |
|
The following jobs failed:
Failed non-flaky tests preventing automatic retry of pull-knative-eventing-integration-tests: |
|
/retest |
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Added doc about multiple filter types Removed useless code in e2e test Some nits in errors of filter_engine.go Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
|
Raw results: Benchstat: |
|
Comments of the results with a comparison with json schema proposal https://docs.google.com/document/d/1Pz2vaLWKUrMQDLNyDW7ksjgLG4GAxBY546dT7_jF5Rk/edit#heading=h.nqj7kw3uqc6m |
|
I guess the benchstat is kinda useless as it's not comparing AttribteFilter vs. JsEngine filter, right? At least this proofs what I guess was expected: The JsEngine stuff is 2 orders of magnitude slower than a "native" filter. Whether or not that will impact workloads remains to be seen in real-world benchmarks I guess, but the sheer amount of allocations would have me quite cautious towards GC pressure. That area also looks like it could be vastly improved as I don't quite see a reason why we'd have to have that many allocations per filter. |
It tests the variance through the different runs of the same test |
|
@markusthoemmes yep there is definitely something wrong going on with this implementation, i'm trying to figure out what's wrong. Looking at the profiling data right now |
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
|
Preliminary results with the hacks I made: which is pretty much similar to the json schema results |
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
|
The following is the coverage report on the affected files.
|
|
@slinkydeveloper: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Death ReportSo after running the benchmarks, I quickly identified that there was a crucial issue in this PR: while trying to isolate every execution of script, I was creating one JS Engine per request. Isolation was guaranteed, but every engine allocated quite a lot of memory for the built-in objects, the vm, etc... The total allocation was up to 170kb, which means that creating a new engine for each filtering request doesn't work in real world. So I took a different approach: reuse the same engine per expression definition. Doing some static analysis of the ast and hacking a bit around the engine, I identified how to make the executions isolated and avoid the user to run malicious code, like Doom. But, cherry on the cake, the engine is not thread safe: you cannot run an exported function from multiple goroutines (which, in the end, it makes sense for JS per se). I ended up giving up with this engine, it doesn't have the requirements we need. Maybe in future we could re-explore the idea embedding V8 or duktape, but they require CGO which opens to a complete different set of problems for our infra. The quest for powerful filtering solution doesn't end here, rest assured! In fact, you should look at https://docs.google.com/document/d/1Pz2vaLWKUrMQDLNyDW7ksjgLG4GAxBY546dT7_jF5Rk/edit#heading=h.3w9ulpur0y16 |
Signed-off-by: Francesco Guardiani francescoguard@gmail.com
Fixes #3359
Proposed Changes
Expressionto code the JS expression to perform the filteringRelease Note
Docs
Docs will be available in next PRs in knative.dev/docs
The API to access the event attributes are as simple as
event.attributenameorevent.extensionname. An example expression:This filter will accept an event with id containing the string francesco, time with year greater or equal to 2020 and with an extension named
exta.There is no access to the
dataof the event. I believe this requires a separate discussion because there are a lot of open questions that we need to address:Some notes
vm *goja.Runtimein the code) is re-created from scratch, so one filter execution cannot affect any other filterastis parsed separately from the code execution, so we can cache it. I also implemented a check for theasttoken, in order to discard bad code (code that is not an expression)PR is still pretty much drafted, I would love to hear some feedback about the approach.
TODO