Skip to content

docs: Add scripting guide#1962

Closed
ghost wants to merge 2 commits intomasterfrom
scripting-guide
Closed

docs: Add scripting guide#1962
ghost wants to merge 2 commits intomasterfrom
scripting-guide

Conversation

@ghost
Copy link

@ghost ghost commented Mar 1, 2020

This is a WIP on a guide describing usage of scripting transforms (lua and javascript) in Vector.

It describes unimplemented features and is far from being finished by itself, so this PR is created just to collect some feedback.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost ghost force-pushed the scripting-guide branch from d0557b9 to 10d3aa8 Compare March 1, 2020 19:44
@binarylogic binarylogic self-requested a review March 1, 2020 20:17
Copy link
Contributor

@binarylogic binarylogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start and seems to solve to a number of existing issues. A few outstanding questions I have:

  1. Should scripting transforms be able to control flow and define "outputs" similar to the swimlanes transform?
  2. Can scripting transforms emit more than one event?
  3. Backward compatibility is handled by the "anonymous handler" feature, correct?
  4. Should we address performance concerns? If performance was equal, I'd probably recommend using scripting transforms for everything.
  5. Should we mention testing? These seem like prime candidates for testing,.

event.log.id = counter

if counter % 2 == 0 then
event.lane = "even"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anywhere in this guide where the lane field is actually used. It would be more clear to follow through with this example.

I'm also curious if you saw #1942? It might be easier to provide a way for scripting transforms to define outputs versus pairing it with the swimlandes transform.

Copy link
Author

@ghost ghost Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw #1942 and I want to solve it by allowing the user to optionally add lane field to the top level of the event. The expected effect could then be the same as when using swimlanes transform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'd prefer that we prefix special field with _, @, or some other character. To throw another edge case at you, what if a user wanted to emit an event to 2 lanes?

Copy link
Author

@ghost ghost Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that we prefix special field with _, @, or some other character.

It is already a kind of "meta" field, as it is not visible to the downstream components which see only the event which is inside the value of the log key. For example, if the original event produced by say stdin source looks like this:

{
  "message": "message",
  "timestamp": "2020-03-02T15:45:56.751Z",
  "host": "localhost"
}

Then it passed to the lua transform as a table

{
  log = {
    message = "message",
    timestamp = "2020-03-02T15:45:56.751Z",
    host = "localhost"
  }
}

Setting the lane by having event.lane = "lane_name" in the transform code results in this:

{
  log = {
    message = "message",
    timestamp = "2020-03-02T15:45:56.751Z",
    host = "localhost"
  },
  lane = "lane_name"
}

After this table is returned from the transform, the log part is extracted to be sent to the downstream components reading from transform_name.lane_name. So if there is a downstream console sink reading from transform_name.lane_name it would receive

{
  "message": "message",
  "timestamp": "2020-03-02T15:45:56.751Z",
  "host": "localhost"
}

By the way, I'm thinking about another "meta" field visible only to the transform, input, which would contain the name of the component which produced the event. But it would require changes in other parts of code to implement, not only in lua and javascript transforms.

To throw another edge case at you, what if a user wanted to emit an event to 2 lanes?

One approach is to duplicate the event and then set different lanes for the duplicates. On the other hand, to make it more convenient, it is also possible to allow setting the field not only to strings, but also to arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes more sense, I forgot the type data was nested. I still find the lanes approach to be somewhat awkward. IMO, it feels more natural to pair this with the return statement or an explicit function. For example:

return {event, "lane1", "lane2"}

And returning just event would work as well:

return event

Is your intent to avoid having to implement a common API across all scripting transforms? I assume adding a lane field would make it easier to share this logic across different scripting transforms?

Copy link
Author

@ghost ghost Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm experimenting with the approach proposed in #1942, with the transform pushing the events using an emit function. But then I think it makes sense to make the handlers closures which take entire input stream and then iterate over it and emit output events.

For example, if the handler takes input parameter which is an iterator and the emit function, then the entire handler can look like this:

function (input, emit)
  -- init
  counter = 0

  -- process
  for event in input do
    counter = counter + 1
    event.log.id = counter

    if counter % 2 == 0 then
      emit(event, "even")
    else
      emit(event, "odd")
    end
  end

  -- clean up
  emit {
    log = {
      message = "nothing left to be processed",
    }
  }
end

Which implements the true iterator pattern from Lua by itself. However, my concern with this approach is that it is not possible to produce an event if the waiting takes too long. So we need a more flexible approach, which is also not overcomplicated in the simple cases.

In general, I think if add such large changes, it is worth to create an RFC on the new programming model which would solve all issues with the Lua transform.

Is your intent to avoid having to implement a common API across all scripting transforms? I assume adding a lane field would make it easier to share this logic across different scripting transforms?

Not really, the underlying scripting engines are very different, so I think the code duplication is unavoidable in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we’d need a RFC for that. I prefer your original proposal over this direction, as I find it to be less straightforward. Happy to have others weigh in if you disagree. And appreciate you trying that out.

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
@ghost
Copy link
Author

ghost commented Mar 2, 2020

  • Should scripting transforms be able to control flow and define "outputs" similar to the swimlanes transform?

Yes! The idea is that the transform can set lane field in the transform and then the event can be consumed by the downstream components by setting the inputs field in the downstream components to transform_name.lane_name. If lane is not set, then it should work as usually, so events can be consumed from transform_name.

  • Can scripting transforms emit more than one event?

They can return arrays of events instead of singular events.

  • Backward compatibility is handled by the "anonymous handler" feature, correct?

Not really, because the event structure is changed in any case (now log events are nested in log field). Anonymous handlers still have function definition, but without specifying the name, like this:

function (event)
  return event
end

In my opinion, for backward compatibility it is better to just introduce a legacy configuration option, which, if set to true, would allow to use pre-existing transform scripts.

  • Should we address performance concerns? If performance was equal, I'd probably recommend using scripting transforms for everything.

I expect the performance to be at least 10x slower than with native transforms. Furthermore, some design decisions (such as using tables instead of userdata) additionally decrease performance, but improve the usability. As the wasm transform is in works, I think it is less than an issue than it could have been previously because the users who need high performance for custom transforms would be able to implement them in Rust.

  • Should we mention testing? These seem like prime candidates for testing,.

I think so, there should be an example of defining unit tests for scripting transforms.

@binarylogic binarylogic self-assigned this Mar 2, 2020
@binarylogic
Copy link
Contributor

In my opinion, for backward compatibility it is better to just introduce a legacy configuration option, which, if set to true, would allow to use pre-existing transform scripts.

Agree, we have very important users using the lua transform and I would like to not break their configuration. Maybe we could require a version attribute going forward?

@binarylogic
Copy link
Contributor

Closing since #2000 will introduce changes and this PR has drifted.

@binarylogic binarylogic closed this Mar 9, 2020
@binarylogic binarylogic deleted the scripting-guide branch April 24, 2020 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants