From f5cfa3105c6ae7ac91c5de46d77da4c00d98e3d3 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Fri, 6 Mar 2020 23:57:09 +0300 Subject: [PATCH 01/18] chore: RFC #1999 - 2020-03-06 - API extensions for `lua` transform Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 175 ++++++++++++++++++ 1 file changed, 175 insertions(+) create mode 100644 rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md new file mode 100644 index 0000000000000..27f0ccc8a1a25 --- /dev/null +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -0,0 +1,175 @@ +# RFC #1999 - 2020-03-06 - API extensions for `lua` transform + +This RFC proposes a new API for the `lua` transform. + +## Motivation + +Currently the [`lua` transform](https://vector.dev/docs/reference/transforms/lua/) has some limitations in its API. In particular, the following features are missing: + +* **Nested Fields** + + Currently accessing nested fields is possible using the dot notation: + + ```lua + event["nested.field"] = 5 + ``` + + However, users expect nested fields to be accessible as native Lua structures, for example like this: + + ```lua + event["nested"]["field"] = 5 + ``` + + See [#706](https://github.com/timberio/vector/issues/706) and [#1406](https://github.com/timberio/vector/issues/1406). + +* **Setup Code** + + Some scripts require expensive setup steps, for example, loading of modules or invoking shell commands. These steps should not be part of the main transform code. 
+ + For example, this code adding custom hostname + + ```lua + if event["host"] == nil then + local f = io.popen ("/bin/hostname") + local hostname = f:read("*a") or "" + f:close() + hostname = string.gsub(hostname, "\n$", "") + event["host"] = hostname + end + ``` + + Should be split into two parts, the first part executed just once at the initialization: + + ```lua + local f = io.popen ("/bin/hostname") + local hostname = f:read("*a") or "" + f:close() + hostname = string.gsub(hostname, "\n$", "") + ``` + + and the second part executed for each incoming event: + + ```lua + if event["host"] == nil then + event["host"] = hostname + end + ``` + + See [#1864](https://github.com/timberio/vector/issues/1864). + +* **Control Flow** + + It should be possible to define channels for output events, similarly to how it is done in [`swimlanes`](https://vector.dev/docs/reference/transforms/swimlanes/) transform. + + See [#1942](https://github.com/timberio/vector/issues/1942). + +## Prior Art + +The implementation of `lua` transform has the following design: + +* There is a `source` parameter which takes a string of code. +* When a new event comes in, the global variable `event` is set inside the Lua context and the code from `source` is evaluated. +* After that, Vector reads the global variable `event` as the processed event. +* If the global variable `event` is set to `nil`, then the event is dropped. + +Events have type [`userdata`](https://www.lua.org/pil/28.1.html) with custom [metamethods](https://www.lua.org/pil/13.html), so they are views to Vector's events. Thus passing an event to Lua has zero cost, so only when fields are actually accessed the data is copied to Lua. + +The fields are accessed through string indexes using [Vector's dot notation](https://vector.dev/docs/about/data-model/log/#dot-notation). 
+ +## Guide-level Proposal + +### Motivating example + + +```toml +[transforms.lua] + type = "lua" + inputs = [] + source = """ + counter = counter + 1 + event = nil + """ + [transforms.lua.hooks] + init = """ + counter = 0 + previous_timestamp = os.time() + Event = Event.new_log() + event["message"] = "starting up" + event:set_lane("auxiliary") + """ + shutdown = """ + final_stats_event = Event.new_log() + final_stats_event["stats"] = { count = counter, interval = os.time() - previous_timestamp } + final_stats_event["stats.rate"] = final_stats_event["stats"].count / final_stats_event["stats.interval"] + + shutdown_event = Event.new_log() + shutdown_event["message"] = "shutting down" + shutdown_event:set_lane("auxiliary") + + event = {final_stats_event, shutdown_event} + """ + [[transforms.lua.timers]] + interval = 10 + source = """ + event = Event.new_log() + event["stats"] = { count = counter, interval = 10 } + event["stats.rate"] = event["stats"].count / event["stats.interval"] + counter = 0 + previous_timestamp = os.time() + """ + [[transforms.lua.timers]] + interval = 60 + source = """ + event["message"] = "heartbeat" + event:set_lane("auxiliary") + "" +``` + +The code above consumes the incoming events, counts them, and then emits these stats about these counts every 10 seconds. In addition, it sends debug logs about its functioning into a separate lane called `auxiliary`. + +### Proposed changes + +* Hooks for initialization and shutdown called `init` and `shutdown`. They are defined as strings of Lua code in the `hooks` section of the configuration of the transform. +* Timers which define pieces of code that are executed periodically. They are defined in array `timers`, each timer takes two configuration options: `interval` which is the interval for execution in seconds and `source` which is the code which is to be executed periodically. +* Support for setting the output lane using `set_lane` method on the event which takes a string as the parameter. 
It should also be possible to read the lane using `get_lane` method. Reading from the lanes can be done in the downstream sinks by specifying the name of transform suffixed by a dot and the name of the lane. +* Support multiple output events by making it possible to set the `event` global variable to an [sequence](https://www.lua.org/pil/11.1.html) of events. +* Support direct access to the nested fields (in both maps and arrays). + +## Sales Pitch + +The proposal + +* gives users more power to create custom transforms; +* does not break backward compatibility (except `pairs` method in case of nested fields); +* makes it possible to add complexity to the configuration of the transform gradually only when needed. + +## Drawbacks + +The only drawback is that supporting both dot notation and classical indexing makes it impossible to add escaping of dots in field names. For example, for incoming event structure like + +```json +{ + "field.first": { + "second": "value" + } +} +``` + +accessing `event["field.first"]` would return `nil`. + +However, because of the specificity of the observability data, there seems to be no need to have both field names with dots and nested fields. + +## Outstanding Questions + +* In access to the arrays should the indexes be 0-based or 1-based? Vector uses 0-based indexing, while in Lua the indexing is traditionally 1-based. However, technically it is possible to implement 0-based indexing for arrays which are stored inside events, as both [`__index`](https://www.lua.org/pil/13.4.1.html) and [`__len`](https://www.lua.org/manual/5.3/manual.html#3.4.7) need to have custom implementations in any case. + +* Is it confusing that the same global variable name `event` used also for outputting multiple events? The alternative, using a different name, for example, `events`, would lead to questions of precedence in case if both `event` and `events` are set. + +## Plan of Action + +- [ ] Add `init` and `shutdown` hooks. +- [ ] Add timers. 
+- [ ] Implement `set_lane` and `get_lane` methods on the events. +- [ ] Support multiple output events. +- [ ] Implement `Event.new_log()` function. +- [ ] Support direct access to the nested fields in addition to the dot notation. From 0386f9b144224df593230e758ef3019d01d2ce56 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sat, 7 Mar 2020 00:14:06 +0300 Subject: [PATCH 02/18] Fix whitespace usage Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 27f0ccc8a1a25..6fa136370f62b 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -13,21 +13,21 @@ Currently the [`lua` transform](https://vector.dev/docs/reference/transforms/lua ```lua event["nested.field"] = 5 ``` - + However, users expect nested fields to be accessible as native Lua structures, for example like this: - + ```lua event["nested"]["field"] = 5 ``` - + See [#706](https://github.com/timberio/vector/issues/706) and [#1406](https://github.com/timberio/vector/issues/1406). - + * **Setup Code** Some scripts require expensive setup steps, for example, loading of modules or invoking shell commands. These steps should not be part of the main transform code. 
- + For example, this code adding custom hostname - + ```lua if event["host"] == nil then local f = io.popen ("/bin/hostname") @@ -37,26 +37,26 @@ Currently the [`lua` transform](https://vector.dev/docs/reference/transforms/lua event["host"] = hostname end ``` - + Should be split into two parts, the first part executed just once at the initialization: - + ```lua local f = io.popen ("/bin/hostname") local hostname = f:read("*a") or "" f:close() hostname = string.gsub(hostname, "\n$", "") ``` - + and the second part executed for each incoming event: - + ```lua if event["host"] == nil then event["host"] = hostname end ``` - + See [#1864](https://github.com/timberio/vector/issues/1864). - + * **Control Flow** It should be possible to define channels for output events, similarly to how it is done in [`swimlanes`](https://vector.dev/docs/reference/transforms/swimlanes/) transform. @@ -101,11 +101,11 @@ The fields are accessed through string indexes using [Vector's dot notation](htt final_stats_event = Event.new_log() final_stats_event["stats"] = { count = counter, interval = os.time() - previous_timestamp } final_stats_event["stats.rate"] = final_stats_event["stats"].count / final_stats_event["stats.interval"] - + shutdown_event = Event.new_log() shutdown_event["message"] = "shutting down" shutdown_event:set_lane("auxiliary") - + event = {final_stats_event, shutdown_event} """ [[transforms.lua.timers]] From d6de2a95a816bd73c534a66a9378704ac31e276c Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sun, 8 Mar 2020 20:18:55 +0300 Subject: [PATCH 03/18] Support metric events and timestamps, add `emit` function Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 86 ++++++++++++------- 1 file changed, 55 insertions(+), 31 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 6fa136370f62b..04e36f8b40a47 100644 --- 
a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -4,7 +4,7 @@ This RFC proposes a new API for the `lua` transform. ## Motivation -Currently the [`lua` transform](https://vector.dev/docs/reference/transforms/lua/) has some limitations in its API. In particular, the following features are missing: +Currently, the [`lua` transform](https://vector.dev/docs/reference/transforms/lua/) has some limitations in its API. In particular, the following features are missing: * **Nested Fields** @@ -87,42 +87,66 @@ The fields are accessed through string indexes using [Vector's dot notation](htt inputs = [] source = """ counter = counter + 1 - event = nil + -- without calling `emit` function nothing is produced by default """ [transforms.lua.hooks] init = """ counter = 0 previous_timestamp = os.time() + emit({ + log = { + messge = "starting up", + timestamp = os.date("!*t), + } Event = Event.new_log() - event["message"] = "starting up" - event:set_lane("auxiliary") + event["log"]["message"] = "starting up" """ shutdown = """ - final_stats_event = Event.new_log() - final_stats_event["stats"] = { count = counter, interval = os.time() - previous_timestamp } - final_stats_event["stats.rate"] = final_stats_event["stats"].count / final_stats_event["stats.interval"] - - shutdown_event = Event.new_log() - shutdown_event["message"] = "shutting down" - shutdown_event:set_lane("auxiliary") - - event = {final_stats_event, shutdown_event} + final_stats_event = { + log = { + count = counter, + timestamp = timestamp(), + interval = os.time() - previous_timestamp + } + } + final_stats_event.log.stats.rate = final_stats_event["log"]["stats"].count / final_stats_event.log.stats.interval + emit(final_stats_event) + + emit({ + log = { + message = "shutting down" + } + }, "auxiliary") """ [[transforms.lua.timers]] interval = 10 source = """ - event = Event.new_log() - event["stats"] = { count = counter, interval = 10 } - 
event["stats.rate"] = event["stats"].count / event["stats.interval"] - counter = 0 - previous_timestamp = os.time() + emit { + metric = { + name = "response_time_ms", + timestamp = os.date("!*t"), + kind = "absolute", + tags = { + host = "localhost" + }, + value = { + type = "counter", + value = 24.2 + } + } + } """ [[transforms.lua.timers]] interval = 60 source = """ - event["message"] = "heartbeat" - event:set_lane("auxiliary") - "" + event = { + log = { + message = "heartbeat", + timestamp = os.date("!*t), + } + } + emit(event, "auxiliary") + """ ``` The code above consumes the incoming events, counts them, and then emits these stats about these counts every 10 seconds. In addition, it sends debug logs about its functioning into a separate lane called `auxiliary`. @@ -131,16 +155,16 @@ The code above consumes the incoming events, counts them, and then emits these s * Hooks for initialization and shutdown called `init` and `shutdown`. They are defined as strings of Lua code in the `hooks` section of the configuration of the transform. * Timers which define pieces of code that are executed periodically. They are defined in array `timers`, each timer takes two configuration options: `interval` which is the interval for execution in seconds and `source` which is the code which is to be executed periodically. -* Support for setting the output lane using `set_lane` method on the event which takes a string as the parameter. It should also be possible to read the lane using `get_lane` method. Reading from the lanes can be done in the downstream sinks by specifying the name of transform suffixed by a dot and the name of the lane. -* Support multiple output events by making it possible to set the `event` global variable to an [sequence](https://www.lua.org/pil/11.1.html) of events. +* Events are produced by the transform by calling function `emit` with the first argument being the event and the second option argument being the name of the lane where to emit the event. 
Outputting the events by storing them to the `event` global variable should not be supported, so its content would be ignored. * Support direct access to the nested fields (in both maps and arrays). +* Add support for the timestamp type as a `userdata` object with the same visible fields as in the table returned by [`os.date`](https://www.lua.org/manual/5.3/manual.html#pdf-os.date). In addition, monkey-patch `os.date` function available inside Lua scripts to make it return the same kind of userdata instead of a table if it is called with `*t` or `!*t` as the argument. This is necessary to allow one-to-one correspondence between types in Vector and Lua. ## Sales Pitch The proposal * gives users more power to create custom transforms; -* does not break backward compatibility (except `pairs` method in case of nested fields); +* supports both logs and metrics; * makes it possible to add complexity to the configuration of the transform gradually only when needed. ## Drawbacks @@ -161,15 +185,15 @@ However, because of the specificity of the observability data, there seems to be ## Outstanding Questions -* In access to the arrays should the indexes be 0-based or 1-based? Vector uses 0-based indexing, while in Lua the indexing is traditionally 1-based. However, technically it is possible to implement 0-based indexing for arrays which are stored inside events, as both [`__index`](https://www.lua.org/pil/13.4.1.html) and [`__len`](https://www.lua.org/manual/5.3/manual.html#3.4.7) need to have custom implementations in any case. - -* Is it confusing that the same global variable name `event` used also for outputting multiple events? The alternative, using a different name, for example, `events`, would lead to questions of precedence in case if both `event` and `events` are set. +* Are there better alternatives to the proposed solution for supporting of the timestamp type? 
+* Could some users be surprised if the transform which doesn't call `emit` function doesn't output anything? ## Plan of Action +- [ ] Implement access to the nested structure of logs events. +- [ ] Support creation of logs events as table inside the transform. +- [ ] Implement metrics support. +- [ ] Add `emit` function. - [ ] Add `init` and `shutdown` hooks. - [ ] Add timers. -- [ ] Implement `set_lane` and `get_lane` methods on the events. -- [ ] Support multiple output events. -- [ ] Implement `Event.new_log()` function. -- [ ] Support direct access to the nested fields in addition to the dot notation. +- [ ] Implement support for the timestamp type compatible with the result of execution of `os.date("!*t")`. From 88ca8e67694eaafbbfaf5091e96f27600676bc77 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sun, 8 Mar 2020 20:28:08 +0300 Subject: [PATCH 04/18] Add `version` configuration option Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 04e36f8b40a47..523ce5b36e4aa 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -85,6 +85,7 @@ The fields are accessed through string indexes using [Vector's dot notation](htt [transforms.lua] type = "lua" inputs = [] + version = "2" # defaults to 1 source = """ counter = counter + 1 -- without calling `emit` function nothing is produced by default @@ -153,6 +154,7 @@ The code above consumes the incoming events, counts them, and then emits these s ### Proposed changes +* Add `version` configuration option which would allow the users to chose between the new API described in this RFC (version 2) and the old one (version 1). * Hooks for initialization and shutdown called `init` and `shutdown`. 
They are defined as strings of Lua code in the `hooks` section of the configuration of the transform. * Timers which define pieces of code that are executed periodically. They are defined in array `timers`, each timer takes two configuration options: `interval` which is the interval for execution in seconds and `source` which is the code which is to be executed periodically. * Events are produced by the transform by calling function `emit` with the first argument being the event and the second option argument being the name of the lane where to emit the event. Outputting the events by storing them to the `event` global variable should not be supported, so its content would be ignored. @@ -190,6 +192,7 @@ However, because of the specificity of the observability data, there seems to be ## Plan of Action +- [ ] Implement support for `version` config option and split implementations for versions 1 and 2. - [ ] Implement access to the nested structure of logs events. - [ ] Support creation of logs events as table inside the transform. - [ ] Implement metrics support. 
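The proposed `emit(event, lane)` contract can be modelled in a few lines of plain Lua. The sketch below is a minimal illustration under stated assumptions, not Vector's implementation: `new_collector` and `lane_input` are hypothetical names, events are plain tables shaped like the examples above, events emitted without a lane go to the transform's main output, and a downstream component reads a named lane as the transform name followed by `.` and the lane name.

```lua
-- Illustrative sketch only: `new_collector` and `lane_input` are hypothetical
-- helpers that model the proposed `emit(event, lane)` contract in plain Lua.
-- They are not Vector's implementation.

-- Builds an `emit` function and the table that collects what it receives.
local function new_collector()
  local collected = { main = {}, lanes = {} }

  local function emit(event, lane)
    assert(type(event) == "table", "emit expects an event table")
    if lane == nil then
      -- No lane given: the event goes to the transform's main output.
      table.insert(collected.main, event)
    else
      -- Named lane: events are kept in a separate list per lane.
      collected.lanes[lane] = collected.lanes[lane] or {}
      table.insert(collected.lanes[lane], event)
    end
  end

  return emit, collected
end

-- Input name a downstream component would use to read a given lane.
local function lane_input(transform_name, lane)
  return transform_name .. "." .. lane
end

-- Usage: hooks and timers receive `emit` and call it as in the RFC examples.
local emit, collected = new_collector()
emit({ log = { message = "starting up" } }, "auxiliary")
emit({ metric = { name = "counter_10s", counter = { value = 3 } } })
```

Keeping per-lane lists separate from the main output mirrors the rule that a component subscribed to the bare transform name does not see lane-specific events.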
From f545cf06f6338d3228ed94f2fa1958dd4e1c1135 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sun, 8 Mar 2020 20:33:50 +0300 Subject: [PATCH 05/18] Clarify timestamps creation and add a question about them Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 523ce5b36e4aa..403c3db745aaf 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -106,7 +106,7 @@ The fields are accessed through string indexes using [Vector's dot notation](htt final_stats_event = { log = { count = counter, - timestamp = timestamp(), + timestamp = os.date("!*t"), interval = os.time() - previous_timestamp } } @@ -115,7 +115,8 @@ The fields are accessed through string indexes using [Vector's dot notation](htt emit({ log = { - message = "shutting down" + message = "shutting down", + timestamp = os.date("!*t"), } }, "auxiliary") """ @@ -187,6 +188,7 @@ However, because of the specificity of the observability data, there seems to be ## Outstanding Questions +* Should timestamps be automatically inserted to created logs and metrics created as tables inside the transform is they are not present? * Are there better alternatives to the proposed solution for supporting of the timestamp type? * Could some users be surprised if the transform which doesn't call `emit` function doesn't output anything? 
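One possible answer to the outstanding question about inserting timestamps automatically can be sketched in plain Lua. This is an assumption-laden illustration, not proposed API: `with_default_timestamp` and `to_rfc3339` are hypothetical helpers, events are plain tables shaped like the examples above, and timestamps use the table form returned by `os.date("!*t")` (fields `year`, `month`, `day`, `hour`, `min`, `sec`, among others).

```lua
-- Hypothetical helpers illustrating one possible answer to the
-- "insert timestamps automatically?" question; not part of the RFC's API.

-- Adds a UTC timestamp (in os.date("!*t") table form) to a log or metric
-- event table only when the event does not already carry one.
local function with_default_timestamp(event)
  local body = event.log or event.metric
  if body ~= nil and body.timestamp == nil then
    body.timestamp = os.date("!*t")
  end
  return event
end

-- Formats an os.date("!*t")-style table as an RFC 3339 UTC string.
local function to_rfc3339(t)
  return string.format("%04d-%02d-%02dT%02d:%02d:%02dZ",
    t.year, t.month, t.day, t.hour, t.min, t.sec)
end

-- Usage: an event emitted without a timestamp gets the current UTC time.
local event = with_default_timestamp({ log = { message = "shutting down" } })
print(to_rfc3339(event.log.timestamp))
```

Filling in the timestamp only when it is absent leaves explicit timestamps, such as the ones set in the `shutdown` hook above, untouched.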
From cb92f66807d7b4498d919f4b10147a530167ce78 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sun, 8 Mar 2020 20:34:21 +0300 Subject: [PATCH 06/18] Add a question about `null` values Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 403c3db745aaf..efced88ec4ab0 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -191,6 +191,7 @@ However, because of the specificity of the observability data, there seems to be * Should timestamps be automatically inserted to created logs and metrics created as tables inside the transform is they are not present? * Are there better alternatives to the proposed solution for supporting of the timestamp type? * Could some users be surprised if the transform which doesn't call `emit` function doesn't output anything? +* `null` might present in the events would be lost because in Lua setting a field to `nil` means deletion. Is it acceptable? If it is not, it is possible to introduce a new kind of `userdata` for representing `null` values. 
## Plan of Action From 0577934192fe3df02dada47abf1c97584555e65d Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sun, 8 Mar 2020 21:29:29 +0300 Subject: [PATCH 07/18] Fix typos Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index efced88ec4ab0..92f34dd42baf2 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -96,11 +96,10 @@ The fields are accessed through string indexes using [Vector's dot notation](htt previous_timestamp = os.time() emit({ log = { - messge = "starting up", + message = "starting up", timestamp = os.date("!*t), } - Event = Event.new_log() - event["log"]["message"] = "starting up" + }, "auxiliary") """ shutdown = """ final_stats_event = { From 7c092c7cd25ef0ff16df0f0f785e07bc86a24d1a Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Tue, 10 Mar 2020 21:23:38 +0300 Subject: [PATCH 08/18] Include suggested changes and clarify details Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 542 +++++++++++++++--- 1 file changed, 455 insertions(+), 87 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 92f34dd42baf2..0429b241301c5 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -2,13 +2,33 @@ This RFC proposes a new API for the `lua` transform. 
+* [Motivation](#motivation) +* [Prior Art](#prior-art) +* [Guide-level Proposal](#guide-level-proposal) + * [Motivating Example](#motivating-example) + * [Possible Configs](#possible-configs) + * [Inline Functions](#inline-functions) + * [Single Source](#single-source) + * [Loadable Module](#loadable-module) +* [Reference-level Proposal](#reference-level-proposal) + * [New Concepts](#new-concepts) + * [Hooks](#hooks) + * [Timers](#timers) + * [Emitting Functions](#emitting-functions) + * [Event Schema](#event-schema) + * [Data Types](#data-types) + * [Configuration](#configuration) +* [Sales Pitch](#sales-pitch) +* [Plan of Action](#plan-of-action) + + ## Motivation Currently, the [`lua` transform](https://vector.dev/docs/reference/transforms/lua/) has some limitations in its API. In particular, the following features are missing: * **Nested Fields** - Currently accessing nested fields is possible using the dot notation: + Currently, accessing nested fields is possible using the field path notation: ```lua event["nested.field"] = 5 @@ -65,7 +85,7 @@ Currently, the [`lua` transform](https://vector.dev/docs/reference/transforms/lu ## Prior Art -The implementation of `lua` transform has the following design: +The current implementation of the `lua` transform supports only log events. Processing of log events has the following design: * There is a `source` parameter which takes a string of code. * When a new event comes in, the global variable `event` is set inside the Lua context and the code from `source` is evaluated. @@ -74,131 +94,479 @@ The implementation of `lua` transform has the following design: Events have type [`userdata`](https://www.lua.org/pil/28.1.html) with custom [metamethods](https://www.lua.org/pil/13.html), so they are views to Vector's events. Thus passing an event to Lua has zero cost, so only when fields are actually accessed the data is copied to Lua.
-The fields are accessed through string indexes using [Vector's dot notation](https://vector.dev/docs/about/data-model/log/#dot-notation). +The fields are accessed through string indexes using [Vector's field path notation](https://vector.dev/docs/about/data-model/log/). ## Guide-level Proposal ### Motivating example +The motivating example is a log-to-metric transform which produces metric events from incoming log events using the following algorithm: + +1. There is an internal counter which is incremented on each incoming log event. +2. The log events are discarded. +3. Every 10 seconds the transform produces a metric event with the count of log events received during that interval. +4. Edge cases are handled in the following way: + 1. If there are no incoming events, the metric event with the counter equal to 0 still has to be produced. + 2. On Vector's shutdown, the transform has to produce the final metric event with the count of events received since the last flush. + +This example is used below to illustrate different ways to configure the transform. + +### Possible Configs + +Three versions of a config running the same Lua code are listed below; all of them implement the transform described in the motivating example. + +#### Inline Functions + +This config uses Lua functions defined as inline strings. It is the easiest way to get started with runtime transforms.
```toml [transforms.lua] type = "lua" inputs = [] - version = "2" # defaults to 1 - source = """ - counter = counter + 1 - -- without calling `emit` function nothing is produced by default + version = "2" + hooks.init = """ + function (emit) + event_counter = 0 + emit({ + log = { + message = "starting up" + } + }, "auxiliary") + end """ - [transforms.lua.hooks] - init = """ - counter = 0 - previous_timestamp = os.time() - emit({ - log = { - message = "starting up", - timestamp = os.date("!*t), - } - }, "auxiliary") + hooks.process = """ + function (event, emit) + event_counter = event_counter + 1 + end """ - shutdown = """ - final_stats_event = { - log = { - count = counter, - timestamp = os.date("!*t"), - interval = os.time() - previous_timestamp + hooks.shutdown = """ + function (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } } - } - final_stats_event.log.stats.rate = final_stats_event["log"]["stats"].count / final_stats_event.log.stats.interval - emit(final_stats_event) - emit({ - log = { - message = "shutting down", - timestamp = os.date("!*t"), + emit({ + log = { + message = "shutting down" + } + }, "auxiliary") + end + """ + [[transforms.lua.timers]] + interval_seconds = 10 + handler = """ + function (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } } - }, "auxiliary") + event_counter = 0 + end """ - [[transforms.lua.timers]] - interval = 10 +``` + +#### Single Source + +This version of the config uses the same Lua code as the config using inline Lua functions above, but all of the functions are defined in a single `source` option: + +```toml +[transforms.lua] + type = "lua" + inputs = [] + version = "2" source = """ - emit { - metric = { - name = "response_time_ms", - timestamp = os.date("!*t"), - kind = "absolute", - tags = { - host = "localhost" - }, - value = { - type = "counter", - value = 24.2 + function init (emit) + event_counter = 0 + emit({ + log = { +
message = "starting up" + } + }, "auxiliary") + end + + function process (event, emit) + event_counter = event_counter + 1 + end + + function shutdown (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } } } - } + + emit({ + log = { + message = "shutting down" + } + }, "auxiliary") + end + + function timer_handler (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } + } + event_counter = 0 + end """ - [[transforms.lua.timers]] - interval = 60 - source = """ - event = { - log = { - message = "heartbeat", - timestamp = os.date("!*t), + hooks.init = "init" + hooks.process = "process" + hooks.shutdown = "shutdown" + timers = [{interval_seconds = 10, handler = "timer_handler"}] +``` + +#### Loadable Module + +In this example, the Lua code from the `source` option of the previous example is moved into a separate file: + +`example_transform.lua` +```lua +function init (emit) + event_counter = 0 + emit({ + log = { + message = "starting up" + } + }, "auxiliary") +end + +function process (event, emit) + event_counter = event_counter + 1 +end + +function shutdown (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter } } - emit(event, "auxiliary") - """ + } + + emit({ + log = { + message = "shutting down" + } + }, "auxiliary") +end + +function timer_handler (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } + } + event_counter = 0 +end +``` + +This reduces the size of the transform configuration: + +```toml +[transforms.lua] + type = "lua" + inputs = [] + version = "2" + search_dirs = ["/example/search/dir"] + source = "require 'example_transform'" + hooks.init = "init" + hooks.process = "process" + hooks.shutdown = "shutdown" + timers = [{interval_seconds = 10, handler = "timer_handler"}] ``` -The code above consumes the incoming events, counts them, and then emits these stats about these counts every 10 seconds.
-In addition, it sends debug logs about its functioning into a separate lane called `auxiliary`. +## Reference-level Proposal -### Proposed changes +### New Concepts -* Add `version` configuration option which would allow the users to chose between the new API described in this RFC (version 2) and the old one (version 1). -* Hooks for initialization and shutdown called `init` and `shutdown`. They are defined as strings of Lua code in the `hooks` section of the configuration of the transform. -* Timers which define pieces of code that are executed periodically. They are defined in array `timers`, each timer takes two configuration options: `interval` which is the interval for execution in seconds and `source` which is the code which is to be executed periodically. -* Events are produced by the transform by calling function `emit` with the first argument being the event and the second option argument being the name of the lane where to emit the event. Outputting the events by storing them to the `event` global variable should not be supported, so its content would be ignored. -* Support direct access to the nested fields (in both maps and arrays). -* Add support for the timestamp type as a `userdata` object with the same visible fields as in the table returned by [`os.date`](https://www.lua.org/manual/5.3/manual.html#pdf-os.date). In addition, monkey-patch `os.date` function available inside Lua scripts to make it return the same kind of userdata instead of a table if it is called with `*t` or `!*t` as the argument. This is necessary to allow one-to-one correspondence between types in Vector and Lua. +To enable writing complex transforms, such as the one from the motivating example, this RFC introduces a few new concepts. -## Sales Pitch +#### Hooks -The proposal +Hooks are user-defined functions that Vector calls at specific points of the transform's lifecycle: when the transform is created, when an event arrives, and when the transform is shut down.
-* gives users more power to create custom transforms; -* supports both logs and metrics; -* makes it possible to add complexity to the configuration of the transform gradually only when needed. +* `init` hook is a function with signature + ```lua + function (emit) + -- ... + end + ``` + which is called when the transform is created. It takes a single argument, `emit` function, which can be used to produce new events from the hook. + +* `shutdown` hook is a function with signature + ```lua + function (emit) + -- ... + end + ``` + which is called when the transform is destroyed, for example on Vector's shutdown. After the shutdown is called, no code from the transform would be called. +* `process` hook is a function with signature + ```lua + function (event, emit) + -- ... + end + ``` + which takes two arguments, an incoming event and the `emit` function. It is called immediately when a new event comes to the transform. -## Drawbacks +#### Timers -The only drawback is that supporting both dot notation and classical indexing makes it impossible to add escaping of dots in field names. For example, for incoming event structure like +Timers are user-defined functions called on predefined time interval. The specified time interval sets the minimal interval between subsequent invocations of the same timer function. -```json -{ - "field.first": { - "second": "value" - } -} +The timer functions have the following signature: + + +```lua +function (emit) + -- ... +end ``` -accessing `event["field.first"]` would return `nil`. +The `emit` argument is an emitting function which allows the timer to produce new events. + +#### Emitting Functions -However, because of the specificity of the observability data, there seems to be no need to have both field names with dots and nested fields. +Emitting function is a function that can be passed to a hook or timer. It has the following signature: -## Outstanding Questions +```lua +function (event, lane) + -- ... 
+end +``` -* Should timestamps be automatically inserted to created logs and metrics created as tables inside the transform is they are not present? -* Are there better alternatives to the proposed solution for supporting of the timestamp type? -* Could some users be surprised if the transform which doesn't call `emit` function doesn't output anything? -* `null` might present in the events would be lost because in Lua setting a field to `nil` means deletion. Is it acceptable? If it is not, it is possible to introduce a new kind of `userdata` for representing `null` values. +Here `event` is an encoded event to be produced by the transform, and `lane` is an optional parameter specifying the output lane. In order to read events produced by the transform on a certain lane, the downstream components have to use the name of the transform suffixed by the `.` character and the name of the lane. + +**Example** +> An emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. Then the downstream `console` sink has to be defined as follows to be able to read the emitted event: +> ```toml +> [sinks.example_console] +> type = "console" +> inputs = ["example_transform.example_lane"] # would output the event from `example_lane` +> encoding = "text" +> ``` +> Other components connected to the same transform, but with different lane names or with no lane name at all, would not receive this event. + +### Event Schema + +Events passed to the transform have the [`userdata`](https://www.lua.org/pil/28.1.html) type with a custom implementation of the [`__index` metamethod](https://www.lua.org/pil/13.4.1.html). This type is used instead of [`table`](https://www.lua.org/pil/2.5.html) because it avoids copying data that the script never reads.
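To illustrate why a view avoids copying, the following plain-Lua sketch emulates the idea with a table whose metatable implements `__index`. It is not Vector's implementation (the real event is `userdata` backed by Rust); `make_view` and `on_access` are hypothetical names, only reads are modelled, and writes through the view are out of scope.

```lua
-- Hypothetical illustration only: Vector implements event views as `userdata`
-- on the Rust side. Here a Lua table with a metatable plays that role, to show
-- how `__index` defers reading a field until the script actually asks for it.
local function make_view(backing, on_access)
  return setmetatable({}, {
    __index = function(_, key)
      on_access(key)                         -- called only for fields that are read
      local value = backing[key]
      if type(value) == "table" then
        return make_view(value, on_access)   -- nested maps are wrapped lazily too
      end
      return value
    end,
  })
end

-- Usage: only `log` and `message` are visited; `host` is never touched.
local accessed = {}
local event = make_view(
  { log = { message = "hello", host = "example-host" } },
  function(key) accessed[#accessed + 1] = key end
)
print(event.log.message)            -- hello
print(table.concat(accessed, ","))  -- log,message
```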
+ +Events produced by the transform through calls to an emitting function can either have the same `userdata` type as the events passed to the transform, or be newly created Lua tables with the schema outlined below. + +Both log and metric events are encoded using [external tagging](https://serde.rs/enum-representations.html#externally-tagged). + +* [Log events](https://vector.dev/docs/about/data-model/log/) could be seen as tables created using + + ```lua + { + log = { + -- ... + } + } + ``` + + The content of the `log` field corresponds to the usual [log event](https://vector.dev/docs/about/data-model/log/#examples) structure, with possible nesting of the fields. + + If a log event created by the user inside the transform is a table, then any default fields named according to the [global schema](https://vector.dev/docs/reference/global-options/#log_schema) that are missing from the table are added to the event automatically. This rule does not apply to events of `userdata` type. + + **Example 1** + > The global schema is configured so that `message_key` is `"message"`, `timestamp_key` is `"timestamp"`, and `host_key` is `"instance_id"`. + > + > If a new event is created inside the user-defined Lua code as a table + > + > ```lua + > event = { + > log = { + > message = "example message", + > nested = { + > field = "example nested field value" + > }, + > array = {1, 2, 3}, + > } + > } + > ``` + > and then emitted through an emitting function, Vector would examine its fields and add a `timestamp` field containing the current timestamp and an `instance_id` field containing the current hostname. + + **Example 2** + > The global schema has [default settings](https://vector.dev/docs/reference/global-options/#log_schema). + > + > A log event created by the `stdin` source is passed to the `process` hook inside the transform, where it has `userdata` type.
The Lua code inside the transform deletes the `timestamp` field by setting it to `nil`: + > ```lua + > event.log.timestamp = nil + > ``` + > It then emits the event. In this case, Vector does not automatically re-insert the `timestamp` field, because the event has `userdata` type. + +* [Metric events](https://vector.dev/docs/about/data-model/metric/) could be seen as tables created using + + ```lua + { + metric = { + -- ... + } + } + ``` + + The content of the `metric` field matches the [metric data model](https://vector.dev/docs/about/data-model/metric). The values use [external tagging](https://serde.rs/enum-representations.html#externally-tagged) with respect to the metric type; see the examples below. + + When metric events are created as tables in user-defined code, the following default values are assumed for fields that are not provided: + + | Field Name | Default Value | + | ----------- | ------------- | + | `timestamp` | Current time | + | `kind` | `absolute` | + | `tags` | empty map | + + Furthermore, for [`aggregated_histogram`](https://vector.dev/docs/about/data-model/metric/#aggregated_histogram) metrics, the `count` field inside the `aggregated_histogram` table can be omitted.
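A minimal sketch of how the defaults in the table above could be applied to a metric event created as a Lua table. `apply_metric_defaults` is a hypothetical helper, and `now` is a plain number standing in for the current time, since the actual timestamp representation is defined elsewhere in this RFC.

```lua
-- Hypothetical sketch of the defaulting rule for metric events created as
-- Lua tables. `now` stands in for "current time"; how Vector actually
-- represents the timestamp is not modelled here.
local function apply_metric_defaults(event, now)
  -- Only plain tables get defaults; `userdata` events and logs pass through unchanged.
  if type(event) ~= "table" or type(event.metric) ~= "table" then
    return event
  end
  local metric = event.metric
  if metric.timestamp == nil then metric.timestamp = now end
  if metric.kind == nil then metric.kind = "absolute" end
  if metric.tags == nil then metric.tags = {} end
  return event
end

-- Usage: a minimal counter gets `timestamp`, `kind`, and `tags` filled in.
local counter = apply_metric_defaults(
  { metric = { name = "example_counter", counter = { value = 10 } } },
  os.time()
)
print(counter.metric.kind) -- absolute
```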
+ + + **Example: `counter`** + > The minimal Lua code required to create a counter metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_counter", + > counter = { + > value = 10 + > } + > } + > } + > ``` + + **Example: `gauge`** + > The minimal Lua code required to create a gauge metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_gauge", + > gauge = { + > value = 10 + > } + > } + > } + > ``` + + **Example: `set`** + > The minimal Lua code required to create a set metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_set", + > set = { + > values = {"a", "b", "c"} + > } + > } + > } + > ``` + + **Example: `distribution`** + > The minimal Lua code required to create a distribution metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_distribution", + > distribution = { + > values = {1.0, 2.0, 3.0} + > } + > } + > } + > ``` + + **Example: `aggregated_histogram`** + > The minimal Lua code required to create an aggregated histogram metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_histogram", + > aggregated_histogram = { + > buckets = {1.0, 2.0, 3.0}, + > counts = {30, 20, 10}, + > sum = 1000 -- total sum of all measured values, cannot be inferred from `counts` and `buckets` + > } + > } + > } + > ``` + > Note that the field [`count`](https://vector.dev/docs/about/data-model/metric/#count) is not required because it can be inferred by Vector automatically by summing up the values from `counts`.
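The inference of `count` described above amounts to a simple sum. The sketch below uses a hypothetical helper `infer_histogram_count` and, as an assumption not stated in the RFC, keeps an explicitly provided `count` unchanged.

```lua
-- Hypothetical sketch: infer the `count` of an aggregated histogram by
-- summing its `counts`, mirroring the rule stated above.
local function infer_histogram_count(histogram)
  if histogram.count ~= nil then
    return histogram.count -- assumption: an explicitly provided count is kept as-is
  end
  local total = 0
  for _, n in ipairs(histogram.counts or {}) do
    total = total + n
  end
  return total
end

-- Usage with the values from the example: 30 + 20 + 10.
print(infer_histogram_count({
  buckets = {1.0, 2.0, 3.0},
  counts = {30, 20, 10},
  sum = 1000,
})) -- 60
```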
+ + **Example: `aggregated_summary`** + > The minimal Lua code required to create an aggregated summary metric is the following: + > + > ```lua + > { + > metric = { + > name = "example_summary", + > aggregated_summary = { + > quantiles = {0.25, 0.5, 0.75}, + > values = {1.0, 2.0, 3.0}, + > sum = 200, + > count = 100 + > } + > } + > } + > ``` + +### Data Types + +The mapping between Vector data types and Lua data types is the following: + +| Vector Type | Lua Type | Comment | +| :----------- | :-------- | :------- | +| [`String`](https://vector.dev/docs/about/data-model/log/#strings) | [`string`](https://www.lua.org/pil/2.4.html) || +| [`Integer`](https://vector.dev/docs/about/data-model/log/#ints) | [`integer`](https://docs.rs/rlua/0.17.0/rlua/type.Integer.html) || +| [`Float`](https://vector.dev/docs/about/data-model/log/#floats) | [`number`](https://docs.rs/rlua/0.17.0/rlua/type.Number.html) || +| [`Boolean`](https://vector.dev/docs/about/data-model/log/#booleans) | [`boolean`](https://www.lua.org/pil/2.2.html) || +| [`Timestamp`](https://vector.dev/docs/about/data-model/log/#timestamps) | [`userdata`](https://www.lua.org/pil/28.1.html) | There is no dedicated timestamp type in Lua. However, there is a standard library function [`os.date`](https://www.lua.org/manual/5.1/manual.html#pdf-os.date) which returns a table with fields `year`, `month`, `day`, `hour`, `min`, `sec`, and some others. Other standard library functions, such as [`os.time`](https://www.lua.org/manual/5.1/manual.html#pdf-os.time), support tables with these fields as arguments. Because of that, Vector timestamps passed to the transform are represented as `userdata` with the same set of accessible fields. In order to have one-to-one correspondence between Vector timestamps and Lua timestamps, `os.date` function from the standard library is patched to return not a table, but `userdata` with the same set of fields as it usually would return instead.
This approach makes it possible to have both compatibility with the standard library functions and a dedicated data type for timestamps. | +| [`Null`](https://vector.dev/docs/about/data-model/log/#null-values) | [`nil`](https://www.lua.org/pil/2.1.html) | In Lua setting a table field to `nil` means deletion of this field. Likewise, accessing non-existing field in a Lua table returns `nil` instead of producing an error. So `Null` values which initially were in the event represented as `userdata` and were not modified stay there, as `Null` values, but if one field is assigned to another field with `Null` value, then the assigned field is effectively deleted. | +| [`Map`](https://vector.dev/docs/about/data-model/log/#maps) | [`userdata`](https://www.lua.org/pil/28.1.html) or [`table`](https://www.lua.org/pil/2.5.html) | Maps which are parts of events passed to the transform from Vector have `userdata` type. User-created maps have `table` type. Both types are converted to Vector's `Map` type when they are emitted from the transform. | +| [`Array`](https://vector.dev/docs/about/data-model/log/#arrays) | [`sequence`](https://www.lua.org/pil/11.1.html) | Sequences in Lua are a special case of tables. Because of that fact, the indexes can in principle start from any number. However, the convention in Lua is to start indexes from 1 instead of 0, so Vector should adhere to it. | + +### Configuration + +The new configuration options are the following: + +| Option Name | Required | Example | Description | +| :--------- | :--------: | :------- | :-------- | +| `version` | yes | `2` | In order to use the proposed API, the config has to contain `version` option set to `2`. If it is not provided, Vector assumes that [API version 1](https://vector.dev/docs/reference/transforms/lua/) is used. | +| `search_dirs` | no | `["/etc/vector/lua"]` | A list of directories in which the [`require`](https://www.lua.org/pil/8.1.html) function looks for modules when called from any part of the Lua code.
| +| `source` | no | `example_module = require("example_module")` | Lua source evaluated when the transform is created. It can call the `require` function or define variables and handler functions inline. It is **not** evaluated for each event, unlike the [`source` parameter in version 1 of the transform](https://vector.dev/docs/reference/transforms/lua/#source). | +| `hooks`.`init` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to the `init` hook function. | +| `hooks`.`shutdown` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to the `shutdown` hook function. | +| `hooks`.`process` | yes | `example_function` or `function (event, emit) ... end` | Contains a Lua expression evaluating to the `process` hook function. | +| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function (emit) ... end"}]` | Contains an [array of tables](https://github.com/toml-lang/toml#user-content-array-of-tables). Each table in the array has two fields: `interval_seconds`, which takes an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. | + +## Sales Pitch + +The proposal + +* gives users more power to create custom transforms; +* supports both logs and metrics; +* makes it possible to add complexity to the configuration of the transform gradually when needed. ## Plan of Action - [ ] Implement support for `version` config option and split implementations for versions 1 and 2. +- [ ] Add support for `userdata` type for timestamps. - [ ] Implement access to the nested structure of logs events. -- [ ] Support creation of logs events as table inside the transform. - [ ] Implement metrics support. -- [ ] Add `emit` function. -- [ ] Add `init` and `shutdown` hooks. -- [ ] Add timers. -- [ ] Implement support for the timestamp type compatible with the result of execution of `os.date("!*t")`.
+- [ ] Support creation of events as table inside the transform. +- [ ] Support emitting functions. +- [ ] Implement hooks invocation. +- [ ] Implement timers invocation. +- [ ] Add behavior tests and examples to the documentation. From 357e70c63df8fe78d445366c23f42b200d7183e2 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:19:14 +0300 Subject: [PATCH 09/18] Add an additional example Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 61 +++++++++++++++---- 1 file changed, 50 insertions(+), 11 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 0429b241301c5..e2e0df8add2ae 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -4,9 +4,9 @@ This RFC proposes a new API for the `lua` transform. * [Motivation](#motivation) * [Prior Art](#prior-art) -* [Guide-level Proposal](#guide-level-proposal) - * [Motivating Example](#motivating-example) - * [Possible Configs](#possible-configs) +* [Motivating Examples](#motivating-examples) + * [Fields Manipulation](#fields-manipulation) + * [Log To Metric](#log-to-metric) * [Inline Functions](#inline-functions) * [Single Source](#single-source) * [Loadable Module](#loadable-module) @@ -96,11 +96,54 @@ Events have type [`userdata`](https://www.lua.org/pil/28.1.html) with custom [me The fields are accessed through string indexes using [Vector's field path notation](https://vector.dev/docs/about/data-model/log/). -## Guide-level Proposal +## Motivating Examples -### Motivating example +### Fields Manipulation -The motivating example is a log to metric transform which produces metric events from incoming log events using the following algorithm: +The following example illustrates fields manipulations with the new approach. 
+ +```toml +[transforms.lua] + type = "lua" + inputs = [] + version = "2" + hooks.process = """ + function (event, emit) + -- add new field (simple) + event.new_field = "example" + -- add new nested field + -- add new field (nested, overwriting the content of "nested" map) + event.nested = { + field = "example value" + } + -- add new field (nested, to already existing map) + event.nested.another_field = "example value" + -- add new field (nested, without assumptions about presence of the parent map) + if event.possibly_existing == nil then + event.possibly_existing = {} + end + event.possibly_existing.example_field = "example value" + + -- remove field + event.removed_field = nil + -- remove nested field, but keep parent maps + event.nested.field = nil + -- remove nested field and, if the parent map is empty, the parent map too + event.another_nested.field = nil + if next(event.another_nested) == nil then + event.another_nested = nil + end + + -- rename field from "original_field" to "another_field" + event.original_field, event.another_field = nil, event.original_field + + emit(event) + end + """ + +### Log to Metric + +This example is a log to metric transform which produces metric events from incoming log events using the following algorithm: 1. There is an internal counter which is increased on each incoming log event. 2. The log events are discarded. @@ -109,11 +152,7 @@ The motivating example is a log to metric transform which produces metric events 1. If there are no incoming invents, the metric event with the counter equal to 0 still has to be produced. 2. On Vector's shutdown the transform has to produce the final metric event with the count of received events since the last flush. -This example would be used in the following to illustrate different ways to execute the transform. - -### Possible Configs - -Two versions of a config running the same Lua code are listed below, both of them implement the transform described in the motivating example.
+Two versions of a config running the same Lua code are listed below, both of them implement the transform described above. #### Inline Functions From 2d2e839867b7ab0d221f2685aca6dd370f627eca Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:19:32 +0300 Subject: [PATCH 10/18] Change `Null` value representation Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index e2e0df8add2ae..cf65e7ad104c9 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -572,7 +572,7 @@ The mapping between Vector data types and Lua data types is the following: | [`Float`](https://vector.dev/docs/about/data-model/log/#floats) | [`number`](https://docs.rs/rlua/0.17.0/rlua/type.Number.html) || | [`Boolean`](https://vector.dev/docs/about/data-model/log/#booleans) | [`boolean`](https://www.lua.org/pil/2.2.html) || | [`Timestamp`](https://vector.dev/docs/about/data-model/log/#timestamps) | [`userdata`](https://www.lua.org/pil/28.1.html) | There is no dedicated timestamp type in Lua. However, there is a standard library function [`os.date`](https://www.lua.org/manual/5.1/manual.html#pdf-os.date) which returns a table with fields `year`, `month`, `day`, `hour`, `min`, `sec`, and some others. Other standard library functions, such as [`os.time`](https://www.lua.org/manual/5.1/manual.html#pdf-os.time), support tables with these fields as arguments. Because of that, Vector timestamps passed to the transform are represented as `userdata` with the same set of accessible fields. 
In order to have one-to-one correspondence between Vector timestamps and Lua timestamps, `os.date` function from the standard library is patched to return not a table, but `userdata` with the same set of fields as it usually would return instead. This approach makes it possible to have both compatibility with the standard library functions and a dedicated data type for timestamps. | -| [`Null`](https://vector.dev/docs/about/data-model/log/#null-values) | [`nil`](https://www.lua.org/pil/2.1.html) | In Lua setting a table field to `nil` means deletion of this field. Likewise, accessing non-existing field in a Lua table returns `nil` instead of producing an error. So `Null` values which initially were in the event represented as `userdata` and were not modified stay there, as `Null` values, but if one field is assigned to another field with `Null` value, then the assigned field is effectively deleted. | +| [`Null`](https://vector.dev/docs/about/data-model/log/#null-values) | empty string | In Lua, setting a table field to `nil` deletes the field. Furthermore, setting an array element to `nil` deletes that element. To avoid such inconsistencies, `Null` values already present in an event are represented as empty strings in Lua code, and it is impossible to create a new `Null` value in user-defined code. | | [`Map`](https://vector.dev/docs/about/data-model/log/#maps) | [`userdata`](https://www.lua.org/pil/28.1.html) or [`table`](https://www.lua.org/pil/2.5.html) | Maps which are parts of events passed to the transform from Vector have `userdata` type. User-created maps have `table` type. Both types are converted to Vector's `Map` type when they are emitted from the transform. | | [`Array`](https://vector.dev/docs/about/data-model/log/#arrays) | [`sequence`](https://www.lua.org/pil/11.1.html) | Sequences in Lua are a special case of tables. Because of that fact, the indexes can in principle start from any number.
However, the convention in Lua is to start indexes from 1 instead of 0, so Vector should adhere to it. | From cee2a28b78793623316b6119ec623a5a864fabe7 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:21:30 +0300 Subject: [PATCH 11/18] Close code block Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index cf65e7ad104c9..9feaf8234e809 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -140,6 +140,7 @@ The following example illustrates fields manipulations with the new approach. emit(event) end """ + ``` ### Log to Metric From cdf7c74f4a6248303f1b170deb22ac086ac15cf0 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:33:36 +0300 Subject: [PATCH 12/18] Add a second example for modules loading Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 72 ++++++++++++++++++- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 9feaf8234e809..bc774d7415d79 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -9,7 +9,8 @@ This RFC proposes a new API for the `lua` transform.
* [Log To Metric](#log-to-metric) * [Inline Functions](#inline-functions) * [Single Source](#single-source) -* [Loadable Module](#loadable-module) +* [Loadable Module: Global Functions](#loadable-module-global-functions) +* [Loadable Module: Isolated Functions](#loadable-module-isolated-functions) * [Reference-level Proposal](#reference-level-proposal) * [New Concepts](#new-concepts) * [Hooks](#hooks) @@ -272,7 +273,7 @@ This version of the config uses the same Lua code as the config using inline Lua timers = [{interval_seconds = 10, handler = "timer_handler"}] ``` -#### Loadable Module +#### Loadable Module: Global Functions In this example the code from the `source` of the example above is put into a separate file: @@ -336,6 +337,73 @@ It reduces the size of the transform configuration: timers = [{interval_seconds = 10, handler = "timer_handler"}] ``` +#### Loadable Module: Isolated Functions + +The module layout used in the previous example is simple, but it might cause name collisions if multiple modules are loaded, because all of them define their functions in the same global namespace.
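The collision can be reproduced in a few lines. The sketch assumes Lua 5.2 or later (where `load` accepts a string chunk) and uses inline strings in place of two module files found on `search_dirs`; the module contents are hypothetical.

```lua
-- Hypothetical illustration (Lua 5.2+ `load` with a string chunk): two
-- modules written in the global-function style are loaded into one Lua state.
-- Inline strings stand in for two files found on `search_dirs`.
local module_a = [[
function process(event) return "handled by module A" end
]]
local module_b = [[
function process(event) return "handled by module B" end
]]

assert(load(module_a))()
assert(load(module_b))()

-- Both chunks assigned the same global name, so module A's handler is lost.
print(process({})) -- handled by module B
```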
+ +It is [recommended](http://lua-users.org/wiki/ModulesTutorial) to create tables for modules and put functions inside them: + +`example_transform.lua` +```lua +local example_transform = {} +local event_counter = 0 +function example_transform.init (emit) + emit({ + log = { + message = "starting up" + } + }, "auxiliary") +end + +function example_transform.process (event, emit) + event_counter = event_counter + 1 +end + +function example_transform.shutdown (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } + } + + emit({ + log = { + message = "shutting down" + } + }, "auxiliary") +end + +function example_transform.timer_handler (emit) + emit { + metric = { + name = "counter_10s", + counter = { + value = event_counter + } + } + } + event_counter = 0 +end +``` + +Then the transform configuration is the following: + +```toml +[transforms.lua] + type = "lua" + inputs = [] + version = "2" + search_dirs = ["/example/search/dir"] + source = "example_transform = require 'example_transform'" + hooks.init = "example_transform.init" + hooks.process = "example_transform.process" + hooks.shutdown = "example_transform.shutdown" + timers = [{interval_seconds = 10, handler = "example_transform.timer_handler"}] +``` + ## Reference-level Proposal ### New Concepts From 83d21f59b6a66af84dca657a172d3aa19204ff55 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:43:57 +0300 Subject: [PATCH 13/18] Make comments in the example more consistent Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index bc774d7415d79..662c9a775b305 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -112,7 +112,6 @@ The following
example illustrates fields manipulations with the new approach. function (event, emit) -- add new field (simple) event.new_field = "example" - -- add new nested field -- add new field (nested, overwriting the content of "nested" map) event.nested = { field = "example value" @@ -125,11 +124,11 @@ The following example illustrates fields manipulations with the new approach. end event.possibly_existing.example_field = "example value" - -- remove field + -- remove field (simple) event.removed_field = nil - -- remove nested field, but keep parent maps + -- remove field (nested, keep parent maps) event.nested.field = nil - -- remove nested field and, if the parent map is empty, the parent map too + -- remove field (nested, if the parent map is empty, the parent map is removed too) event.another_nested.field = nil if next(event.another_nested) == nil then event.another_nested = nil From 9780238e45f60e9afa855b5b9f6684d27d23da6f Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Wed, 11 Mar 2020 19:59:44 +0300 Subject: [PATCH 14/18] Fix loadable module example Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 662c9a775b305..70a08db5790cb 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -386,6 +386,8 @@ function example_transform.timer_handler (emit) } counter = 0 end + +return example_transform ``` Then the transform configuration is the following: From 06e74ff2c9453d2bcce0c26d3efef896ab0b0f38 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Thu, 12 Mar 2020 16:59:16 +0300 Subject: [PATCH 15/18] Add a note about the version Signed-off-by: Alexander Rodin --- rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git 
a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 70a08db5790cb..e0e610b184a1b 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -12,6 +12,7 @@ This RFC proposes a new API for the `lua` transform. * [Loadable Module: Global Functions](#loadable-module-global-functions) * [Loadable Module: Isolated Functions](#loadable-module-isolated-functions) * [Reference-level Proposal](#reference-level-proposal) + * [Versions](#versions) * [New Concepts](#new-concepts) * [Hooks](#hooks) * [Timers](#timers) @@ -407,6 +408,12 @@ Then the transform configuration is the following: ## Reference-level Proposal +### Versions + +Lua transform configurations have to be versioned in order to distinguish between the old and the new APIs. + +The old API is identified by version `1` and the new one, which is proposed in the present RFC, is identified by version `2`. The version can be set using a `version` option in the configuration file. During the transitional period, omitting the version should result in using version `1`. After all changes proposed here are implemented and sufficiently tested, version `1` could be deprecated and version `2` used as the default version. + ### New Concepts In order to enable writing complex transforms, such as the one from the motivating example, a few new concepts have to be introduced.
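Hooks and timers are ordinary Lua functions, so they can be exercised outside Vector. The following sketch assumes the global `vector.emit` form of the emitting function and replaces it with a hypothetical stub that records emitted events and their lanes; the handlers follow the log-to-metric example, with the counter reset after each flush.

```lua
-- Hypothetical offline harness. Inside Vector, the `vector` global is provided
-- by the transform; here a stub records every emitted event and its lane.
local emitted = {}
vector = {
  emit = function(event, lane)
    emitted[#emitted + 1] = { event = event, lane = lane }
  end,
}

-- Handlers in the style of the log-to-metric example; the counter is reset
-- after each flush so that every metric covers one interval only.
local event_counter = 0

local function process(event)
  event_counter = event_counter + 1
end

local function timer_handler()
  vector.emit({ metric = { name = "counter_10s", counter = { value = event_counter } } })
  event_counter = 0
end

-- Drive the handlers as the transform would: three events, then one timer tick.
for _ = 1, 3 do
  process({ log = { message = "hello" } })
end
timer_handler()

-- One metric event was emitted on the unnamed lane, carrying the value 3.
print(#emitted, emitted[1].event.metric.counter.value)
```

A second timer tick with no events in between would emit a metric with value `0`, matching the requirement that a metric is produced even when there are no incoming events.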
From 024aca2cfe452cc057878f375629debae9b0ed19 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Thu, 12 Mar 2020 17:46:57 +0300 Subject: [PATCH 16/18] Use global emitting function `vector.emit` Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 106 ++++++++++-------- 1 file changed, 61 insertions(+), 45 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index e0e610b184a1b..9ded93ecb0506 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -110,7 +110,7 @@ The following example illustrates fields manipulations with the new approach. inputs = [] version = "2" hooks.process = """ - function (event, emit) + function (event) -- add new field (simple) event.new_field = "example" -- add new field (nested, overwriting the content of "nested" map) @@ -138,7 +138,7 @@ The following example illustrates fields manipulations with the new approach. -- rename field from "original_field" to "another_field" event.original_field, event.another_field = nil, event.original_field - emit(event) + vector.emit(event) end """ ``` @@ -166,9 +166,9 @@ This config uses Lua functions defined as inline strings. It is easier to get st inputs = [] version = "2" hooks.init = """ - function init (emit) + function init () event_counter = 0 - emit({ + vector.emit({ log = { message = "starting up" } @@ -176,13 +176,13 @@ This config uses Lua functions defined as inline strings. It is easier to get st end """ hooks.process = """ - function (event, emit) + function (event) event_counter = event_counter + 1 end """ hooks.shutdown = """ - function shutdown (emit) - emit { + function shutdown () + vector.emit { metric = { name = "counter_10s", counter = { @@ -191,7 +191,7 @@ This config uses Lua functions defined as inline strings. 
It is easier to get st } } - emit({ + vector.emit({ log = { message = "shutting down" } @@ -201,8 +201,8 @@ This config uses Lua functions defined as inline strings. It is easier to get st [[timers]] interval_seconds = 10 handler = """ - function (emit) - emit { + function () + vector.emit { metric = { name = "counter_10s", counter = { @@ -225,21 +225,21 @@ This version of the config uses the same Lua code as the config using inline Lua inputs = [] version = "2" source = """ - function init (emit) + function init () event_counter = 0 - emit({ + vector.emit({ log = { message = "starting up" } }, "auxiliary") end - function process (event, emit) + function process (event) event_counter = event_counter + 1 end - function shutdown (emit) - emit { + function shutdown () + vector.emit { metric = { name = "counter_10s", counter = { @@ -248,15 +248,15 @@ This version of the config uses the same Lua code as the config using inline Lua } } - emit({ + vector.emit({ log = { message = "shutting down" } }, "auxiliary") end - function timer_handler (emit) - emit { + function timer_handler () + vector.emit { metric = { name = "counter_10s", counter = { @@ -279,21 +279,21 @@ In this example the code from the `source` of the example above is put into a se `example_transform.lua` ```lua -function init (emit) +function init () event_counter = 0 - emit({ + vector.emit({ log = { message = "starting up" } }, "auxiliary") end -function process (event, emit) +function process (event) event_counter = event_counter + 1 end -function shutdown (emit) - emit { +function shutdown () + vector.emit { metric = { name = "counter_10s", counter = { @@ -302,15 +302,15 @@ function shutdown (emit) } } - emit({ + vector.emit({ log = { message = "shutting down" } }, "auxiliary") end -function timer_handler (emit) - emit { +function timer_handler () + vector.emit { metric = { name = "counter_10s", counter = { @@ -345,9 +345,10 @@ It is [recommended](http://lua-users.org/wiki/ModulesTutorial) to create 
tables `example_transform.lua` ```lua +local emit = vector.emit -- to avoid prefixing the emitting function by "vector." local example_transform = {} local event_counter = 0 -function example_transform.init (emit) +function example_transform.init () emit({ log = { message = "starting up" @@ -355,11 +356,11 @@ function example_transform.init (emit) }, "auxiliary") end -function example_transform.process (event, emit) +function example_transform.process (event) event_counter = event_counter + 1 end -function example_transform.shutdown (emit) +function example_transform.shutdown () emit { metric = { name = "counter_10s", @@ -376,7 +377,7 @@ function example_transform.shutdown (emit) }, "auxiliary") end -function example_transform.timer_handler (emit) +function example_transform.timer_handler () emit { metric = { name = "counter_10s", @@ -424,45 +425,54 @@ Hooks are user-defined functions which are called on certain events. * `init` hook is a function with signature ```lua - function (emit) + function () -- ... end ``` - which is called when the transform is created. It takes a single argument, `emit` function, which can be used to produce new events from the hook. + which is called when the transform is created. + + The body of `init` hook or any functions called from it can call the emitting function `vector.emit`. * `shutdown` hook is a function with signature ```lua - function (emit) + function () -- ... end ``` which is called when the transform is destroyed, for example on Vector's shutdown. After the shutdown is called, no code from the transform would be called. + + The body of `shutdown` hook or any functions called from it can call the emitting function `vector.emit`. + * `process` hook is a function with signature ```lua - function (event, emit) + function (event) -- ... end ``` - which takes two arguments, an incoming event and the `emit` function. It is called immediately when a new event comes to the transform. 
+ which takes a single argument, the incoming event. It is called immediately when a new event comes to the transform. + + The body of `process` hook or any functions called from it can call the emitting function `vector.emit`. #### Timers Timers are user-defined functions called on predefined time interval. The specified time interval sets the minimal interval between subsequent invocations of the same timer function. -The timer functions have the following signature: +The timer handler functions have the following signature: ```lua -function (emit) +function () -- ... end ``` -The `emit` argument is an emitting function which allows the timer to produce new events. +The body of a timer handler or any functions called from it can call the emitting function `vector.emit`. + +#### Emitting Function -#### Emitting Functions +The emitting function is a function called `emit` in a globally exposed module `vector` (can be called as `vector.emit`). -Emitting function is a function that can be passed to a hook or timer. It has the following signature: +It has the following signature: ```lua function (event, lane) @@ -473,7 +483,7 @@ end Here `event` is an encoded event to be produced by the transform, and `lane` is an optional parameter specifying the output lane. In order to read events produced by the transform on a certain lane, the downstream components have to use the name of the transform suffixed by `.` character and the name of the lane. **Example** -> An emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. Then the downstream `console` sink have to be defined as the following to be able to read the emitted event: +> The emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. 
Then the downstream `console` sink have to be defined as the following to be able to read the emitted event: > ```toml > [sinks.example_console] > type = "console" @@ -482,6 +492,12 @@ Here `event` is an encoded event to be produced by the transform, and `lane` is > ``` > Other components connected to the same transform, but with different lanes names or without lane names at all would not receive any event. +**Note** + +The emitting function can be called from hooks or timer handlers, but not from the initialization code. + +The reason why it can be called only from them, but not from the initialization code, is that the initialization code has to evaluated when the transform instance is created from the config, in order to display parsing errors to the user before the processing started. On the other hand, in order to emit events, the [`transform_stream`](https://github.com/timberio/vector/blob/7c7c221b70e56135e93ef893faf4a184d109bf15/src/transforms/mod.rs#L66-L83) function has to be called and the output stream has to be created, which happens later. + ### Event Schema Events passed to the transforms have [`userdata`](https://www.lua.org/pil/28.1.html) type with custom implementation of the [`__index` metamethod](https://www.lua.org/pil/13.4.1.html). This data type is used instead of [`table`](https://www.lua.org/pil/2.5.html) because it allows to avoid copying of the data which is not used. @@ -662,10 +678,10 @@ The new configuration options are the following: | `version` | yes | `2` | In order to use the proposed API, the config has to contain `version` option set to `2`. If it is not provided, Vector assumes that [API version 1](https://vector.dev/docs/reference/transforms/lua/) is used. | | `search_dirs` | no | `["/etc/vector/lua"]` | A list of directories where [`require`](https://www.lua.org/pil/8.1.html) function would look at if called from any part of the Lua code. 
| | `source` | no | `example_module = require("example_module")` | Lua source evaluated when the transform is created. It can call `require` function or define variables and handler functions inline. It is **not** called for each event like the [`source` parameter in version 1 of the transform](https://vector.dev/docs/reference/transforms/lua/#source) | -| `hooks`.`init` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to `init` hook function. | -| `hooks`.`shutdown` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | -| `hooks`.`process` | yes | `example_function` or `function (event, emit) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | -| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function (emit) ... end"}]` | Contains an [array of tables](https://github.com/toml-lang/toml#user-content-array-of-tables). Each table in the array has two fields, `interval_seconds` which can take an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. | +| `hooks`.`init` | no | `example_function` or `function () ... end` | Contains a Lua expression evaluating to `init` hook function. | +| `hooks`.`shutdown` | no | `example_function` or `function () ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | +| `hooks`.`process` | yes | `example_function` or `function (event) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | +| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function () ... end"}]` | Contains an [array of tables](https://github.com/toml-lang/toml#user-content-array-of-tables). 
Each table in the array has two fields, `interval_seconds` which can take an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. | ## Sales Pitch From c689e2378bfdd6cef6b21179b594272213f68be2 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sat, 14 Mar 2020 18:33:02 +0300 Subject: [PATCH 17/18] Revert "Use global emitting function `vector.emit`" This reverts commit 024aca2cfe452cc057878f375629debae9b0ed19. Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 106 ++++++++---------- 1 file changed, 45 insertions(+), 61 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index 9ded93ecb0506..e0e610b184a1b 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -110,7 +110,7 @@ The following example illustrates fields manipulations with the new approach. inputs = [] version = "2" hooks.process = """ - function (event) + function (event, emit) -- add new field (simple) event.new_field = "example" -- add new field (nested, overwriting the content of "nested" map) @@ -138,7 +138,7 @@ The following example illustrates fields manipulations with the new approach. -- rename field from "original_field" to "another_field" event.original_field, event.another_field = nil, event.original_field - vector.emit(event) + emit(event) end """ ``` @@ -166,9 +166,9 @@ This config uses Lua functions defined as inline strings. It is easier to get st inputs = [] version = "2" hooks.init = """ - function init () + function init (emit) event_counter = 0 - vector.emit({ + emit({ log = { message = "starting up" } @@ -176,13 +176,13 @@ This config uses Lua functions defined as inline strings. 
It is easier to get st end """ hooks.process = """ - function (event) + function (event, emit) event_counter = event_counter + 1 end """ hooks.shutdown = """ - function shutdown () - vector.emit { + function shutdown (emit) + emit { metric = { name = "counter_10s", counter = { @@ -191,7 +191,7 @@ This config uses Lua functions defined as inline strings. It is easier to get st } } - vector.emit({ + emit({ log = { message = "shutting down" } @@ -201,8 +201,8 @@ This config uses Lua functions defined as inline strings. It is easier to get st [[timers]] interval_seconds = 10 handler = """ - function () - vector.emit { + function (emit) + emit { metric = { name = "counter_10s", counter = { @@ -225,21 +225,21 @@ This version of the config uses the same Lua code as the config using inline Lua inputs = [] version = "2" source = """ - function init () + function init (emit) event_counter = 0 - vector.emit({ + emit({ log = { message = "starting up" } }, "auxiliary") end - function process (event) + function process (event, emit) event_counter = event_counter + 1 end - function shutdown () - vector.emit { + function shutdown (emit) + emit { metric = { name = "counter_10s", counter = { @@ -248,15 +248,15 @@ This version of the config uses the same Lua code as the config using inline Lua } } - vector.emit({ + emit({ log = { message = "shutting down" } }, "auxiliary") end - function timer_handler () - vector.emit { + function timer_handler (emit) + emit { metric = { name = "counter_10s", counter = { @@ -279,21 +279,21 @@ In this example the code from the `source` of the example above is put into a se `example_transform.lua` ```lua -function init () +function init (emit) event_counter = 0 - vector.emit({ + emit({ log = { message = "starting up" } }, "auxiliary") end -function process (event) +function process (event, emit) event_counter = event_counter + 1 end -function shutdown () - vector.emit { +function shutdown (emit) + emit { metric = { name = "counter_10s", counter = { @@ 
-302,15 +302,15 @@ function shutdown () } } - vector.emit({ + emit({ log = { message = "shutting down" } }, "auxiliary") end -function timer_handler () - vector.emit { +function timer_handler (emit) + emit { metric = { name = "counter_10s", counter = { @@ -345,10 +345,9 @@ It is [recommended](http://lua-users.org/wiki/ModulesTutorial) to create tables `example_transform.lua` ```lua -local emit = vector.emit -- to avoid prefixing the emitting function by "vector." local example_transform = {} local event_counter = 0 -function example_transform.init () +function example_transform.init (emit) emit({ log = { message = "starting up" @@ -356,11 +355,11 @@ function example_transform.init () }, "auxiliary") end -function example_transform.process (event) +function example_transform.process (event, emit) event_counter = event_counter + 1 end -function example_transform.shutdown () +function example_transform.shutdown (emit) emit { metric = { name = "counter_10s", @@ -377,7 +376,7 @@ function example_transform.shutdown () }, "auxiliary") end -function example_transform.timer_handler () +function example_transform.timer_handler (emit) emit { metric = { name = "counter_10s", @@ -425,54 +424,45 @@ Hooks are user-defined functions which are called on certain events. * `init` hook is a function with signature ```lua - function () + function (emit) -- ... end ``` - which is called when the transform is created. - - The body of `init` hook or any functions called from it can call the emitting function `vector.emit`. + which is called when the transform is created. It takes a single argument, `emit` function, which can be used to produce new events from the hook. * `shutdown` hook is a function with signature ```lua - function () + function (emit) -- ... end ``` which is called when the transform is destroyed, for example on Vector's shutdown. After the shutdown is called, no code from the transform would be called. 
- - The body of `shutdown` hook or any functions called from it can call the emitting function `vector.emit`. - * `process` hook is a function with signature ```lua - function (event) + function (event, emit) -- ... end ``` - which takes a single argument, the incoming event. It is called immediately when a new event comes to the transform. - - The body of `process` hook or any functions called from it can call the emitting function `vector.emit`. + which takes two arguments, an incoming event and the `emit` function. It is called immediately when a new event comes to the transform. #### Timers Timers are user-defined functions called on predefined time interval. The specified time interval sets the minimal interval between subsequent invocations of the same timer function. -The timer handler functions have the following signature: +The timer functions have the following signature: ```lua -function () +function (emit) -- ... end ``` -The body of a timer handler or any functions called from it can call the emitting function `vector.emit`. - -#### Emitting Function +The `emit` argument is an emitting function which allows the timer to produce new events. -The emitting function is a function called `emit` in a globally exposed module `vector` (can be called as `vector.emit`). +#### Emitting Functions -It has the following signature: +Emitting function is a function that can be passed to a hook or timer. It has the following signature: ```lua function (event, lane) @@ -483,7 +473,7 @@ end Here `event` is an encoded event to be produced by the transform, and `lane` is an optional parameter specifying the output lane. In order to read events produced by the transform on a certain lane, the downstream components have to use the name of the transform suffixed by `.` character and the name of the lane. **Example** -> The emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. 
Then the downstream `console` sink have to be defined as the following to be able to read the emitted event: +> An emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. Then the downstream `console` sink have to be defined as the following to be able to read the emitted event: > ```toml > [sinks.example_console] > type = "console" @@ -492,12 +482,6 @@ Here `event` is an encoded event to be produced by the transform, and `lane` is > ``` > Other components connected to the same transform, but with different lanes names or without lane names at all would not receive any event. -**Note** - -The emitting function can be called from hooks or timer handlers, but not from the initialization code. - -The reason why it can be called only from them, but not from the initialization code, is that the initialization code has to evaluated when the transform instance is created from the config, in order to display parsing errors to the user before the processing started. On the other hand, in order to emit events, the [`transform_stream`](https://github.com/timberio/vector/blob/7c7c221b70e56135e93ef893faf4a184d109bf15/src/transforms/mod.rs#L66-L83) function has to be called and the output stream has to be created, which happens later. - ### Event Schema Events passed to the transforms have [`userdata`](https://www.lua.org/pil/28.1.html) type with custom implementation of the [`__index` metamethod](https://www.lua.org/pil/13.4.1.html). This data type is used instead of [`table`](https://www.lua.org/pil/2.5.html) because it allows to avoid copying of the data which is not used. @@ -678,10 +662,10 @@ The new configuration options are the following: | `version` | yes | `2` | In order to use the proposed API, the config has to contain `version` option set to `2`. If it is not provided, Vector assumes that [API version 1](https://vector.dev/docs/reference/transforms/lua/) is used. 
| | `search_dirs` | no | `["/etc/vector/lua"]` | A list of directories where [`require`](https://www.lua.org/pil/8.1.html) function would look at if called from any part of the Lua code. | | `source` | no | `example_module = require("example_module")` | Lua source evaluated when the transform is created. It can call `require` function or define variables and handler functions inline. It is **not** called for each event like the [`source` parameter in version 1 of the transform](https://vector.dev/docs/reference/transforms/lua/#source) | -| `hooks`.`init` | no | `example_function` or `function () ... end` | Contains a Lua expression evaluating to `init` hook function. | -| `hooks`.`shutdown` | no | `example_function` or `function () ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | -| `hooks`.`process` | yes | `example_function` or `function (event) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | -| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function () ... end"}]` | Contains an [array of tables](https://github.com/toml-lang/toml#user-content-array-of-tables). Each table in the array has two fields, `interval_seconds` which can take an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. | +| `hooks`.`init` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to `init` hook function. | +| `hooks`.`shutdown` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | +| `hooks`.`process` | yes | `example_function` or `function (event, emit) ... end` | Contains a Lua expression evaluating to `shutdown` hook function. | +| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function (emit) ... 
end"}]` | Contains an [array of tables](https://github.com/toml-lang/toml#user-content-array-of-tables). Each table in the array has two fields, `interval_seconds` which can take an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. | ## Sales Pitch From cfbe3ee309c95acd526884b0e7d7e0ff9fb5dcd5 Mon Sep 17 00:00:00 2001 From: Alexander Rodin Date: Sat, 14 Mar 2020 18:48:05 +0300 Subject: [PATCH 18/18] Fix Markdown linting errors Signed-off-by: Alexander Rodin --- ...6-1999-api-extensions-for-lua-transform.md | 62 ++++++++++++------- 1 file changed, 38 insertions(+), 24 deletions(-) diff --git a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md index e0e610b184a1b..4fcb0c00cabdb 100644 --- a/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md +++ b/rfcs/2020-03-06-1999-api-extensions-for-lua-transform.md @@ -5,21 +5,21 @@ This RFC proposes a new API for the `lua` transform. 
* [Motivation](#motivation) * [Prior Art](#prior-art) * [Motivating Examples](#motivating-examples) - * [Fields Manipulation](#fields-manipulation) - * [Log To Metric](#log-to-metric) - * [Inline Functions](#inline-functions) - * [Single Source](#single-source) - * [Loadable Module: Global Functions](#loadable-module-global-functions) - * [Loadable Module: Isolated Functions](#loadable-module-isolated-functions) + * [Fields Manipulation](#fields-manipulation) + * [Log To Metric](#log-to-metric) + * [Inline Functions](#inline-functions) + * [Single Source](#single-source) + * [Loadable Module: Global Functions](#loadable-module-global-functions) + * [Loadable Module: Isolated Functions](#loadable-module-isolated-functions) * [Reference-level Proposal](#reference-level-proposal) - * [Versions](#versions) - * [New Concepts](#new-concepts) - * [Hooks](#hooks) - * [Timers](#timers) - * [Emitting Functions](#emitting-functions) - * [Event Schema](#event-schema) - * [Data Types](#data-types) - * [Configuration](#configuration) + * [Versions](#versions) + * [New Concepts](#new-concepts) + * [Hooks](#hooks) + * [Timers](#timers) + * [Emitting Functions](#emitting-functions) + * [Event Schema](#event-schema) + * [Data Types](#data-types) + * [Configuration](#configuration) * [Sales Pitch](#sales-pitch) * [Plan of Action](#plan-of-action) @@ -278,6 +278,7 @@ This version of the config uses the same Lua code as the config using inline Lua In this example the code from the `source` of the example above is put into a separate file: `example_transform.lua` + ```lua function init (emit) event_counter = 0 @@ -344,6 +345,7 @@ The way to create modules in previous example above is simple, but might cause n It is [recommended](http://lua-users.org/wiki/ModulesTutorial) to create tables for modules and put functions inside them: `example_transform.lua` + ```lua local example_transform = {} local event_counter = 0 @@ -423,26 +425,32 @@ In order to enable writing complex transforms, such 
as the one from the motivati Hooks are user-defined functions which are called on certain events. * `init` hook is a function with signature + ```lua function (emit) -- ... end ``` + which is called when the transform is created. It takes a single argument, `emit` function, which can be used to produce new events from the hook. * `shutdown` hook is a function with signature + ```lua function (emit) -- ... end ``` + which is called when the transform is destroyed, for example on Vector's shutdown. After the shutdown is called, no code from the transform would be called. * `process` hook is a function with signature + ```lua function (event, emit) -- ... end ``` + which takes two arguments, an incoming event and the `emit` function. It is called immediately when a new event comes to the transform. #### Timers @@ -472,14 +480,17 @@ end Here `event` is an encoded event to be produced by the transform, and `lane` is an optional parameter specifying the output lane. In order to read events produced by the transform on a certain lane, the downstream components have to use the name of the transform suffixed by `.` character and the name of the lane. -**Example** +##### Example + > An emitting function is called from a transform component called `example_transform` with `lane` parameter set to `example_lane`. Then the downstream `console` sink have to be defined as the following to be able to read the emitted event: +> > ```toml > [sinks.example_console] > type = "console" > inputs = ["example_transform.example_lane"] # would output the event from `example_lane` > encoding = "text" > ``` +> > Other components connected to the same transform, but with different lanes names or without lane names at all would not receive any event. 
### Event Schema @@ -520,15 +531,18 @@ Both log and metrics events are encoded using [external tagging](https://serde.r > } > } > ``` + > > and then emitted through an emitting function, Vector would examine its fields and add `timestamp` containing the current timestamp and `instance_id` field with the current hostname. **Example 2** > The global schema has [default settings](https://vector.dev/docs/reference/global-options/#log_schema). > > A log event created by `stdin` source is passed to the `process` hook inside the transform, where it appears to have `userdata` type. The Lua code inside the transform deletes the `timestamp` field by setting it to `nil`: + > > ```lua > event.log.timestamp = nil > ``` + > > And then emits the event. In that case Vector would not automatically insert the `timestamp` field. * [Metric events](https://vector.dev/docs/about/data-model/metric/) could be seen as tables created using @@ -677,12 +691,12 @@ The proposal ## Plan of Action -- [ ] Implement support for `version` config option and split implementations for versions 1 and 2. -- [ ] Add support for `userdata` type for timestamps. -- [ ] Implement access to the nested structure of logs events. -- [ ] Implement metrics support. -- [ ] Support creation of events as table inside the transform. -- [ ] Support emitting functions. -- [ ] Implement hooks invocation. -- [ ] Implement timers invocation. -- [ ] Add behavior tests and examples to the documentation. +* [ ] Implement support for `version` config option and split implementations for versions 1 and 2. +* [ ] Add support for `userdata` type for timestamps. +* [ ] Implement access to the nested structure of logs events. +* [ ] Implement metrics support. +* [ ] Support creation of events as table inside the transform. +* [ ] Support emitting functions. +* [ ] Implement hooks invocation. +* [ ] Implement timers invocation. +* [ ] Add behavior tests and examples to the documentation.