feat(new transform): add type filter transform by lukesteensen · Pull Request #1998 · vectordotdev/vector

lukesteensen · 2020-03-06T16:33:46Z

This introduces a simple type_filter transform with the goal of unblocking sources that produce both logs and metrics. Previously, there was no way to go from an Any-typed input to a Log or Metric-typed component.
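As a rough illustration of what such a transform does, here is a minimal sketch. It assumes a simplified `Event` enum and a hypothetical `TypeFilter` struct with a `Keep` setting; none of these names are taken from the actual diff, and the real event type carries much richer data.

```rust
/// Simplified stand-in for an event (hypothetical; real events carry
/// structured log fields and metric values).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Log(String),
    Metric { name: String, value: f64 },
}

/// Which kind of event the filter lets through (hypothetical option).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keep {
    Log,
    Metric,
}

/// Hypothetical type filter: passes events of one kind and drops the rest.
pub struct TypeFilter {
    keep: Keep,
}

impl TypeFilter {
    pub fn new(keep: Keep) -> Self {
        TypeFilter { keep }
    }

    /// Returns the event unchanged if its kind matches, otherwise `None`.
    pub fn transform(&self, event: Event) -> Option<Event> {
        let matches_kind = matches!(
            (self.keep, &event),
            (Keep::Log, Event::Log(_)) | (Keep::Metric, Event::Metric { .. })
        );
        if matches_kind {
            Some(event)
        } else {
            None
        }
    }
}
```

With a filter like this placed after an Any-typed source, a downstream Log-typed component would only ever see log events.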

Unfortunately, this doesn't yet allow us to mark the vector source as Any-typed because this would be a breaking change (existing uses as a Log source would become invalid). It's also not a great UX to require users to configure this transform explicitly.

As a follow-up to this, I'm planning to introduce more functionality for "expanding" parts of the configured topology. We have some of this as part of swimlanes, but with the ability to take your input types into account we could automatically insert this transform where appropriate. I've been experimenting with different ways of doing that, and will run strategies by the team before committing to any particular implementation. If it turns out to be too complex for right now, we can fall back to just doing the breaking change to the vector source's type and having users use this manually.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>

binarylogic · 2020-03-06T18:51:57Z

Interesting. I'm sure you have this in mind, but I'm curious if we could automatically filter input based on the supported types. For example:

[sources.vector]
  type = "vector"

[sinks.prom]
  type = "prometheus"
  inputs = ["vector"]

[sinks.stackdriver_logging]
  type = "stackdriver_logging"
  inputs = ["vector"]

The prometheus sink would discard logs automatically and the stackdriver_logging would discard metrics automatically. If a component supported both types, and a user only wanted one, they could then set up this filter.

Finally, I'm curious if this is better solved with the swimlanes transform? @Jeffail added an is_log and is_metric condition in #1950.

lukesteensen · 2020-03-06T19:17:08Z

> I'm sure you have this in mind, but I'm curious if we could automatically filter input based on the supported types.

Yes, this is pretty much what I was alluding to. The basic problem right now is that we don't have a place to (1) see what type of input we'll be getting, and (2) do type-based filtering before events get to the transform implementation.

I explored solutions to (1) a bit, and I think we could come up with something interesting by better integrating type checking into Config itself (it currently works off its own simpler representation). Right now, the place where we know input types is not one where we can modify the topology in any way.

Also slightly tricky is (2), since it gets into the way our Transform and Sink traits are defined. Both are quite "raw" right now, which doesn't really give us a place to do any automatic filtering. We've also built every component to this point under the assumption that the type checker is ensuring we only get the desired input types, so we'd need to make sure that's still the case if we loosen the type system up and start relying on automatic filters.
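One way to picture point (2) is a wrapper that filters by type before the component's own logic runs. The sketch below is only an assumption about how that could look: the `Transform` trait here is a simplified, hypothetical version with an `input_type` method added for illustration, and `Filtered` and `Uppercase` are made-up names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Log(String),
    Metric(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Log,
    Metric,
    Any,
}

/// Hypothetical, simplified transform trait that also declares its input type.
pub trait Transform {
    fn input_type(&self) -> DataType;
    fn transform(&mut self, event: Event) -> Option<Event>;
}

/// Wrapper that drops events the inner transform did not ask for,
/// so the inner implementation can keep assuming well-typed input.
pub struct Filtered<T> {
    inner: T,
}

impl<T: Transform> Filtered<T> {
    pub fn new(inner: T) -> Self {
        Filtered { inner }
    }

    pub fn transform(&mut self, event: Event) -> Option<Event> {
        let accepted = matches!(
            (self.inner.input_type(), &event),
            (DataType::Any, _)
                | (DataType::Log, Event::Log(_))
                | (DataType::Metric, Event::Metric(_))
        );
        if accepted {
            self.inner.transform(event)
        } else {
            None
        }
    }
}

/// Example log-only transform written under the "only logs arrive" assumption.
pub struct Uppercase;

impl Transform for Uppercase {
    fn input_type(&self) -> DataType {
        DataType::Log
    }

    fn transform(&mut self, event: Event) -> Option<Event> {
        match event {
            Event::Log(msg) => Some(Event::Log(msg.to_uppercase())),
            Event::Metric(_) => unreachable!("only log events are expected here"),
        }
    }
}
```

The point of the wrapper is that `Uppercase` never sees a metric, so the existing assumption inside each component still holds even if the type checker becomes more permissive.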

Both of those areas involve more design thinking about core parts of our model, so I wanted to get this out there as a building block first.

We could also modify the vector source with an option for what type of data it should forward, defaulting to the existing logs. That would maintain backwards-compat while opening the door for users to begin using metrics or any + this transform.
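A small sketch of how such an option could default to logs while still allowing metrics or any. The `Forward` enum, its variant names, and the accepted strings are all hypothetical; nothing here reflects an existing option on the vector source.

```rust
/// Hypothetical setting for what the source forwards downstream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Forward {
    Logs,
    Metrics,
    Any,
}

impl Default for Forward {
    /// Defaulting to logs keeps existing configurations valid.
    fn default() -> Self {
        Forward::Logs
    }
}

impl Forward {
    /// Interprets an optional config value; an absent value means the default.
    pub fn from_option(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Forward::default()),
            Some("logs") => Ok(Forward::Logs),
            Some("metrics") => Ok(Forward::Metrics),
            Some("any") => Ok(Forward::Any),
            Some(other) => Err(format!("unknown forward type '{}'", other)),
        }
    }
}
```

Configurations that never mention the option would behave exactly as before, which is what keeps the change backwards-compatible.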

Jeffail · 2020-03-09T11:03:33Z

It'd be nice to spare users from needing to add a type_filter manually.

We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics. We can do this during expand_macros, omitting the filters if they aren't consumed.

That would allow us to do stuff like:

[sources.foo]
  type = "vector"

[transforms.bar]
  inputs = ["foo.logs"]

And I think this is easy enough to document for all Any components. It makes things more ergonomic while still being explicit that events of the non-matching type are being dropped.

In the case of name collisions (a user actually adding a <name>.logs component) we would need to catch it and give a useful error message like component 'foo' has a pseudo output 'logs' that collides with this name.
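The pseudo-output idea, including the collision check, could be sketched roughly as follows. The function name `expand_pseudo_outputs` and its signature are assumptions for illustration only; the real `expand_macros` works on the full config, not on plain name lists.

```rust
use std::collections::HashSet;

/// For every Any-typed component, generate `<name>.logs` and `<name>.metrics`.
/// Fails if a user-configured component already uses one of those names.
pub fn expand_pseudo_outputs(
    any_components: &[&str],
    configured: &[&str],
) -> Result<Vec<String>, String> {
    let taken: HashSet<&str> = configured.iter().copied().collect();
    let mut generated = Vec::new();

    for name in any_components {
        for suffix in ["logs", "metrics"].iter() {
            let pseudo = format!("{}.{}", name, suffix);
            if taken.contains(pseudo.as_str()) {
                return Err(format!(
                    "component '{}' has a pseudo output '{}' that collides with this name",
                    name, suffix
                ));
            }
            generated.push(pseudo);
        }
    }

    Ok(generated)
}
```

This sketch always generates both names; omitting filters that nobody consumes would be an extra pass over the configured inputs.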

We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

Then the above config could go back to:

[sources.foo]
  type = "vector"

[transforms.bar]
  inputs = ["foo"]

This takes the burden off the user entirely, but also at the cost of implicitly dropping events. We could mitigate that by giving warning logs whenever an Any component doesn't have a consumer for one of its types: component 'foo' metric events do not have a consumer and will be dropped.
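A rough sketch of the two pieces described here: rewriting an input based on the producer and consumer types, and warning when one of an Any component's types has no consumer. The names `resolve_input` and `unconsumed_warnings` are hypothetical, and the log-side warning text is an assumed mirror of the metric one quoted above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Log,
    Metric,
    Any,
}

/// Rewrites an input name when a Log- or Metric-typed consumer reads
/// from an Any-typed producer; otherwise leaves it untouched.
pub fn resolve_input(input: &str, producer_output: DataType, consumer_input: DataType) -> String {
    match (producer_output, consumer_input) {
        (DataType::Any, DataType::Log) => format!("{}.logs", input),
        (DataType::Any, DataType::Metric) => format!("{}.metrics", input),
        _ => input.to_string(),
    }
}

/// Builds warnings for an Any-typed component given the input types
/// of everything that consumes from it.
pub fn unconsumed_warnings(name: &str, consumer_inputs: &[DataType]) -> Vec<String> {
    let logs_consumed = consumer_inputs
        .iter()
        .any(|t| matches!(t, DataType::Log | DataType::Any));
    let metrics_consumed = consumer_inputs
        .iter()
        .any(|t| matches!(t, DataType::Metric | DataType::Any));

    let mut warnings = Vec::new();
    if !logs_consumed {
        warnings.push(format!(
            "component '{}' log events do not have a consumer and will be dropped",
            name
        ));
    }
    if !metrics_consumed {
        warnings.push(format!(
            "component '{}' metric events do not have a consumer and will be dropped",
            name
        ));
    }
    warnings
}
```

Note that an Any-to-Any connection is left as a plain `foo` input in this sketch, so connecting two Any components stays a single input.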

binarylogic · 2020-03-09T15:16:46Z

Just noting, I'd like to be involved in the final UX decisions this change introduces.

lukesteensen · 2020-03-12T21:13:24Z

@Jeffail

> We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics.

This is an interesting idea. It'd slightly complicate cases where you wanted to go from an Any source to an Any sink/transform (i.e. you'd need to add both inputs), but that seems like a reasonable tradeoff. A little trickier is that it'd break backwards-compat if we didn't maintain the non-suffixed log output, but that might be something we're willing to sacrifice.

> In the case of name collisions (a user actually adding a <name>.logs component) we would need to catch it and give a useful error message like component 'foo' has a pseudo output 'logs' that collides with this name.

Do we handle this with swimlanes right now? It does seem important if we're going to expand (heh) the use of this pattern. I'm trying to get a feel for what level of change these different options would require to our ability to analyze topologies.

> We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

This is what I'd really like, but it runs into the issue that we don't currently have a place to do this kind of analysis and apply changes based on the results. It's essentially blocked on some larger rethink of our topology building.

binarylogic · 2020-03-15T16:48:20Z

> We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics

Meh, I'm not a big fan of this, specifically for the reason @lukesteensen mentioned. Going from any to any should be as simple as connecting the components.

> We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

This is my preferred solution. I think this is a simple and clear UX.

lukesteensen · 2020-03-16T15:07:00Z

> > We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.
>
> This is my preferred solution. I think this is a simple and clear UX.

I agree, and this is exactly the type of thing we discussed needing some rework to our internal topology building.

Given that we'll want to do some deeper changes, I'm going to close this for the time being. Once some other in-flight work is complete, we can revisit with a full RFC.

Commit 1dc9994: feat(new transform): add type filter transform

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>

binarylogic requested a review from Jeffail March 6, 2020 18:53

binarylogic assigned Jeffail Mar 6, 2020

binarylogic self-requested a review March 9, 2020 15:16

binarylogic self-assigned this Mar 9, 2020

Hoverbear added the "domain: transforms" and "type: new feature" labels Mar 15, 2020

Hoverbear assigned lukesteensen Mar 15, 2020

lukesteensen closed this Mar 16, 2020

binarylogic mentioned this pull request Apr 6, 2020

enhancement(lua transform): Implement all hooks and timers in version 2 #2126

Merged

binarylogic added the "type: feature" label and removed the "type: new feature" label Jun 16, 2020

binarylogic deleted the type-filter branch July 23, 2020 17:30


lukesteensen wants to merge 1 commit into master from type-filter
