
feat(new transform): add type filter transform #1998

Closed
lukesteensen wants to merge 1 commit into master from type-filter


@lukesteensen
Member

Ref #421, #1153

This introduces a simple type_filter transform with the goal of unblocking sources that produce both logs and metrics. Previously, there was no way to go from an Any-typed input to a Log- or Metric-typed component.
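To make the behavior concrete, here is a minimal sketch of what a type filter does, assuming a simplified two-variant event model. `Event`, `DataType`, and `TypeFilter` are hypothetical stand-ins, not Vector's real types or the code in this PR. The filter accepts Any-typed input and passes through only the one concrete type it was configured to keep, which is what lets a downstream component be type-checked as Log- or Metric-typed.

```rust
/// Simplified event model (hypothetical; Vector's real `Event` carries much more).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Log(String),
    Metric(f64),
}

/// Event types as seen by the topology type checker (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

impl Event {
    pub fn data_type(&self) -> DataType {
        match self {
            Event::Log(_) => DataType::Log,
            Event::Metric(_) => DataType::Metric,
        }
    }
}

/// Hypothetical type filter: Any in, one concrete type out.
pub struct TypeFilter {
    keep: DataType, // expected to be Log or Metric
}

impl TypeFilter {
    pub fn new(keep: DataType) -> Self {
        TypeFilter { keep }
    }

    /// Accepts events of every type...
    pub fn input_type(&self) -> DataType {
        DataType::Any
    }

    /// ...but only ever emits the kept type, so a Log- or Metric-typed
    /// component can consume its output.
    pub fn output_type(&self) -> DataType {
        self.keep
    }

    /// Passes matching events through unchanged and drops everything else.
    pub fn transform(&self, event: Event) -> Option<Event> {
        if event.data_type() == self.keep {
            Some(event)
        } else {
            None
        }
    }
}
```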

Unfortunately, this doesn't yet allow us to mark the vector source as Any-typed because this would be a breaking change (existing uses as a Log source would become invalid). It's also not a great UX to require users to configure this transform explicitly.

As a followup to this, I'm planning to introduce more functionality for "expanding" parts of the configured topology. We have some of this as part of swimlanes, but with the ability to take your input types into account we could automatically insert this transform where appropriate. I've been experimenting with different ways of doing that, and will run strategies by the team before committing to any particular implementation. If it turns out to be too complex for right now, we can fall back to just doing the breaking change to the vector source's type and having users use this manually.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@binarylogic
Contributor

Interesting. I'm sure you have this in mind, but I'm curious if we could automatically filter input based on the supported types. For example:

```toml
[sources.vector]
  type = "vector"

[sinks.prom]
  type = "prometheus"
  inputs = ["vector"]

[sinks.stackdriver_logging]
  type = "stackdriver_logging"
  inputs = ["vector"]
```

The prometheus sink would discard logs automatically and the stackdriver_logging would discard metrics automatically. If a component supported both types, and a user only wanted one, they could then set up this filter.
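A rough sketch of this automatic filtering, assuming each sink declares a single accepted input type. `Sink`, `fan_out`, and the simplified `Event`/`DataType` enums are hypothetical. Events whose type a sink does not accept are discarded at delivery time instead of causing a type-check failure.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Log(String),
    Metric(f64),
}

impl Event {
    pub fn data_type(&self) -> DataType {
        match self {
            Event::Log(_) => DataType::Log,
            Event::Metric(_) => DataType::Metric,
        }
    }
}

/// A sink as the type checker sees it: a declared input type, plus (for this
/// sketch only) a buffer standing in for whatever it would actually send.
pub struct Sink {
    pub input_type: DataType,
    pub received: Vec<Event>,
}

/// Deliver each event from an Any-typed source to every sink whose declared
/// input type accepts it; mismatched events are silently discarded.
pub fn fan_out(events: &[Event], sinks: &mut [Sink]) {
    for event in events {
        for sink in sinks.iter_mut() {
            let wanted = sink.input_type;
            if wanted == DataType::Any || wanted == event.data_type() {
                sink.received.push(event.clone());
            }
        }
    }
}
```

With a Metric-typed sink (standing in for prometheus) and a Log-typed sink (standing in for stackdriver_logging) both reading from the same Any source, each receives only its own kind of event.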

Finally, I'm curious whether this is better solved with the swimlanes transform. @Jeffail added is_log and is_metric conditions in #1950.

@lukesteensen
Member Author

> I'm sure you have this in mind, but I'm curious if we could automatically filter input based on the supported types.

Yes, this is pretty much what I was alluding to. The basic problem right now is that we don't have a place to (1) see what type of input we'll be getting, and (2) do type-based filtering before events get to the transform implementation.

I explored solutions to (1) a bit, and I think we could come up with something interesting by better integrating type checking into Config itself (it currently works off its own simpler representation). Right now, the place where we know input types is not one where we can modify the topology in any way.

Also slightly tricky is (2), since it gets into the way our Transform and Sink traits are defined. Both are quite "raw" right now, which doesn't really give us a place to do any automatic filtering. We've also built every component to this point under the assumption that the type checker is ensuring we only get the desired input types, so we'd need to make sure that's still the case if we loosen the type system up and start relying on automatic filters.

Both of those areas involve more design thinking about core parts of our model, so I wanted to get this out there as a building block first.
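Point (2) can be illustrated with a wrapper that does the type check before the transform ever sees an event. This is a sketch under assumptions: the `Transform` trait, the `Filtered` wrapper, and the `Uppercase` example are hypothetical and much simpler than Vector's actual traits. The idea is that the inner transform keeps its existing assumption of only receiving its declared input type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Log(String),
    Metric(f64),
}

impl Event {
    pub fn data_type(&self) -> DataType {
        match self {
            Event::Log(_) => DataType::Log,
            Event::Metric(_) => DataType::Metric,
        }
    }
}

/// A deliberately "raw" transform trait: implementations process whatever
/// they are handed and rely on something else to enforce `input_type`.
pub trait Transform {
    fn input_type(&self) -> DataType;
    fn transform(&mut self, event: Event) -> Option<Event>;
}

/// Hypothetical wrapper that filters by type before calling the inner transform.
pub struct Filtered<T> {
    inner: T,
    pub dropped: usize,
}

impl<T: Transform> Filtered<T> {
    pub fn new(inner: T) -> Self {
        Filtered { inner, dropped: 0 }
    }

    pub fn process(&mut self, event: Event) -> Option<Event> {
        let accepted = self.inner.input_type();
        if accepted == DataType::Any || accepted == event.data_type() {
            self.inner.transform(event)
        } else {
            self.dropped += 1; // count drops so filtering is not fully silent
            None
        }
    }
}

/// Example Log-only transform used for illustration.
pub struct Uppercase;

impl Transform for Uppercase {
    fn input_type(&self) -> DataType {
        DataType::Log
    }

    fn transform(&mut self, event: Event) -> Option<Event> {
        match event {
            Event::Log(msg) => Some(Event::Log(msg.to_uppercase())),
            // Not reachable through `Filtered`, but a raw caller could hit it.
            other => Some(other),
        }
    }
}
```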

We could also modify the vector source with an option for what type of data it should forward, defaulting to the existing logs. That would maintain backwards compatibility while opening the door for users to start forwarding metrics, or Any events combined with this transform.
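A minimal sketch of how such an option could preserve backwards compatibility, assuming a hypothetical `forward` setting on the source. The option name and the `ForwardType`/`VectorSourceConfig` types are illustrative only, not an existing Vector config field. The default maps to Log output, so existing configs type-check exactly as before; `Any` is the choice that would be paired with this transform.

```rust
/// Simplified stand-in for the type checker's view of event types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

/// Hypothetical source setting: which event types the vector source forwards.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ForwardType {
    Logs,
    Metrics,
    Any,
}

impl Default for ForwardType {
    // Defaulting to logs keeps existing configs valid as Log sources.
    fn default() -> Self {
        ForwardType::Logs
    }
}

impl ForwardType {
    /// Output type the source would report to the type checker.
    pub fn output_type(self) -> DataType {
        match self {
            ForwardType::Logs => DataType::Log,
            ForwardType::Metrics => DataType::Metric,
            ForwardType::Any => DataType::Any,
        }
    }
}

/// Hypothetical source config; only the new field is modeled here.
#[derive(Debug, Default)]
pub struct VectorSourceConfig {
    pub forward: ForwardType,
}
```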

@Jeffail
Contributor

Jeffail commented Mar 9, 2020

It'd be nice to spare users from needing to add a type_filter manually.

We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics. We can do this during expand_macros, omitting the filters if they aren't consumed.

That would allow us to do stuff like:

```toml
[sources.foo]
  type = "vector"

[transforms.bar]
  inputs = ["foo.logs"]
```

I think this is easy enough to document for all Any components. It makes things more ergonomic while still being explicit that events of a non-matching type are being dropped.

In the case of name collisions (a user actually adding a <name>.logs component) we would need to catch it and give a useful error message like component 'foo' has a pseudo output 'logs' that collides with this name.
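The pseudo-output idea, including omitting unconsumed filters and the collision check, might look roughly like this during expansion. `expand_pseudo_outputs` and its arguments are hypothetical and do not reflect the real expand_macros code; component names are plain strings and types are a simplified enum.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

/// Hypothetical expansion step. `outputs` maps each configured component name
/// to its output type; `referenced` holds every name used in some `inputs`
/// list. Returns the type filters to synthesize as (pseudo name, kept type).
pub fn expand_pseudo_outputs(
    outputs: &BTreeMap<String, DataType>,
    referenced: &BTreeSet<String>,
) -> Result<Vec<(String, DataType)>, String> {
    let mut filters = Vec::new();
    for (name, ty) in outputs {
        if *ty != DataType::Any {
            continue; // only Any-typed components get pseudo outputs
        }
        for (suffix, keep) in [("logs", DataType::Log), ("metrics", DataType::Metric)] {
            let pseudo = format!("{}.{}", name, suffix);
            // A user-defined component with the same name would be ambiguous.
            if outputs.contains_key(&pseudo) {
                return Err(format!(
                    "component '{}' has a pseudo output '{}' that collides with this name",
                    name, suffix
                ));
            }
            // Omit filters that nothing consumes.
            if referenced.contains(&pseudo) {
                filters.push((pseudo, keep));
            }
        }
    }
    Ok(filters)
}
```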

We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

Then the above config could go back to:

```toml
[sources.foo]
  type = "vector"

[transforms.bar]
  inputs = ["foo"]
```

This takes the burden off the user entirely, but also at the cost of implicitly dropping events. We could mitigate that by giving warning logs whenever an Any component doesn't have a consumer for one of its types: component 'foo' metric events do not have a consumer and will be dropped.
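Both steps, the automatic input rewrite and the unconsumed-type warning, can be sketched over a simplified component list. All names here (`Component`, `rewrite_inputs`, `unconsumed_warnings`) are hypothetical, and the sketch assumes the expansion pass can see every component's input and output types, which the discussion notes is not currently the case.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Any,
    Log,
    Metric,
}

/// Simplified component description; field names are assumptions.
pub struct Component {
    pub name: String,
    pub input_type: DataType, // Any for sources, which have no inputs
    pub output_type: DataType,
    pub inputs: Vec<String>,
}

/// Step 1 (hypothetical): point Log- or Metric-typed consumers of an Any
/// component at `<name>.logs` or `<name>.metrics` instead.
pub fn rewrite_inputs(components: &mut [Component]) {
    let outputs: HashMap<String, DataType> = components
        .iter()
        .map(|c| (c.name.clone(), c.output_type))
        .collect();
    for c in components.iter_mut() {
        let suffix = match c.input_type {
            DataType::Log => "logs",
            DataType::Metric => "metrics",
            DataType::Any => continue, // Any-to-Any connections are left alone
        };
        for input in c.inputs.iter_mut() {
            if outputs.get(input.as_str()) == Some(&DataType::Any) {
                *input = format!("{}.{}", input, suffix);
            }
        }
    }
}

/// Step 2 (hypothetical): warn when an Any component has no consumer for one
/// of its event types, since those events would be dropped implicitly.
pub fn unconsumed_warnings(components: &[Component]) -> Vec<String> {
    let consumed: Vec<&str> = components
        .iter()
        .flat_map(|c| c.inputs.iter().map(|s| s.as_str()))
        .collect();
    let mut warnings = Vec::new();
    for c in components.iter().filter(|c| c.output_type == DataType::Any) {
        // A direct (Any-typed) consumer receives every event type.
        let whole = consumed.contains(&c.name.as_str());
        for (suffix, label) in [("logs", "log"), ("metrics", "metric")] {
            let pseudo = format!("{}.{}", c.name, suffix);
            if !whole && !consumed.contains(&pseudo.as_str()) {
                warnings.push(format!(
                    "component '{}' {} events do not have a consumer and will be dropped",
                    c.name, label
                ));
            }
        }
    }
    warnings
}
```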

@binarylogic binarylogic self-requested a review March 9, 2020 15:16
@binarylogic binarylogic self-assigned this Mar 9, 2020
@binarylogic
Contributor

Just noting, I'd like to be involved in the final UX decisions this change introduces.

@lukesteensen
Member Author

@Jeffail

> We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics.

This is an interesting idea. It'd slightly complicate cases where you wanted to go from an Any source to an Any sink/transform (i.e. you'd need to add both inputs), but that seems like a reasonable tradeoff. A little trickier is that it'd break backwards-compat if we didn't maintain the non-suffixed log output, but that might be something we're willing to sacrifice.

> In the case of name collisions (a user actually adding a <name>.logs component) we would need to catch it and give a useful error message like component 'foo' has a pseudo output 'logs' that collides with this name.

Do we handle this with swimlanes right now? It does seem important if we're going to expand (heh) the use of this pattern. I'm trying to get a feel for how much each of these options would require changing the way we analyze topologies.

> We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

This is what I'd really like, but runs into the issue that we don't currently have a place to do this kind of analysis and apply changes based on the results. It's essentially blocked on some larger rethink of our topology building.

@Hoverbear Hoverbear added domain: transforms Anything related to Vector's transform components type: new feature labels Mar 15, 2020
@binarylogic
Contributor

> We could consider setting a precedent here where any component with an output type Any automatically has two type filter transforms attached to it called <name>.logs and <name>.metrics

Meh, I'm not a big fan of this, specifically for the reason @lukesteensen mentioned. Going from any to any should be as simple as connecting the components.

> We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.

This is my preferred solution. I think this is a simple and clear UX.

@lukesteensen
Member Author

> > We could then go one step further and during expand_macros we detect when a component with a Log or Metric input type consumes from an Any component and automatically switch the input to <name>.logs or <name>.metrics respectively.
>
> This is my preferred solution. I think this is a simple and clear UX.

I agree, and this is exactly the type of thing we discussed needing some rework to our internal topology building.

Given that we'll want to do some deeper changes, I'm going to close this for the time being. Once some other in-flight work is complete, we can revisit with a full RFC.

@binarylogic binarylogic added type: feature A value-adding code addition that introduce new functionality. and removed type: new feature labels Jun 16, 2020
@binarylogic binarylogic deleted the type-filter branch July 23, 2020 17:30

Labels

domain: transforms Anything related to Vector's transform components type: feature A value-adding code addition that introduce new functionality.


4 participants