Skip to content

Add "downstream_nodes" optional config settings to the fault filter#1047

Merged
RomanDzhabarov merged 11 commits into
masterfrom
filter
Jun 8, 2017
Merged

Add "downstream_nodes" optional config settings to the fault filter#1047
RomanDzhabarov merged 11 commits into
masterfrom
filter

Conversation

@RomanDzhabarov
Copy link
Copy Markdown
Member

@RomanDzhabarov RomanDzhabarov commented Jun 5, 2017

Support "downstream_nodes" to allow injections only for specific downstream nodes, one of the use cases is to support fault injections for a canary downstream node only.

Allow reading downstream specific settings for faults.

@lyft/network-team

"upstream_cluster" : "...",
"headers" : []
"headers" : [],
"match_downstream_cluster": false,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/false/"..."

* http.fault.<downstream-cluster>.abort.http_status
* http.fault.<downstream-cluster>.delay.fixed_delay_percent
* http.fault.<downstream-cluster>.delay.fixed_duration_ms
In case the following settings are not found in the runtime it defaults to the config settings.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/In case the following/if the


delays_injected, Counter, Total requests that were delayed
aborts_injected, Counter, Total requests that were aborted
<downstream-cluster>.delays_injected, Counter, Total delayed requests for the given downstream cluster
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep sentence consistent with above wording? Total requests that were delayed for the given downstream cluster. Same below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes a table (when rendered) expand a lot, i prefer to keep the sentence shorter.


.. _config_http_filters_fault_injection_match_downstream_cluster:

match_downstream_cluster:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got hung up on the logic below. I think the original cause was this name. I think we should change this name, and related function to something else: downstream_cluster_specific_settings or something like that.

config_->delayPercent())) {
uint64_t duration_ms = config_->runtime().snapshot().getInteger(
"fault.http.delay.fixed_duration_ms", config_->delayDuration());
if (config_->matchDownstreamCluster()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per comment above I think this name confuses the point of the flag. Which is: get specific cluster settings.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, will change

junr03
junr03 previously approved these changes Jun 6, 2017
Copy link
Copy Markdown
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@RomanDzhabarov
Copy link
Copy Markdown
Member Author

@htuch @mattklein123 can i get the review


.. _config_http_filters_fault_injection_downstream_cluster:

downstream_cluster_specific_settings:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it seems odd to me that this needs to be specified at all. I would just always look for downstream cluster variants if the header is set, and use the default settings as the defaults for the lookups. Is there any particular reason to have this be configurable separately?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to handle cases when we'd like to get settings based on other input params (not only downstream cluster) and make decisions based only on those, having it as a config setting directly expresses the intent.

I'd vote for explicit control via configuration and avoid unnecessary lookups (not everyone wants to do downstream cluster specific conf).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults for the lookups for the downstream specific cluster runtime lookups can be taken from the main config. So if they aren't present, the behavior will be the same. I really don't see people wanting to do 2 runtime lookups to get a default out of runtime, then override with a cluster specific one. So I think we can collapse this logic and make it simpler. Let's go ahead and do that.

// nodes. If not set then inject for all.
Runtime::Loader& runtime_;
FaultFilterStats stats_;
std::string stats_prefix_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update

Event::TimerPtr delay_timer_;
std::string downstream_cluster_{};

std::string delay_percent_key_{"fault.http.delay.fixed_delay_percent"};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const for all of these

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those could not be const in current impl because it depends on whether downstream cluster specific settings are used of not.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, well per above comment, if we are going to generalize and just do one runtime lookup, these don't need to be local variables anyway. They can just all be local variables inside the lookup function.

@mattklein123
Copy link
Copy Markdown
Member

mattklein123 commented Jun 7, 2017

@RomanDzhabarov in thinking about this more, I think we still probably want a runtime generalizable way of doing global injection vs. cluster specific injection. Perhaps we should first look up cluster specific, and if no fault/delay is specified, look for global? In that case, the default runtime global keys can be static/const. I think that would be better and give us full control?

@RomanDzhabarov
Copy link
Copy Markdown
Member Author

Yes, I agree we should use downstream runtime settings if present, otherwise use global runtime settings if present or use config settings.

Copy link
Copy Markdown
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At high level, the logic for downstream specific fault injection is fine. But do we need to have it as a first class entity in the configuration? Why can't we match the nodes using headers alone?

not set, faults are injected for all downstream nodes. Downstream node name is taken from
:ref:`the HTTP x-envoy-downstream-service-node <config_http_conn_man_headers_downstream-service-node>`
header and compared against downstream_nodes list.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this be done as header match? Does it have to be a specific option?
Same applies to downstream_cluster as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make "downstream-service-node" runtime controllable (so i can switch between some nodes to global easily), which should be very easy to add on top of the current functionality.

arguably we probably do not need runtime control over all headers match and it's going to be a bigger change. Let me know your thoughts.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is perfectly fine. But what I was driving at was, do we need to add a config element for this?
Why can't this be done in the followign way:

fault_filter1:
  headers:
    x-envoy-downstream-service-node: canary1
  delay:
    duration_ms: 1000
fault_filter2:
  headers:
    x-envoy-downstream-service-node: canary2
  delay:
    duration_ms: 1000

The logic from lines 96-110 in fault_filter.cc can stay as it is [the one that checks if the downstream cluster name in the header is present as part of the runtime with overrides for delay and abort]

Make sense? Essentially piggyback on the header match itself, instead of adding a new config element. If there is a header match for X-Envoy-downstream-cluster, then pull it out, check with runtime, etc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how the schema above when we have two filters with different nodes in the "x-envoy-downstream-service-node" allows us to switch to the mode where we do not care about downstream node at all.

For example, I'm running a test where I inject failures on "canary" only node and now I want to switch to the mode where I do not care about downstream-node at all. How would I do that in the current header matching logic?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. Now I get it. I think this explanation could go into the docs or somewhere, to clarify the use case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, i'll update it!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once this PR is in, I'll shoot another one with the runtime support and doc update (tomorrow/monday)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect at some point this is going to need to be made regex capable to be generally useful. I would add a TODO for that. I think it's fine to do this in v1. We can possibly add when we add runtime support.

}

if (!matchesDownstreamNodes(headers)) {
return FilterHeadersStatus::Continue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was worried about. We are doing two header match passes. We could fold this into a single header match, can't we? It makes the configuration simple as well.

With this PR, I can do match based on downstreams either using the header or using the downstream_nodes option. Instead of supporting multiple ways for doing a single thing, I would suggest providing a single well defined way of achieving X, whatever it is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with this. @RomanDzhabarov for our use, can't we just use header match and then just add runtime control to whether we match on headers or not? Seems like there should be some compromise here that doesn't do two separate things.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright we talked offline, will just do the full runtime solution and de-dup.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, talked again. This is kind of a far reaching change. To unblock let's go with this and we will add TODO to generalize and clean this up.

uint64_t duration_ms = delayDuration();

// Delay only if the duration is >0ms
if (0 != duration_ms) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are going to provide a isDelayEnabled() utility function, I suggest folding this condition into that utility. Saves an unnecessary if check and makes code simple.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense

@RomanDzhabarov RomanDzhabarov changed the title Add "downstream_nodes" and "match_downstream_cluster" optional config settings to the fault filter Add "downstream_nodes" optional config settings to the fault filter Jun 8, 2017
@RomanDzhabarov
Copy link
Copy Markdown
Member Author

@mattklein123 addressed comments.

const std::string& upstreamCluster() { return upstream_cluster_; }
Runtime::Loader& runtime() { return runtime_; }
FaultFilterStats& stats() { return stats_; }
std::unordered_set<std::string>& downstreamNodes() { return downstream_nodes_; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

not set, faults are injected for all downstream nodes. Downstream node name is taken from
:ref:`the HTTP x-envoy-downstream-service-node <config_http_conn_man_headers_downstream-service-node>`
header and compared against downstream_nodes list.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect at some point this is going to need to be made regex capable to be generally useful. I would add a TODO for that. I think it's fine to do this in v1. We can possibly add when we add runtime support.

}

if (!matchesDownstreamNodes(headers)) {
return FilterHeadersStatus::Continue;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with this. @RomanDzhabarov for our use, can't we just use header match and then just add runtime control to whether we match on headers or not? Seems like there should be some compromise here that doesn't do two separate things.

@RomanDzhabarov RomanDzhabarov merged commit 89e71b6 into master Jun 8, 2017
@RomanDzhabarov RomanDzhabarov deleted the filter branch June 12, 2017 19:34
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Signed-off-by: Alan Chiu <achiu@lyft.com>

For an explanation of how to fill out the fields, please see the relevant section
in [PULL_REQUESTS.md](https://github.com/envoyproxy/envoy/blob/master/PULL_REQUESTS.md)

Description: release: 0.3.1.08242020
Risk Level: low
Testing: ci
Docs Changes: n/a
Release Notes: n/a
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Signed-off-by: Alan Chiu <achiu@lyft.com>

For an explanation of how to fill out the fields, please see the relevant section
in [PULL_REQUESTS.md](https://github.com/envoyproxy/envoy/blob/master/PULL_REQUESTS.md)

Description: release: 0.3.1.08242020
Risk Level: low
Testing: ci
Docs Changes: n/a
Release Notes: n/a
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: JP Simard <jp@jpsim.com>
mathetake added a commit that referenced this pull request Mar 3, 2026
**Description**

Follow up on #1020

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants