Skip to content

Fix close_connections_on_host_health_failure#11241

Merged
snowp merged 5 commits into
envoyproxy:masterfrom
transferwise:fix-tcp-close-on-health-fail
Jun 15, 2020
Merged

Fix close_connections_on_host_health_failure#11241
snowp merged 5 commits into
envoyproxy:masterfrom
transferwise:fix-tcp-close-on-health-fail

Conversation

@JonathanO
Copy link
Copy Markdown
Contributor

close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:

  • Add a method to immediately close all active connections on a TCP conn pool.
  • Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Additional Notes:
The breakage was introduced in the release immediately after the flag was introduced. It has been broken for some time (since v1.8). This fix may present some risk to users who currently have the flag enabled even though it doesn't work as documented, as the observed behaviour will change.

Risk Level: Low
Testing: Manual
Docs Changes: None

close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
@jmarantz
Copy link
Copy Markdown
Contributor

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🤷‍♀️ nothing to rebuild.

🐱

Caused by: a #11241 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz
Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s).

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Do you think you could come up with an integration test that verifies this behavior? Mostly so we can prevent this kind of regression from happening again.

@snowp snowp added the waiting label May 21, 2020
@stale
Copy link
Copy Markdown

stale Bot commented May 30, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale Bot added the stale stalebot believes this issue/PR has not been touched recently label May 30, 2020
@stale
Copy link
Copy Markdown

stale Bot commented Jun 6, 2020

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale Bot closed this Jun 6, 2020
Test to show connections remain open if the flag is false.
Test to show connections are closed if the flag is true.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
@JonathanO
Copy link
Copy Markdown
Contributor Author

@snowp Sorry for the delay, I've added some tests to the branch.

@snowp snowp reopened this Jun 10, 2020
@stale stale Bot removed the stale stalebot believes this issue/PR has not been touched recently label Jun 10, 2020
Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a test! One question

@ggreenway mind taking a look as well?

Comment thread source/common/upstream/cluster_manager_impl.cc
Comment thread test/integration/tcp_proxy_integration_test.cc Outdated
Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, aside from the comment and semicolon

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@snowp snowp merged commit 12a4bc4 into envoyproxy:master Jun 15, 2020
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jun 24, 2020
close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Signed-off-by: yashwant121 <yadavyashwant36@gmail.com>
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jun 24, 2020
close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Signed-off-by: yashwant121 <yadavyashwant36@gmail.com>
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jun 24, 2020
close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Signed-off-by: yashwant121 <yadavyashwant36@gmail.com>
songhu pushed a commit to songhu/envoy that referenced this pull request Jun 25, 2020
close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jul 24, 2020
close_connections_on_host_health_failure was disabled by the refactoring to
make tcp_proxy use the TCP connection pool implementation.

To fix it we:
* Add a method to immediately close all active connections on a TCP conn pool.
* Invoke this method instead of drainConnections() if close_connections_on_host_health_failure is set.

Signed-off-by: Jonathan Oddy <jonathan.oddy@transferwise.com>
Signed-off-by: yashwant121 <yadavyashwant36@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants