Skip to content

config, server: Add graceful shutdown option#20649

Merged
tiancaiamao merged 8 commits into
masterfrom
unknown repository
Nov 3, 2020
Merged

config, server: Add graceful shutdown option#20649
tiancaiamao merged 8 commits into
masterfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Oct 26, 2020

What problem does this PR solve?

Issue Number: Fixes #20619

Problem Summary:

Shutting down the server immediately is problematic in load balancing scenarios. The load balancer may not detect the instance has shutdown fast enough and continue to route stray requests. Requests already in flight may also be terminated.

This PR adds a server configuration option to start returning "unhealthy" to the load balancer health check. It will wait N seconds before starting the shutdown procedure (default: 0, maintains compatibility).

What is changed and how it works?

What's Changed:

A configuration option has been added for GracefulWaitBeforeShutdown.

How it Works:

If GracefulWaitBeforeShutdown > 0, the server will set the load balancer status to unhealthy and then wait this many seconds before starting the shutdown procedure.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Side effects

  • Slower test suite (it needs to sleep 2-3 seconds to realistically test the feature).

Release note

  • TiDB now supports the ability to gracefully remove instances from behind a load balancer. The configuration setting graceful-wait-before-shutdown allows you to specify the number of seconds to wait before starting the shutdown procedure.

@ghost ghost added the component/server label Oct 27, 2020
@ghost ghost requested a review from tiancaiamao October 27, 2020 05:33
Copy link
Copy Markdown
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 27, 2020
@tiancaiamao
Copy link
Copy Markdown
Contributor

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Oct 27, 2020
ti-srebot
ti-srebot previously approved these changes Oct 27, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Oct 27, 2020
@tiancaiamao
Copy link
Copy Markdown
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 27, 2020
@ti-srebot
Copy link
Copy Markdown
Contributor

/run-all-tests

@ti-srebot
Copy link
Copy Markdown
Contributor

@nullnotnil merge failed.

@tennix
Copy link
Copy Markdown
Member

tennix commented Oct 27, 2020

Can we cherry-pick this into release-4.0 branch, so users can use this in the next release version?

Comment thread config/config.go Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using time.Duration?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use time.Duration it will imply the value in config.toml is in nanoseconds. For an example, see max-batch-wait-time. I think because the graceful time is usually quite high (10s+) it will look quite strange in nanoseconds.

There are also some other time units which are specified as string (example: lease).

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 27, 2020

/run-all-tests

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 27, 2020

/run-all-tests

1 similar comment
@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 27, 2020

/run-all-tests

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 27, 2020

/run-unit-test

2 similar comments
@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 27, 2020

/run-unit-test

@zhouqiang-cl
Copy link
Copy Markdown
Contributor

/run-unit-test

july2993 added a commit to july2993/tidb-operator that referenced this pull request Oct 28, 2020
ref pingcap/tidb#20649
Add this config support like this:
```
tidb:
  readinessProbe:
  statusAPI: {}
```
This will make tidb pod use Exec handle to run curl probe the url 127.0.0.1:10080/status
Default still use TCPPPort(4000) for compatibility
july2993 added a commit to july2993/tidb-operator that referenced this pull request Oct 28, 2020
ref pingcap/tidb#20649
Add this config support like this:
```
tidb:
  readinessProbe:
  statusAPI: {}
```
This will make tidb pod use Exec handle to run curl probe the url 127.0.0.1:10080/status
Default still use TCPPPort(4000) for compatibility
@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 28, 2020

/run-unit-test

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 28, 2020

/run-unit-test

@qw4990
Copy link
Copy Markdown
Contributor

qw4990 commented Oct 30, 2020

/merge

@ti-srebot
Copy link
Copy Markdown
Contributor

/run-all-tests

@ti-srebot
Copy link
Copy Markdown
Contributor

@nullnotnil merge failed.

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 30, 2020

It looks like merge failed because errors.toml differs, which is caused by pingcap/tiup#867 merging. I have added it to this PR, but if another PR merges first it will likely have to fix this.

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 30, 2020

/run-all-tests

@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 30, 2020

/run-unit-test

1 similar comment
@ghost
Copy link
Copy Markdown
Author

ghost commented Oct 30, 2020

/run-unit-test

@tiancaiamao
Copy link
Copy Markdown
Contributor

/merge

@ti-srebot
Copy link
Copy Markdown
Contributor

/run-all-tests

@ti-srebot
Copy link
Copy Markdown
Contributor

@nullnotnil merge failed.

@tiancaiamao
Copy link
Copy Markdown
Contributor

/run-integration-ddl-test

@tiancaiamao
Copy link
Copy Markdown
Contributor

/run-unit-test

@tiancaiamao
Copy link
Copy Markdown
Contributor

/merge

@ti-srebot
Copy link
Copy Markdown
Contributor

/run-all-tests

@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 3, 2020

The tests now pass, but it's not merging because it lost approval when new code was added to fix the errors.toml issue.

@tiancaiamao tiancaiamao merged commit 3ba36dc into pingcap:master Nov 3, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Nov 3, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Copy Markdown
Contributor

cherry pick to release-4.0 in PR #20814

@ghost ghost deleted the shutdown branch November 3, 2020 14:57
ti-srebot added a commit that referenced this pull request Nov 13, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/config component/server status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support graceful shutdown API

7 participants