Shut down gracefully on SIGTERM. #136
Changes from all commits
11e36d5
6a0a2c5
a580e7a
be3ec2c
7071958
9d43060
2ffea56
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,11 +14,59 @@ | |
| * limitations under the License. | ||
| **/ | ||
|
|
||
| #include <csignal> | ||
| #include <cstdlib> | ||
| #include <initializer_list> | ||
| #include <iostream> | ||
|
|
||
| #include "agent.h" | ||
| #include "configuration.h" | ||
| #include "docker.h" | ||
| #include "instance.h" | ||
| #include "kubernetes.h" | ||
| #include "time.h" | ||
|
|
||
| namespace google { | ||
| namespace { | ||
|
|
||
| class CleanupState { | ||
| public: | ||
| CleanupState( | ||
| std::initializer_list<MetadataUpdater*> updaters, MetadataAgent* server) | ||
| : updaters_(updaters), server_(server) { server_wait_mutex_.lock(); } | ||
|
|
||
| void StartShutdown() const { | ||
| std::cerr << "Stopping server" << std::endl; | ||
| server_->Stop(); | ||
| std::cerr << "Stopping updaters" << std::endl; | ||
| for (MetadataUpdater* updater : updaters_) { | ||
| updater->NotifyStop(); | ||
| } | ||
| server_wait_mutex_.unlock(); | ||
| // Give the notifications some time to propagate. | ||
| std::this_thread::sleep_for(time::seconds(0.1)); | ||
|
Contributor
Is this guaranteed to be enough time?
Contributor
Author
Empirically, smaller delays were also sufficient, as this just needs enough time for the thread to notice the timer unlock notification and exit the loop. For poller threads, even if it doesn't, nothing bad is going to happen, so I hesitate to introduce a larger wait here.
Contributor
It seems to me, that unlocking the
Contributor
Author
Unlocking
Contributor
SGTM |
||
| } | ||
|
|
||
| void Wait() const { | ||
| std::lock_guard<std::mutex> await_server_shutdown(server_wait_mutex_); | ||
| } | ||
|
|
||
| private: | ||
| mutable std::mutex server_wait_mutex_; | ||
| std::vector<MetadataUpdater*> updaters_; | ||
| MetadataAgent* server_; | ||
| }; | ||
| const CleanupState* cleanup_state; | ||
|
|
||
| } // namespace | ||
|
|
||
| extern "C" [[noreturn]] void handle_sigterm(int signum) { | ||
| std::cerr << "Caught SIGTERM; shutting down" << std::endl; | ||
| google::cleanup_state->StartShutdown(); | ||
| std::cerr << "Exiting" << std::endl; | ||
| std::exit(0); // SIGTERM means graceful shutdown, so report success. | ||
| } | ||
|
|
||
| int main(int ac, char** av) { | ||
| google::Configuration config; | ||
|
|
@@ -33,9 +81,17 @@ int main(int ac, char** av) { | |
| google::DockerUpdater docker_updater(config, server.mutable_store()); | ||
| google::KubernetesUpdater kubernetes_updater(config, server.health_checker(), server.mutable_store()); | ||
|
|
||
| instance_updater.start(); | ||
| docker_updater.start(); | ||
| kubernetes_updater.start(); | ||
| google::cleanup_state = new google::CleanupState( | ||
| {&instance_updater, &docker_updater, &kubernetes_updater}, | ||
| &server); | ||
| std::signal(SIGTERM, handle_sigterm); | ||
|
|
||
| instance_updater.Start(); | ||
| docker_updater.Start(); | ||
| kubernetes_updater.Start(); | ||
|
|
||
| server.Start(); | ||
|
|
||
| server.start(); | ||
| // Wait for the server to shut down. | ||
| google::cleanup_state->Wait(); | ||
| } | ||
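
The review thread on the `sleep_for` line describes poller threads that notice a timer unlock notification and exit their loop. A minimal sketch of that pattern follows, assuming a `std::timed_mutex` that the owning thread holds while the poller runs. `StoppablePoller` and all of its members are hypothetical names for illustration and are not taken from this PR.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical poller: the owner holds timer_ while polling should continue
// and releases it to signal "stop". Start() and NotifyStop() must be called
// from the same thread, because a mutex may only be unlocked by its owner.
class StoppablePoller {
 public:
  explicit StoppablePoller(std::chrono::milliseconds period)
      : period_(period) {}

  ~StoppablePoller() {
    NotifyStop();
    Join();
  }

  void Start() {
    assert(!thread_.joinable());  // Start() may only be called once.
    timer_.lock();                // Held until NotifyStop().
    running_ = true;
    thread_ = std::thread([this] { PollLoop(); });
  }

  // Safe to call more than once; only the first call releases the timer.
  void NotifyStop() {
    if (running_.exchange(false)) timer_.unlock();
  }

  void Join() {
    if (thread_.joinable()) thread_.join();
  }

  int polls() const { return polls_.load(); }

 private:
  void PollLoop() {
    for (;;) {
      ++polls_;  // Stand-in for one round of real polling work.
      // Times out while the owner holds the lock; succeeds once it is
      // released, which is the stop notification.
      if (timer_.try_lock_for(period_)) {
        timer_.unlock();
        return;
      }
    }
  }

  const std::chrono::milliseconds period_;
  std::timed_mutex timer_;
  std::atomic<bool> running_{false};
  std::atomic<int> polls_{0};
  std::thread thread_;
};
```

In this sketch the poller wakes as soon as the timer is released, so the caller can join the thread instead of sleeping for a fixed interval. Whether that trade-off suits the agent is a design decision for the PR authors.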
If we have Stop in this destructor, should we also call stop in MetadataAgent's destructor for consistency? I'm primarily concerned about the inconsistency; I'm not sure what the negative effects would be.

MetadataAgent's destructor will deallocate both the API server and the reporter, which will invoke their respective destructors.

I'm confused: why are we calling Stop from MetadataAgent if we're relying on the destructor? I may not be clear, but it seems confusing that stop gets propagated through multiple channels simultaneously.
https://github.com/Stackdriver/metadata-agent/pull/136/files#diff-61b93c57ea92f91ec66fdd4a280d8e8bR40

Stop() is idempotent. It's just a notification under the covers, so it's ok to call it more than once. Calling it from the destructor guarantees that the server will also shut down cleanly when the object is deleted.

SGTM
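
The final reply describes `Stop()` as an idempotent notification that is also invoked from the destructor. A minimal sketch of that idea follows, assuming the notification is a flag guarded by a mutex plus a condition variable. `NotifyingServer` and its members are hypothetical and do not reproduce the classes in this PR.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical server whose Stop() only sets a flag and notifies, so calling
// it from several places (explicitly and from the destructor) is harmless.
class NotifyingServer {
 public:
  NotifyingServer() : worker_([this] { Serve(); }) {}

  ~NotifyingServer() {
    Stop();                       // No-op in effect if already stopped.
    assert(worker_.joinable());   // The worker is only ever joined here.
    worker_.join();
  }

  // Idempotent: repeated calls set the same flag and notify again.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_requested_ = true;
    }
    stopped_cv_.notify_all();
  }

  bool stop_requested() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return stop_requested_;
  }

 private:
  // Stand-in for the serving loop: blocks until a stop is requested.
  void Serve() {
    std::unique_lock<std::mutex> lock(mutex_);
    stopped_cv_.wait(lock, [this] { return stop_requested_; });
  }

  mutable std::mutex mutex_;
  std::condition_variable stopped_cv_;
  bool stop_requested_ = false;
  std::thread worker_;  // Declared last so the members above exist first.
};
```

With this shape, an explicit `Stop()` followed by destruction behaves the same as destruction alone, which is the consistency property the thread is discussing.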