
Upstart service can create orphaned processes #363

@olivielpeau


Because the upstart service tracks the first process that it starts (i.e. the bin/agent/agent shell script) rather than the actual agent runtime, it can orphan a running agent process and then start a new one alongside it.

This typically happens like this:

  1. upstart tries to stop the agent (SIGTERM)
  2. the agent takes a long time to stop (longer than the default 5 seconds that upstart waits)
  3. upstart sends SIGKILL to the process it tracks, which actually kills the "shell script process" rather than the agent runtime
  4. upstart cleans up the pid and sock files, so the next agent service start launches an agent as if no other agent were running
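
The sequence above can be reproduced outside upstart with a minimal sketch. Everything here is an assumption made for illustration: `sh -c` stands in for the wrapper shell script, `sleep 30` stands in for the agent runtime, and the helper names (`start_wrapper`, `is_alive`, `kill_wrapper_only`) are hypothetical. It only shows the general POSIX behaviour that SIGKILLing a parent by PID leaves its child running; it does not model upstart itself.

```python
import os
import signal
import subprocess


def start_wrapper():
    """Start a shell wrapper that launches a long-running child.

    The shell prints the child's PID, then waits for it, much like a
    start script that launches a runtime and stays in the foreground.
    """
    proc = subprocess.Popen(
        ["sh", "-c", "sleep 30 & echo $!; wait"],
        stdout=subprocess.PIPE,
        start_new_session=True,  # keep the demo isolated from our own group
    )
    child_pid = int(proc.stdout.readline())
    return proc, child_pid


def is_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True


def kill_wrapper_only():
    """SIGKILL the wrapper by PID and report whether the child survived."""
    proc, child_pid = start_wrapper()
    os.kill(proc.pid, signal.SIGKILL)  # only the tracked PID is signalled
    proc.wait()
    proc.stdout.close()
    orphaned = is_alive(child_pid)
    if orphaned:
        os.kill(child_pid, signal.SIGKILL)  # clean up the stand-in child
    return orphaned
```

Calling `kill_wrapper_only()` should report that the child was still alive after the wrapper was killed, which is the orphaning described in step 3.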

On the machine where this took place, the orphaned process no longer held any shared resource (port, etc.) except for the Go expvar port, which explains why the new agent process could start without issues. This could be a hint as to what was preventing the orphaned process from exiting cleanly.

Needs further investigation

  • why the agent runtime process takes a long time to stop, or doesn't stop at all

Potential solutions

  • Make upstart SIGKILL the child processes too when they don't stop in time (not sure it's possible)
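
As a rough illustration of what "kill the children too" means at the OS level, here is a sketch of a supervisor that signals the whole process group instead of a single PID. It is not upstart configuration, and whether upstart can be made to behave this way for the agent is still open. The names `stop_process_group` and `demo` are hypothetical, `sleep 30` again stands in for the agent runtime, and the sketch assumes the wrapper and its child share one process group.

```python
import os
import signal
import subprocess
import time


def stop_process_group(proc, grace_seconds=5.0):
    """SIGTERM the whole process group, then SIGKILL it after a grace period."""
    pgid = os.getpgid(proc.pid)  # equals proc.pid when started in its own session
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # the wrapper did not stop in time
    try:
        # Signals every member of the group, so children are reached too.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # every process in the group had already exited
    proc.wait()


def demo():
    """Stop a wrapper and child that both ignore SIGTERM; return seconds taken."""
    proc = subprocess.Popen(
        ["sh", "-c", "trap '' TERM; sleep 30 & wait"],
        stdout=subprocess.PIPE,
        start_new_session=True,  # wrapper becomes leader of a fresh group
    )
    start = time.monotonic()
    stop_process_group(proc, grace_seconds=0.5)
    # read() returns at EOF, i.e. once no process holds the pipe open,
    # which means both the wrapper and the child are gone.
    proc.stdout.read()
    proc.stdout.close()
    return time.monotonic() - start
```

If the child were left behind, `demo()` would block until `sleep 30` finished on its own; with the group-wide SIGKILL it should return shortly after the grace period. A child that moves itself into a different process group or session would not be reached by this approach.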
