Possible bug - coredump hangs #110

We had a production outage some time ago, but reporting it here got lost among other tasks.
I traced the issue to a large number of running coredump-composer processes that never finished processing their coredumps. This in turn led to the crashed processes being kept alive (they can't be killed until coredump handling has finished), which in turn made containerd very unhappy about pods not terminating.

During the outage I unfortunately didn't debug where exactly the composer was stuck, but I could clearly see the processes running for a long time without doing any active work.

My suggestion is to give the coredump composer an upper bound on total processing time, after which it terminates itself, so that stuck processes don't stay around. The individual parts of the file creation process could also be guarded with timeouts. I'm imagining crictl being stuck after the dump itself has already been written out, and I'd rather have the dump without lots of extra info than not have it at all.
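
To make the idea concrete, here is a rough sketch of what I have in mind. It is written in Python purely for illustration and is not the composer's actual code. The names `run_step` and `collect_extras`, the step list, and the timeout values are all hypothetical. Each optional step (for example, a crictl call) gets its own timeout, and the whole enrichment phase has an overall budget, so a stuck helper only costs us the extra info and never blocks the handler forever.

```python
import subprocess
import sys
import time


def run_step(cmd, timeout_s):
    """Run one optional step; return its stdout, or None if it hung or failed.

    subprocess.run kills the child when the timeout expires, so a stuck
    helper cannot outlive its time limit.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired:
        return None  # step hung: give up on this piece of extra info
    except (subprocess.CalledProcessError, OSError):
        return None  # step failed or the binary is missing
    return result.stdout


def collect_extras(steps, total_budget_s, per_step_timeout_s):
    """Run optional steps in order, bounded per step and in total.

    `steps` is a list of (name, argv) pairs. Steps that time out or fail
    are skipped; the remaining steps still run while budget is left.
    """
    deadline = time.monotonic() + total_budget_s
    extras = {}
    for name, cmd in steps:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall budget used up: stop enriching the dump
        output = run_step(cmd, min(per_step_timeout_s, remaining))
        if output is not None:
            extras[name] = output
    return extras


if __name__ == "__main__":
    # Stand-ins for real helper commands (assumption: these would be the
    # crictl invocations in practice). One finishes quickly, one hangs.
    fast = [sys.executable, "-c", "print('pod info')"]
    stuck = [sys.executable, "-c", "import time; time.sleep(60)"]
    extras = collect_extras(
        [("pod", fast), ("logs", stuck)],
        total_budget_s=10,
        per_step_timeout_s=1,
    )
    print(sorted(extras))
```

The important part is the ordering: the dump itself would be written first, and only the optional enrichment runs under these limits, so the worst case is a dump with less metadata.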
