This repository was archived by the owner on Dec 21, 2021. It is now read-only.

The agent does not behave correctly when services fail to react to a stop command in time #136

@soenkeliebau

Description

The agent does not appear to wait for commands sent to systemd to finish before continuing with subsequent actions. If a start command is issued while a stop command for the same unit is still queued, systemd rejects the start with an error:

[2021-04-22T15:06:07Z INFO  stackable_agent::provider::states::pod::creating_service] Creating service unit for service default-spark-cluster-default-master
[2021-04-22T15:06:08Z INFO  stackable_agent::provider::states::pod::starting] Starting systemd unit [default-spark-cluster-default-master-spark.service]
[2021-04-22T15:06:08Z ERROR stackable_agent::provider::states::pod::starting] Error occurred starting systemd unit [default-spark-cluster-default-master-spark.service]: [Error starting service [default-spark-cluster-default-master-spark.service]: Transaction for default-spark-cluster-default-master-spark.service/start is destructive (default-spark-cluster-default-master-spark.service has 'stop' job queued, but 'start' is included in transaction).]
[2021-04-22T15:06:08Z ERROR krator::state] Object state machine exited with error. error=Error starting service [default-spark-cluster-default-master-spark.service]: Transaction for default-spark-cluster-default-master-spark.service/start is destructive (default-spark-cluster-default-master-spark.service has 'stop' job queued, but 'start' is included in transaction).

This is most probably because the agent does not currently wait until systemd has actually processed a start/stop command.
An example (in C) of how this could be handled can be found here: https://jonathangold.ca/blog/waiting-for-systemd-job-to-complete/
The agent does not yet implement the signal processing needed to detect that a job has completed.
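
To illustrate the idea, here is a minimal sketch in Rust of waiting for a job to complete before issuing the next command. It is not the agent's code and does not talk to D-Bus: `FakeSystemd`, `enqueue`, `wait_for_job` and `restart_unit` are hypothetical names, and a standard-library channel stands in for the `JobRemoved` signal that systemd's `org.freedesktop.systemd1.Manager` interface emits when a queued job finishes. A real implementation would subscribe to that signal through a D-Bus library instead.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Simplified payload of a `JobRemoved` notification: job path, unit name, result.
#[derive(Debug, Clone, PartialEq)]
struct JobRemoved {
    job: String,
    unit: String,
    result: String,
}

/// Hypothetical stand-in for the connection to systemd (assumption, not a real API).
struct FakeSystemd {
    signals: Sender<JobRemoved>,
    next_id: u32,
}

impl FakeSystemd {
    /// Queues a job and returns its path immediately, without waiting for the
    /// job to finish. Completion is reported later on the signal channel.
    fn enqueue(&mut self, unit: &str, delay: Duration) -> String {
        self.next_id += 1;
        let job = format!("/org/freedesktop/systemd1/job/{}", self.next_id);
        let (tx, unit, path) = (self.signals.clone(), unit.to_string(), job.clone());
        thread::spawn(move || {
            thread::sleep(delay); // simulated time the unit needs to stop or start
            let _ = tx.send(JobRemoved { job: path, unit, result: "done".into() });
        });
        job
    }
}

/// Blocks until the notification for `job` arrives or `timeout` elapses.
/// Notifications for other jobs are discarded, which is a simplification.
fn wait_for_job(
    signals: &Receiver<JobRemoved>,
    job: &str,
    timeout: Duration,
) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match signals.recv_timeout(remaining) {
            Ok(sig) if sig.job == job => {
                return if sig.result == "done" {
                    Ok(())
                } else {
                    Err(format!("job {} for {} ended with result {}", sig.job, sig.unit, sig.result))
                };
            }
            Ok(_) => continue, // notification for a different job
            Err(RecvTimeoutError::Timeout) => {
                return Err(format!("timed out waiting for job {}", job));
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("signal channel closed".to_string());
            }
        }
    }
}

/// Stops a unit, waits for the stop job to finish, and only then starts it.
fn restart_unit(
    systemd: &mut FakeSystemd,
    signals: &Receiver<JobRemoved>,
    unit: &str,
    timeout: Duration,
) -> Result<(), String> {
    let stop_job = systemd.enqueue(unit, Duration::from_millis(50));
    wait_for_job(signals, &stop_job, timeout)?;
    let start_job = systemd.enqueue(unit, Duration::from_millis(10));
    wait_for_job(signals, &start_job, timeout)
}
```

A caller would create the channel with `std::sync::mpsc::channel()`, pass the sender to `FakeSystemd`, and call `restart_unit` with the receiver. The point of the sketch is the ordering: the start command is only sent once the stop job has been reported as finished, which avoids having a 'stop' job still queued when 'start' is requested.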

We will create an integration test case to reproduce this and make testing a fix easier (see stackabletech/integration-test-repo#3).

When looking at this it would probably make sense to address #139 at the same time / before getting started with this.
