#2461 fixed an important bug, but introduced another one in a specific scenario that you can reproduce as follows:
- 2-node cluster: 1 manager and 1 worker, both running an older swarmkit (i.e. before that PR)
- create a service with 5 replicas
- upgrade the manager to a newer swarmkit (i.e. with that PR)
- remove the service (or scale it down to 1)
The tasks are marked as REMOVED, but nothing ever deletes the task objects: the old worker doesn't understand the new REMOVED state, and the manager no longer handles that.
This is caused by a silent, previously innocuous bug in the agent code, where the state was checked with `==` instead of `>=`.
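To illustrate why the `==` check only breaks once a new state exists, here is a minimal sketch. The `TaskState` type, its numeric values, and both helper functions are hypothetical and simplified; this is not the actual agent code, only the comparison pattern described above.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for an ordered task state.
// The numeric values are illustrative only, not swarmkit's real constants.
type TaskState int

const (
	StateRunning  TaskState = 10
	StateShutdown TaskState = 20
	StateRemoved  TaskState = 30 // added later; an old worker has never seen it
)

// shouldStopStrict mirrors the fragile pattern: only an exact match counts.
func shouldStopStrict(desired TaskState) bool {
	return desired == StateShutdown
}

// shouldStopOrdered treats any state at or past shutdown as a request to stop,
// so a later, unknown state is still acted upon.
func shouldStopOrdered(desired TaskState) bool {
	return desired >= StateShutdown
}

func main() {
	for _, s := range []TaskState{StateRunning, StateShutdown, StateRemoved} {
		fmt.Printf("desired=%d strict=%v ordered=%v\n",
			s, shouldStopStrict(s), shouldStopOrdered(s))
	}
}
```

As long as no state beyond the shutdown state is ever sent, the two checks agree, which is why the bug stayed invisible. Only the strict variant ignores the newer state.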
@aluzzardi suggests that, for greater backwards compatibility, instead of changing an existing field we add a new `Removed` bool field that would behave similarly.
cc @stevvooe @nishanttotla @dperny @anshulpundir @marcusmartins @thaJeztah