Skip to content

Conversation

@khushboobhatia01
Copy link
Contributor

Race condition causing orquesta workflow status to change to running after it completed successfully. To avoid this incorrect status update, skipping execution object update is execution is already in completed state.

More details about issue: #5222

@pull-request-size pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Jun 19, 2021
Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't completely recall but I think there may be some edge cases impacted by this. One test case I can think of is with items task in which one of the executions for an item failed, this leads to the workflow execution going into failed state. However, items that are still in progress are allowed to update result into the execution. I suggest getting CI jobs here to completely pass so we know it is not impacting other documented test cases.
  2. Please make sure there's unit test associated with your change.
  3. Have you tested this change locally on your workflows? Did it completely resolve?

@khushboobhatia01
Copy link
Contributor Author

Please find response inline @m4dcoder

  1. I don't completely recall but I think there may be some edge cases impacted by this. One test case I can think of is with items task in which one of the executions for an item failed, this leads to the workflow execution going into failed state. However, items that are still in progress are allowed to update result into the execution. I suggest getting CI jobs here to completely pass so we know it is not impacting other documented test cases.

----> All the test cases are fixed now. The one-off edge case you mentioned regarding with-items failure is verified with test_with_items_failure test case which is running fine.
Screenshot 2021-07-14 at 6 54 21 PM

  1. Please make sure there's unit test associated with your change.

----> Unit test case added st2common/tests/unit/test_executions_util.py

  1. Have you tested this change locally on your workflows? Did it completely resolve?

----> Yes, I've verified the changes locally. I've setup an environment where I'm able to reproduce the race condition.
Below screenshot is without my changes. You can see in the execution logs, status changed from running -> succeeded -> running.
Screenshot 2021-07-14 at 7 19 08 PM

With my changes, you can see the status not changing to running after it succeeded. Also, added screenshot of log line when execution update was skipped.
Screenshot 2021-07-14 at 7 04 37 PM
Screenshot 2021-07-14 at 7 05 37 PM

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything passes. But I have some more questions, specifically the patching in the unit tests.

@m4dcoder m4dcoder added this to the 3.6.0 milestone Jul 27, 2021
Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I don't have any issue with the changes since all unit and integrate tests pass and the concern address from the change is valid. I requested some minor cleanups. I will approve when cleanups are addressed and everything goes green.

@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines. Requires some effort to review. and removed size/XS PR that changes 0-9 lines. Quick fix/merge. labels Jul 27, 2021
@khushboobhatia01
Copy link
Contributor Author

Overall I don't have any issue with the changes since all unit and integrate tests pass and the concern address from the change is valid. I requested some minor cleanups. I will approve when cleanups are addressed and everything goes green.

@m4dcoder All review comments addressed. Please approve and merge.

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, still missing some cleanups. Can you check the unit tests that you've refactored and make sure there are no duplicate query/asserts on the live action and action execution? If there are asserts that are out of place after the cleanup, please move them closer to where they should be checked (i.e. move live action result check closer to where the status was checked).

@khushboobhatia01
Copy link
Contributor Author

Almost there, still missing some cleanups. Can you check the unit tests that you've refactored and make sure there are no duplicate query/asserts on the live action and action execution? If there are asserts that are out of place after the cleanup, please move them closer to where they should be checked (i.e. move live action result check closer to where the status was checked).

@m4dcoder All test case cleanups are addressed.

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L PR that changes 100-499 lines. Requires some effort to review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants