@adoroszlai
Contributor

What changes were proposed in this pull request?

Fix intermittent test failure in TestContainerOperations. Let the test pass if the datanode usage info reports more containers than SCM's view of the same datanode's containers (which may be slightly outdated).

https://issues.apache.org/jira/browse/HDDS-13310

How was this patch tested?

Before: 5/100 failures.
https://github.com/adoroszlai/ozone/actions/runs/16175931905/job/45662413733

After: no failures
https://github.com/adoroszlai/ozone/actions/runs/16174065813

@adoroszlai adoroszlai self-assigned this Jul 9, 2025
@adoroszlai adoroszlai added the test label Jul 9, 2025
Comment on lines 206 to 207
- assertEquals(expected, usageInfoList.get(0).getContainerCount());
+ assertThat(usageInfoList.get(0).getContainerCount())
+     .isGreaterThanOrEqualTo(expected);
Member

Is this because some datanode has outdated information that gets updated before the assertion runs? If so, could something like GenericTestUtils.waitFor, or a util that triggers the datanode to update its info, work here?

Contributor Author

The test makes two different calls to SCM, each returning data about the containers on a specific datanode. Both ultimately read from the same container list. SCM collects info from datanodes in the background (incremental container reports and heartbeats), so if a container was created by a previous test case, it may be added to SCM's list between the two calls.
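
To make the race concrete, here is a minimal sketch in plain Java. It does not use any Ozone API: `StaleCountDemo`, `run`, and the container list are hypothetical stand-ins, and the background report is simulated by a plain `add` between the two reads. It only illustrates why a strict equality check can fail while a "greater than or equal" check still holds.

```java
import java.util.ArrayList;
import java.util.List;

class StaleCountDemo {

  /**
   * Reads a growing list twice, with a simulated background update in
   * between. Returns { strict equality holds, tolerant check holds }.
   */
  static boolean[] run() {
    // Hypothetical stand-in for SCM's container list for one datanode.
    List<Long> containers = new ArrayList<>(List.of(1L, 2L, 3L));

    // First call: the expected count taken from the list.
    int expected = containers.size();

    // Simulated background report: a container created by an earlier
    // test case becomes visible between the two calls.
    containers.add(4L);

    // Second call: the count reported a moment later.
    int actual = containers.size();

    return new boolean[] {actual == expected, actual >= expected};
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("strict equality holds: " + result[0]);
    System.out.println("tolerant check holds:  " + result[1]);
  }
}
```

The sketch assumes the list only grows between the two reads, which matches the scenario described above.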

We could add waitFor, but would need to repeat both calls, since no matter which one we choose as the reference (first call), the second call may return the same number or more.
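
A rough sketch of what "repeat both calls" could look like, in plain Java rather than with GenericTestUtils. Everything here is an assumption for illustration: `readUntilEqual` is a hypothetical helper, the two suppliers stand in for the two SCM calls, and the late report is simulated with a counter that is bumped once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

class RepeatBothCalls {

  /**
   * Repeats BOTH reads on every attempt until they agree.
   * Returns the agreed value, or -1 if they never matched.
   */
  static int readUntilEqual(IntSupplier first, IntSupplier second,
      int maxAttempts, long sleepMillis) throws InterruptedException {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int a = first.getAsInt();
      int b = second.getAsInt();
      if (a == b) {
        return a;
      }
      Thread.sleep(sleepMillis);
    }
    return -1;
  }

  /** Simulates one late report landing between the first pair of reads. */
  static int demo() throws InterruptedException {
    AtomicInteger containerCount = new AtomicInteger(3);
    AtomicBoolean reportPending = new AtomicBoolean(true);

    // Stand-in for the first SCM call.
    IntSupplier first = containerCount::get;

    // Stand-in for the second SCM call; the pending report is applied
    // just before its first invocation, so the first pair disagrees.
    IntSupplier second = () -> {
      if (reportPending.getAndSet(false)) {
        containerCount.incrementAndGet();
      }
      return containerCount.get();
    };

    return readUntilEqual(first, second, 5, 10L);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("agreed count: " + demo());
  }
}
```

Retrying only the second call would not help in this simulation, because the reference value from the first call would stay stale.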

@adoroszlai adoroszlai marked this pull request as draft July 31, 2025 07:23
@adoroszlai
Contributor Author

The test was changed in #8677, and it now appears to pass on master: https://github.com/adoroszlai/ozone/actions/runs/16642772549

Closing. Will reopen if it happens again.

Thanks @ayushtkn for the review.

@adoroszlai adoroszlai closed this Jul 31, 2025