-
Notifications
You must be signed in to change notification settings - Fork 3k
AWS, Core: Use Awaitility instead of Thread.sleep #8804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
010f36b to
7b2c7ac
Compare
|
@nastra , Please take a look. |
| } | ||
|
|
||
| /** wait for fixed duration */ | ||
| public static void delayInMilliseconds(String message, long delay, boolean useSameThread) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to use Awaitility to just wait a certain amount of time for a true condition to happen
| @Test | ||
| public void measureRunnable() { | ||
| Timer timer = new DefaultTimer(TimeUnit.NANOSECONDS); | ||
| Runnable runnable = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the goal here is actually to measure the duration of the runnable, so this isn't something that we want to replace with Awaitility
| "wait for IAM role policy to update.", | ||
| 1000, | ||
| 1001, | ||
| 10001, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the parameters are rather confusing. It would be better to use Awaitility directly here
| 10001, | ||
| TimeUnit.MILLISECONDS, | ||
| () -> | ||
| iam.getRolePolicy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that the check required whether IAM is consistent?
| } catch (InterruptedException e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| TestHelpers.delayInMilliseconds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep() usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition the test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).
I've opened #8853 and #8852 to give an idea how that might look for other places in the code
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this goes back to an earlier comment I had on #8715 (comment). We don't just want to replace Thread.sleep() usage blindly with Awaitility by waiting the same amount of time. The important piece is that we'd need to understand what kind of condition a particular test is trying to eventually reach, which we can then check by using Awaitility (rather than just sleeping X seconds).
I've opened #8853 and #8852 to give an idea how that might look for other places in the code.
That being said, I think the changes in TestHelpers should be reverted
| final int currentFilesCount = numCommittedFiles; | ||
| Awaitility.await() | ||
| .pollInterval(Duration.ofMillis(10)) | ||
| .atMost(Duration.ofMillis(1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .atMost(Duration.ofMillis(1000)) | |
| .atMost(Duration.ofSeconds(1)) |
| Thread.sleep(IAM_PROPAGATION_DELAY); // sleep to make sure IAM up to date | ||
| private static void waitForIamConsistency(String roleName, String policyName) { | ||
| Awaitility.await() | ||
| .pollDelay(Duration.ofMillis(1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .pollDelay(Duration.ofMillis(1000)) | |
| .pollDelay(Duration.ofSeconds(1)) |
| .build()) | ||
| .roleName() | ||
| .equalsIgnoreCase(roleName)); | ||
| ; // wait to make sure IAM up to date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seeems like a weird formatting. Maybe just put the comment at the top in L363
| Awaitility.await() | ||
| .pollInterval(Duration.ofMillis(10)) | ||
| .atMost(Duration.ofMillis(1000)) | ||
| .until(() -> !(barrier.get() < currentFilesCount * 2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .until(() -> !(barrier.get() < currentFilesCount * 2)); | |
| .until(() -> barrier.get() >= currentFilesCount * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the other places
| private void waitForIamConsistency() throws Exception { | ||
| Thread.sleep(10000); // sleep to make sure IAM up to date | ||
| private void waitForIamConsistency() { | ||
| Awaitility.await("wait for IAM role policy to update.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar do you know if that would work for the AWS integration tests?
|
|
||
| import static java.util.concurrent.Executors.newFixedThreadPool; | ||
| import static java.util.concurrent.TimeUnit.SECONDS; | ||
| import static org.apache.iceberg.TestHelpers.waitUntilAfter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes on this file need to be reverted because we actually want to sleep in those tests
| final int currentFilesCount = numCommittedFiles; | ||
| Awaitility.await() | ||
| .pollInterval(Duration.ofMillis(10)) | ||
| .pollInSameThread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this one here but not in the other places? Also this should have an atMost() configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only this test was becoming flaky with Awaitility . I think adding pollInSameThread for all the similar tests is safe. So I have added pollInSameThread and atMost with 5 minutes to all the other similar tests.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nk1506 I think the question was more around what's the issue if pollInSameThread is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amogh-jahagirdar , My intent was to use the same test thread here. Since we have multiple runnable tasks here. I don't think it was flaky because of not using pollInSameThread() .
Please share your thoughts about adding pollInSameThread() here is a good idea or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we need to figure out why the test was flaky. Was it occassionally failing? The purpose of using Awaitility is to make tests more robust and less flaky, so we need to figure out why this is flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If timeout is less(<10 seconds) it's failing randomly. To safe side, I have added timeout as 5 minutes .atMost(Duration.ofSeconds(300)) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -1 on having this run 5 minutes. Also I don't think we should be using pollInSameThread(), so please remove all of those. I ran those tests locally with the 10 seconds timeout consistently and they were passing
7577141 to
e571385
Compare
| Awaitility.await("wait for IAM role policy to update.") | ||
| .pollDelay(Duration.ofSeconds(1)) | ||
| .atMost(Duration.ofSeconds(10)) | ||
| .until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be untilAsserted. The inner block would then be assertThat(iam.getRolePolicy(...)).isNotNull()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you also might need to specify .ignoreExceptions() as otherwise the API can throw a NoSuchEntityException when the role doesn't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially planned to add .ignoreExceptions(). But as per there javadoc
Instruct Awaitility to ignore all exceptions that occur during evaluation. Exceptions will be treated as evaluating to false. This is useful in situations where the evaluated conditions may temporarily throw exceptions..
I thought it wont make any difference.
Thanks for recommending this.
| .atMost(Duration.ofSeconds(10)) | ||
| .until( | ||
| () -> | ||
| lakeformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the same thing @nastra pointed out https://github.com/apache/iceberg/pull/8804/files#r1371698377 applies here as well.
| final int currentFilesCount = numCommittedFiles; | ||
| Awaitility.await() | ||
| .pollInterval(Duration.ofMillis(10)) | ||
| .pollInSameThread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nk1506 I think the question was more around what's the issue if pollInSameThread is not used?
db71fe1 to
07885ee
Compare
| Awaitility.await() | ||
| .pollDelay(Duration.ofSeconds(1)) | ||
| .atMost(Duration.ofSeconds(10)) | ||
| .until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should do the same untilAsserted() check that I mentioned earlier
| DescribeResourceRequest.builder().resourceArn(arn).build()) | ||
| .resourceInfo() | ||
| .roleArn() | ||
| .equalsIgnoreCase(lfRegisterPathRoleArn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should do .isEqualToIgnoringCase(lfRegisterPathRoleArn)) rather than isTrue()
| Awaitility.await() | ||
| .pollDelay(Duration.ofSeconds(1)) | ||
| .atMost(Duration.ofSeconds(10)) | ||
| .until( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be consistent with other checks
37258a8 to
daea635
Compare
| // aggregated | ||
| Awaitility.await() | ||
| .atMost(Duration.ofMillis(500)) | ||
| .atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It started becoming flaky it seems.
Ref: https://github.com/apache/iceberg/actions/runs/6678213413/job/18149016719
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be 10 seconds, but maybe rather 1-3 seconds. The old version of the code was relying on something around 500 millis
| // aggregated | ||
| Awaitility.await() | ||
| .atMost(Duration.ofMillis(500)) | ||
| .atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
| // aggregated | ||
| Awaitility.await() | ||
| .atMost(Duration.ofMillis(500)) | ||
| .atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
| // aggregated | ||
| Awaitility.await() | ||
| .atMost(Duration.ofMillis(500)) | ||
| .atMost(Duration.ofSeconds(10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
|
LGTM, thanks @nk1506 |
Fixes #7154