Fix flaky realtime index task tests#8999
Conversation
| @@ -88,25 +89,30 @@ void doTest() | |||
| // the task will run for 3 minutes and then shutdown itself | |||
There was a problem hiding this comment.
update comment to 10 minutes as well?
There was a problem hiding this comment.
Oops, thanks. Changed back to 5 mins since 10 mins was for testing. I think 3 mins can be too short. Updated the comment as well.
|
|
||
| // sleep for a while to let peons finish starting up | ||
| TimeUnit.SECONDS.sleep(5); | ||
| TimeUnit.SECONDS.sleep(60); |
There was a problem hiding this comment.
why 60? can we use ITRetryUtil.retryUntil... instead of making every test wait 1 minute?
There was a problem hiding this comment.
We don't have any way to check whether the task is ready to consume data or not for now. 5 seconds was too short and I think 1 min of waiting isn't that harm in a sense that our tests usually runs for 30 mins.
There was a problem hiding this comment.
In my experience, once a sleep is added, it's pretty hard to remove it. I think it would be helpful to mention in the comment that we don't know when the start up completes because there is no endpoint and if we had one, we could get rid of the sleep
Try till success, means we'd wait at most 1 minute. Right now every time our tests will run for 31 minutes instead of 30. Happy to discuss this offline - maybe it's not such a big deal if we add an extra minute to the tests
There was a problem hiding this comment.
I agree with you on adding a sleep. But RealtimeIndexTask and AppenderatorRealtimeIndexTask are sort of deprecated (in a sense that Tranquility is deprecated which is the main user of those tasks) and I don't think it's worth to spend much time for them.
Regarding the test run time, I think the most 2 common cases when the test run time matters are the CI and manual running of tests. For CI, LGTM and Teamcity are running more than an hour and Travis is not a bottleneck. For manual running of tests, I usually do something else while tests are running and adding 1 more min doesn't bother me much. What do you think?
BTW, anything happened only offline is regarded as that hasn't happened in Apache way.
Open Communications: as a virtual organization, the ASF requires all communications related to code and decision-making to be publicly accessible to ensure asynchronous collaboration, as necessitated by a globally-distributed community.
We can discuss offline, but that discussion and the conclusion must be filed somewhere public.
There was a problem hiding this comment.
Discussed offline - these tests are for parts of the codebase that is currently deprecated. They will likely be removed in the near future. Right now the bottleneck for PR checks is LGTM (90 min) So even though this adds 2 min extra - there is no visible difference to the time it takes to merge a PR.
| @Override | ||
| int getNumExpectedRowsIngested() | ||
| { | ||
| return 22; |
There was a problem hiding this comment.
Where does 22 come from?
nit: make this a constant so we can add an explanation in the javadocs? Similar comment for ITRealtimeIndexTaskTest
There was a problem hiding this comment.
Sounds good. Added.
suneet-s
left a comment
There was a problem hiding this comment.
I'd like to discuss the sleep statement offline so we can decide if any changes are needed.
|
|
||
| // sleep for a while to let peons finish starting up | ||
| TimeUnit.SECONDS.sleep(5); | ||
| TimeUnit.SECONDS.sleep(60); |
There was a problem hiding this comment.
In my experience, once a sleep is added, it's pretty hard to remove it. I think it would be helpful to mention in the comment that we don't know when the start up completes because there is no endpoint and if we had one, we could get rid of the sleep
Try till success, means we'd wait at most 1 minute. Right now every time our tests will run for 31 minutes instead of 30. Happy to discuss this offline - maybe it's not such a big deal if we add an extra minute to the tests
| private static final String REALTIME_QUERIES_RESOURCE = "/indexer/wikipedia_realtime_index_queries.json"; | ||
| /** | ||
| * The expected number of rows ingested for this test. | ||
| * The total number of rows of raw data is 22, but two rows will be rolled up into one row. |
There was a problem hiding this comment.
Pardon my ignorance, where does the raw data come from?
There was a problem hiding this comment.
Fixes #8918.
Description
Those tests have been flaky since they are optimistically waiting for tasks to ingest all data in 5 seconds. This PR modifies those tests to actually check the number of rows ingested.
This PR has:
This change is