@hanishakoneru
Contributor

What changes were proposed in this pull request?

OM HA robot tests were disabled in HDDS-2533 as they were failing intermittently. HDDS-2454 fixes some issues in the HA tests. Creating this Jira to re-enable the HA acceptance tests.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2621

How was this patch tested?

CI acceptance test suite.

Contributor

@bharatviswa504 bharatviswa504 left a comment

+1 pending CI.

@elek
Member

elek commented Dec 3, 2019

The OM HA test is failing in this patch:

OSError: [Errno 13] Permission denied: '/opt/hadoop/tmptWKCFH'

In GitHub Actions-based runs, /opt/hadoop might not be writable. /tmp is a better choice for temporary data...
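
To illustrate the point about temporary data, here is a minimal Python sketch that creates a scratch directory in a preferred location only when that location is writable, and otherwise falls back to the system temporary directory (typically /tmp on Linux). The helper name `make_scratch_dir` and its `preferred` parameter are hypothetical and are not taken from the test code in this PR; the actual fix may look quite different.

```python
import os
import tempfile


def make_scratch_dir(preferred=None):
    """Create and return a new temporary directory.

    `preferred` is a hypothetical parameter used only for illustration:
    it is used when it exists and is writable; otherwise the system
    temporary directory is used instead.
    """
    if preferred and os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return tempfile.mkdtemp(dir=preferred)
    # tempfile.gettempdir() honours TMPDIR and usually resolves to /tmp on Linux.
    return tempfile.mkdtemp(dir=tempfile.gettempdir())
```

Calling `make_scratch_dir("/opt/hadoop")` in an environment where /opt/hadoop is read-only would return a directory under the system temp location instead of raising `OSError: [Errno 13]`.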

@adoroszlai
Contributor

@hanishakoneru I tried to fix this and found the following two changes necessary:

The test still doesn't pass completely; the latest run failed with "copyFromLocal: Allocate any more blocks for write failed". If you have time, please feel free to pick these two changes and continue from there.

@elek
Member

elek commented Feb 28, 2020

/pending acceptance tests are failing


@github-actions github-actions bot left a comment

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

acceptance tests are failing

@hanishakoneru
Contributor Author

/ready

@github-actions github-actions bot dismissed their stale review March 18, 2020 00:23

Blocking review request is removed.

@hanishakoneru
Contributor Author

/pending


@github-actions github-actions bot left a comment

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

/pending

@hanishakoneru
Contributor Author

Thanks @adoroszlai and @elek.
I have incorporated your suggestions. The tests are passing locally. I will let CI run once again to see what's happening.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @hanishakoneru for the update. I also confirmed that the test passes locally. I re-triggered the CI check, as the acceptance test previously timed out (at a much earlier point than the OM HA test, so unrelated to this change).

One question: why did you set /pending again?

@hanishakoneru
Contributor Author

Thanks @adoroszlai for verifying.

I set it to "pending" so that no one merges this change by mistake, as there is 1 approval already :)

@xiaoyuyao
Contributor

xiaoyuyao commented Mar 18, 2020

> The OM HA test is failing in this patch:
>
> OSError: [Errno 13] Permission denied: '/opt/hadoop/tmptWKCFH'
>
> In GitHub Actions-based runs, /opt/hadoop might not be writable. /tmp is a better choice for temporary data...

I'm seeing something similar on some, but not all, of the PR runs, even though this PR has not been merged yet, for example in #520, which I mentioned. @hanishakoneru, can you confirm whether this is the same issue?

2019-11-16T12:01:24.9714368Z Test Multiple Failovers | FAIL |
2019-11-16T12:01:24.9715403Z OSError: [Errno 13] Permission denied: '/opt/hadoop/tmpLn9PKJ'

@hanishakoneru
Contributor Author

@xiaoyuyao, this is the same issue, but I am surprised that the OM HA tests are running when they are disabled. Can you please point to another PR where you observed this? I don't see HA tests in #520.

@hanishakoneru hanishakoneru force-pushed the HDDS-2621 branch 2 times, most recently from 4b79844 to 81d08c3 March 19, 2020 22:47
@hanishakoneru
Contributor Author

Thank you all for the reviews and suggestions. I will merge this shortly.

@hanishakoneru hanishakoneru merged commit 924b50c into apache:master Mar 31, 2020
@elek
Member

elek commented Apr 1, 2020

The test failed on master after the merge: https://github.com/apache/hadoop-ozone/actions/runs/67591808

@adoroszlai
Contributor

> The test failed on master after the merge: https://github.com/apache/hadoop-ozone/actions/runs/67591808

And no OM logs are available: https://issues.apache.org/jira/browse/HDDS-3311

@elek
Member

elek commented Apr 1, 2020

Two more failures:

I hate to be the bad guy (again), and I am very sorry, but I have to revert/reopen it, as it has failed in 3 of the last 6 master builds.

elek added a commit that referenced this pull request Apr 1, 2020
@elek
Member

elek commented Apr 1, 2020

OK. Instead of reverting, I just excluded the tests from the PR / daily builds. We can create another pull request and start testing it there (#ce6ad30a3)

@elek
Member

elek commented Apr 1, 2020

I will create a separate GitHub action to make it easier to run a single acceptance test multiple times...

@elek
Member

elek commented Apr 1, 2020

If I didn't make any mistakes, the om-ha tests (and only them) will be executed twice per hour here:

https://github.com/elek/hadoop-ozone/actions?query=workflow%3Aacceptance-single

elek@da01f32

@hanishakoneru
Contributor Author

Thanks @elek for disabling this test. These OM HA failures are very elusive. I tested multiple times before merging and they didn't fail any of those times.
Hoping that your GitHub action for running the HA tests will tell us something. Thanks for setting that up.

@elek
Member

elek commented Apr 2, 2020

It seems to be flaky. There have been many runs, and 7 of the last 25 failed. (Yes, a failure rate like this is hard to detect, because you can get even 3 green builds in a row without noticing it.)

Examples:

https://github.com/elek/hadoop-ozone/runs/554176250
https://github.com/elek/hadoop-ozone/actions/runs/68771264
https://github.com/elek/hadoop-ozone/actions/runs/68745626
https://github.com/elek/hadoop-ozone/actions/runs/68694305
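
As a rough back-of-the-envelope check on why a failure rate like this is easy to miss, the sketch below estimates the chance of seeing only green builds. It assumes each run fails independently with the observed rate of 7 out of 25, which is a simplification (real CI failures may be correlated); the function names are illustrative only.

```python
import math


def prob_all_green(failures, runs, consecutive):
    """Probability that `consecutive` runs all pass, assuming independent
    runs with a failure rate of failures/runs (a simplifying assumption)."""
    pass_rate = 1 - failures / runs
    return pass_rate ** consecutive


def runs_needed(failures, runs, confidence=0.95):
    """Smallest number of runs after which at least one failure is
    expected with the given confidence, under the same assumption."""
    pass_rate = 1 - failures / runs
    return math.ceil(math.log(1 - confidence) / math.log(pass_rate))


print(f"{prob_all_green(7, 25, 3):.3f}")  # prints 0.373
print(runs_needed(7, 25))                 # prints 10
```

Under these assumptions, three consecutive green builds happen a bit more than a third of the time, and about 10 runs are needed before a failure shows up with 95% confidence, which is consistent with the test passing repeatedly before the merge.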

@hanishakoneru hanishakoneru deleted the HDDS-2621 branch December 1, 2020 21:25
vtutrinov pushed a commit to vtutrinov/ozone that referenced this pull request Oct 14, 2025
… log level to warn

Merge in SDPOZONE/component-ozone from SDPOZN-1962 to sdp-ozone-1.4

* commit '96678f24a2637e75639571088130d5b2e955b8bb':
  SDPOZN-1962. Move notLeaderException message log level to warn