CI: fix more flakes, move itests to GitHub (except ARM itest)#5811
93274ef to
4f4d48d
Wow, all GitHub itests green on the first run 😮
Is this good or bad?
I would say this is very good. Not sure why you would think it wasn't?
carlaKC left a comment
Looks good, just one q about a comment I'm unsure of.
re commit message: why does decreasing this value slow things down?
We already had the mineBlocksSlow function that used the 50ms delay. By replacing all instances of mineBlocks with mineBlocksSlow, we slow everything down. To reduce the amount of overall slowdown we decrease the delay from 50ms to 20ms.
Going to update the commit message to make this more clear.
I don't really understand this comment. Would this channel get closed when chanWatchRequests is finished with it? Also, are we always sure this is a close and not another channel policy update?
I think it's that if the channel has already been closed here, and we send in another request, it'll end up double closing.
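For context on the double-close concern: in Go, closing an already closed channel panics at runtime. The sketch below demonstrates that, along with one common guard (`sync.Once`). The `eventClient` type is hypothetical and this is not necessarily how the code under review handles it.

```go
package main

import (
	"fmt"
	"sync"
)

// eventClient is a hypothetical holder of a channel that more than one code
// path might try to close.
type eventClient struct {
	events    chan struct{}
	closeOnce sync.Once
}

func newEventClient() *eventClient {
	return &eventClient{events: make(chan struct{})}
}

// Close is safe to call multiple times; only the first call closes the channel.
func (c *eventClient) Close() {
	c.closeOnce.Do(func() { close(c.events) })
}

// closeTwiceUnsafe closes a plain channel twice and reports whether the
// second close panicked.
func closeTwiceUnsafe() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch := make(chan struct{})
	close(ch)
	close(ch) // runtime panic: close of closed channel
	return false
}

func main() {
	fmt.Println("unguarded double close panicked:", closeTwiceUnsafe())

	c := newEventClient()
	c.Close()
	c.Close() // no panic, the second call is a no-op
	_, open := <-c.events
	fmt.Println("guarded channel still open:", open)
}
```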
If the tests were previously flaky because they ran on slow test machines, it could be that they uncovered issues that only show up on slow production machines. So perhaps these are missed now with GitHub Actions. I have to admit that I don't even know for sure that the tests being green is caused by faster test machines. Sorry about the lack of threading, I should have put my initial comment on some line.
I think it's mainly the lack of consistent timing w/ the series of timeouts we have. When we run w/ Travis (and their potato cluster) we end up with several processes (replicated db, 2x full node, up to 6 lnd nodes in some tests), so it's understandable that we run into some CPU scheduling weirdness that causes these flakes at times. At the same time, we've also eliminated a ton of flakes over the past 2 months due to flake hunting szn. Travis as a service has consistently degraded over the past year or so, and then they had that massive security failure on top of that. We've given them enough chances to get their services together after being acquired by that PE firm IMO.
I think the bigger gain here is also the restoration of all the lost developer time (sitting there babysitting the test to restart it, odd failures w/ the machine (?) itself) due to Travis.
Also worth noting this brings in the
4f4d48d to
c987f0a
Rebased. But still blocked by btcsuite/btcd#1752.
c987f0a to
ef07c27
Interceptor tests need a
The latest version of btcd allows its stall handler to be disabled. We use that new config option to make sure the mining btcd node and the lnd chain backend btcd node aren't disconnected if some test takes too long and no new p2p messages are exchanged.
We now redirect the mineBlocks function to the mineBlocksSlow function, which waits after each mined block. To reduce the overall time impact of using that function everywhere, we only wait 20ms instead of 50ms after each mined block to give all nodes some time to process the block. This will still slow everything down a bit but reduces flakes that are caused by different subsystems not being up to date.
Fixes the docker build failure that was caused by docker-library/postgres#884. Using the alpine and version 13 image avoids the problem introduced with postgres 14 and debian bullseye.
ef07c27 to
134be24
Race cond flake is new, notified OP of that new test of it, needs a
Depends on btcsuite/btcd#1752.
Fixes two problems in the itest:
btcd and chain backend btcd node losing their connection because of the peer stall detection in btcd --> We fix this by disabling stall detection