Skip to content

itest: fix test flakes from open channel not found and tx not found in mempool#5611

Merged
guggero merged 14 commits into
lightningnetwork:masterfrom
yyforyongyu:itest-flake-chan-open
Sep 17, 2021
Merged

itest: fix test flakes from open channel not found and tx not found in mempool#5611
guggero merged 14 commits into
lightningnetwork:masterfrom
yyforyongyu:itest-flake-chan-open

Conversation

@yyforyongyu
Copy link
Copy Markdown
Member

@yyforyongyu yyforyongyu commented Aug 8, 2021

Depends #5637
This PR fixes the itest flakes from open channel not found and tx not found in mempool. FWIW, the node graph subscription is refactored so that the graph update stream is handled in the node level. We will use only one topology client to check for channel open/close/update.

Meanwhile, I think the structure of itest can be more or less defined now, we could,

  • handle all lightning node related RPC calls at HarnessNode level
  • handle miner/backend related RPC calls at NetworkHarness level
  • handle assertions in assertions.go
    Thus our test file would be better organized and the length should be greatly decreased.

In addition, improved logging is added. We can now see the node's internal state, something like,

=== RUN   TestLightningNetworkDaemon/tranche00/28-of-82/btcd/invoice_routing_hints
    test_harness.go:91: Failed: (invoice routing hints): exited with error: 
        *errors.errorString timed out waiting for channel open alice-bob: channel:f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0 not opened before timeout: 
        node state: {
            "NodeID": 0,
            "Name": "Alice",
            "PubKey": "020696e7c9f64eecc0eb65c7f02094bbe0d291376f4160a8c29846698468cd249e",
            "OpenChans": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": 2,
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": 2,
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": 2,
                "c1f39c2265dd1346cc2f6b0cd777e1c7ac5f7e681c0cde7e7484db76472c43f8:0": 1,
                "f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0": 1
            },
            "ClosedChans": {},
            "PolicyUpdates": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": {
                    "02bdd10b212a52bd97909c2c3b931765a0dc72001635917c3d19933c9152dbac8c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": {
                    "03965bafe134a54b218c0e821d0200f044713c20c0bc668520d5ac0bed989684a0": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": {
                    "03173454d3e554e64653369d4ea4a491ac7a5287f5aa77b40118e072b131459a0c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                }
            },
            "NodeCfg": {
                "LogFilenamePrefix": "invoice_routing_hints",
                "ExtraArgs": [
                    "--default-remote-max-htlcs=483"
                ],
                "HasSeed": false,
                "P2PPort": 5586,
                "RPCPort": 5587,
                "RESTPort": 5589,
                "ProfilePort": 0,
                "AcceptKeySend": true,
                "AcceptAMP": false,
                "FeeURL": "http://localhost:5584/fee-estimates.json"
            }
        }
        .../lnd/lntest/itest/lnd_routing_test.go:1312 (0x1b94def)
            testInvoiceRoutingHints: t.Fatalf("timed out waiting for channel open %s: %v",
        .../lnd/lntest/itest/test_harness.go:115 (0x1b2f98e)
            (*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        .../lnd/lntest/itest/lnd_test.go:247 (0x1bd2d4f)
            TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /usr/local/Cellar/go/1.16/libexec/src/testing/testing.go:1194 (0x111980f)
            tRunner: fn(t)
        /usr/local/Cellar/go/1.16/libexec/src/runtime/asm_amd64.s:1371 (0x10717e1)
            goexit: BYTE    $0x90   // NOP

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch from 563df9c to 9a7cd60 Compare August 8, 2021 10:52
@yyforyongyu yyforyongyu requested review from Roasbeef and guggero August 9, 2021 13:31
@Roasbeef Roasbeef added this to the v0.14.0 milestone Aug 9, 2021
@Roasbeef Roasbeef modified the milestones: v0.14.0, flake hunting szn Aug 9, 2021
@Roasbeef Roasbeef added the itests Issues related to integration tests. label Aug 9, 2021
@Roasbeef
Copy link
Copy Markdown
Member

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake hunting continues!

Great set of findings here, did an initial pass over the set of changes and only found one or two things worth discussing/revising.

Comment thread routing/notifications.go Outdated
Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outpoint -> makeOutpoint

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the current itests we have for gossip/node updates just check the channel graph directly?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. We use describe graph iirc.

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: indentation is a bit off here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watch for a*

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why send a new watch request each time vs a single one?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an easy way to check the want vs got policies inside handleChannelEdgeUpdates. Unlike checking for open channels, where we just increment counts whenever we see a channel point and have the logic of deciding it being open or not inside the function, we have no knowledge about what policy to compare to. Plus I think it's a bit more explicit(imo) to code it this way.

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of doing this here vs the graphUpdates case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to handle the subscription and following checks in one place. And the lines of code decreased. And this is what I have in mind atm, to makeitest better,

handle all lightning node related RPC calls at HarnessNode level
handle miner/backend related RPC calls at NetworkHarness level
handle assertions in assertions.go
Thus our test file would be better organized and the length should be greatly decreased.

Comment thread server.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, looks like the very same bug fix is actually implemented in this very old PR: #1783

I wonder if we should cherry-pick it as is, as it attempts to address other known dependency startup issues like what you set out to fix here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that pr is pretty outdated and some of the suggested changes were already made so I just added the needed changes here manually.

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 3 times, most recently from aca5ead to cc98939 Compare August 10, 2021 19:27
@yyforyongyu
Copy link
Copy Markdown
Member Author

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

I think so.

Copy link
Copy Markdown
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of improvements! Really like how thoroughly you approach those flake fixes 💯

Just a few minor comments, looks mostly good to me!

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use rpcPointToWirePoint in lnd_channel_backup_test.go? Or move/rename that one to this file?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Use this method in rpcPointToWirePoint instead.

Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to return here as we'd run into a nil pointer further down the line if we don't. Or pass in t to fail the test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the more I change the more I feel that, maybe we should use hn.AddToLog to print to console instead of the log?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return here to make sure we don't use an invalid outpoint? Or pass in t to fail the test?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a return here.
We could perhaps use panic instead?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I ended up printing the error to the console. Thought about using panic which made me panic, being afraid of it would panic where it shouldn't panic😂
Also the nits are fixed!

Comment thread lntest/node.go Outdated
@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch from cc98939 to 86ae217 Compare August 12, 2021 02:21
@yyforyongyu
Copy link
Copy Markdown
Member Author

Lots of the itest failed, weird, will address them later

Copy link
Copy Markdown
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase!

Otherwise LGTM barring the discussion re return vs. panic 🎉

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

Comment thread lntest/node.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from 1e3557e to 1dec6f3 Compare August 24, 2021 02:42
@yyforyongyu yyforyongyu requested a review from Roasbeef August 24, 2021 02:55
@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch from 1dec6f3 to 5eeb998 Compare August 24, 2021 04:07
@guggero
Copy link
Copy Markdown
Collaborator

guggero commented Aug 24, 2021

Needs a rebase.

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from c4fbad0 to 334aec6 Compare August 24, 2021 23:39
@yyforyongyu
Copy link
Copy Markdown
Member Author

Rebased upon #5637 to see the cumulative itest results.

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch from 334aec6 to 5614fad Compare August 25, 2021 01:07
@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 3 times, most recently from 0681eb7 to 81be9a9 Compare September 13, 2021 23:35
@Roasbeef
Copy link
Copy Markdown
Member

Can be rebased now that the dependent PR is in!

@yyforyongyu
Copy link
Copy Markdown
Member Author

Can be rebased now that the dependent PR is in!

Done! Now pending the build results!

Copy link
Copy Markdown
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

@Roasbeef Roasbeef enabled auto-merge September 16, 2021 21:57
@yyforyongyu
Copy link
Copy Markdown
Member Author

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

Done!

@guggero guggero disabled auto-merge September 17, 2021 07:51
@guggero guggero merged commit 583ccfe into lightningnetwork:master Sep 17, 2021
@yyforyongyu yyforyongyu deleted the itest-flake-chan-open branch September 18, 2021 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flake fix itests Issues related to integration tests. P2 should be fixed if one has time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants