Skip to content

feat: tune observability log level#610

Merged
sam701 merged 2 commits intomainfrom
alexei/tune-observability-log-level
Apr 24, 2025
Merged

feat: tune observability log level#610
sam701 merged 2 commits intomainfrom
alexei/tune-observability-log-level

Conversation

@sam701
Copy link
Contributor

@sam701 sam701 commented Apr 23, 2025

Log observability tracing events with log level DEBUG to have the INFO log stream clean.

Close #605

@sam701 sam701 requested a review from sanderpick April 23, 2025 08:39
Copy link
Contributor

@sanderpick sanderpick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Comment on lines +176 to +178
if tracing::enabled!(Level::DEBUG) {
nc.enable_actor_debugging();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect

@sam701 sam701 merged commit b2d5c4c into main Apr 24, 2025
13 checks passed
@sam701 sam701 deleted the alexei/tune-observability-log-level branch April 24, 2025 07:29
maksmore pushed a commit that referenced this pull request Jun 18, 2025
This took me forever to diagnose. After testing many different
configurations (diff fendermint configs, diff commit refs, old localnet
vs. new localnet, node restart sequences), I saw that changing the
fendermint log level on a live network lead to a consensus failure...
which is, super weird and didn't make any sense. I finally realized this
is related to actor debugging.

The actor debug flag cannot change on a live network due to how
debugging works with the FVM / Wasmtime. Since WASM modules can't
actually log to stdout/err, they are returned via syscall, which effects
the respond data for a function call. The upshot is that changing
to/from actor debugging on a live network leads to non-determinism and
consensus failures. This can now happen when the fendermint log level
goes from `info` to `debug` or `debug` to `info`, due to this change
which ties the actor debugging flag to the fendermint log level
introduced in #610. I hysterically
commented "perfect" on this change :)

Initial consensus failure:
```
E[2025-06-13|05:39:34.296] CONSENSUS FAILURE!!!                         module=consensus err="+2/3 committed an invalid block: wrong Block.Header.LastResultsHash.  Expected 165B026A01F89BE3D648285701BA0A5E679050E0C1ADE78B4F0D124897F56CB4, got D6372018325A08E8AF1D65C8C63283783DADC5EA5E90B8C141F4B10868729B32" stack="goroutine 2626 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft/consensus/state.go:737 +0x46\npanic({0xf78a00?, 0xc001b0cca0?})\n\truntime/panic.go:770 +0x132\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1651 +0xfb3\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:1620 +0x2e8\ngithub.com/cometbft/cometbft/consensus.(*State).handleCompleteProposal(0xc000325888, 0x333)\n\tgithub.com/cometbft/cometbft/consensus/state.go:2016 +0x372\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc000325888, {{0x131e3e0, 0xc0027a4ff0}, {0xc00035a810, 0x28}})\n\tgithub.com/cometbft/cometbft/consensus/state.go:847 +0x178\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000325888, 0x0)\n\tgithub.com/cometbft/cometbft/consensus/state.go:773 +0x3f1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 2568\n\tgithub.com/cometbft/cometbft/consensus/state.go:384 +0x10c\n"
```

Later you will see:
```
E[2025-06-13|15:39:23.489] Error in validation                          module=blockchain err="wrong Block.Header.LastResultsHash.  Expected CBE836990A8F3B39C27F40D7A7D39240064170C99BA8E7315B6A9E802AD10B3E, got F56F3216717F0641D8B8720918A4B425448E2B3F25BE20F43D2E544DF526DAE7"
```

Ideally, we could somehow alter this flag without rebuilding fendermint,
but we need to think about how to do that in a way that's non
"configuration".

Signed-off-by: Sander Pick <sanderpick@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce log level of observability logs

2 participants