Improve logging: SLF4J bridge, less startup noise, fix shutdown log loss #6583

@halibobo1205

Description

Background

Java-tron's current logging and startup observability have five concrete problems that hurt operator diagnosis:

  1. gRPC logs bypass Logback and go to stderr. grpc-java uses java.util.logging (JUL) internally. The project already ships jcl-over-slf4j, but the JUL→SLF4J bridge is absent. As a result, gRPC diagnostic messages — connection resets, TLS handshake failures, etc. — never reach tron.log, forcing operators to watch stderr and log files simultaneously (see grpc-java#1577).
  2. DB stats flood startup at INFO. DbStat.statProperty() logs at INFO, burying critical initialization messages.
  3. Shutdown logs are incomplete on abnormal exits. TronLogShutdownHook waits up to 180s after SIGTERM; if the node hangs inside that window, critical shutdown-phase logs may never be flushed.
  4. Signature verification is invisible. ECDSA/SM2 signature recovery is a CPU hotspot. A single slow verification can stall block application or mempool admission, but today there is no per-call timing signal to locate it.
  5. LevelDB startup silently blocks on an oversized MANIFEST. In LevelDbDataSourceImpl.openDatabase(), factory.open() is a JNI blocking call. When the MANIFEST file is large, the startup hangs there for minutes (or longer) with no log output. Operators cannot tell "stuck" from "recovering," and are not pointed at the Toolkit's DbArchive tool.

Core Changes

1. JUL→SLF4J bridge

At the earliest point during node startup (FullNode.main / SolidityNode.main), reset JUL's default handlers and install the bridge so gRPC's JUL output is routed through SLF4J/Logback:

// build.gradle — add the bridge artifact
implementation 'org.slf4j:jul-to-slf4j:1.7.36'

// Remove default JUL handlers to avoid duplicate output, then install the bridge
java.util.logging.LogManager.getLogManager().reset();
org.slf4j.bridge.SLF4JBridgeHandler.install();

2. Downgrade DB stats log level

Change log.info(...) in DbStat.statProperty() and similar periodic stat methods to log.debug(...), cutting startup noise without losing diagnostic value on demand.

3. Shutdown log completeness

Introduce a volatile boolean shuttingDown flag in TronLogShutdownHook, shorten unnecessary wait intervals, and make flushing critical log buffers the first-priority step in the JVM shutdown hook.
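A minimal sketch of the intended ordering (class and method names are illustrative, not the actual TronLogShutdownHook; the pending-work check is a placeholder): flush first, then wait in short polls instead of one long sleep, with the volatile flag visible to in-flight threads.

```java
import java.util.concurrent.TimeUnit;

public class LogShutdownHookSketch {
  // visible to all threads so in-flight work can observe the shutdown
  private static volatile boolean shuttingDown = false;

  public static boolean isShuttingDown() {
    return shuttingDown;
  }

  // Hook body, extracted as a method so it can be exercised directly.
  public static void shutdown(Runnable flushLogs, long maxWaitMs) {
    shuttingDown = true;
    flushLogs.run();              // first-priority step: flush critical log buffers
    long deadline = System.currentTimeMillis() + maxWaitMs;
    while (System.currentTimeMillis() < deadline && hasPendingWork()) {
      try {
        TimeUnit.MILLISECONDS.sleep(100);   // short polls replace one long wait
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }

  public static void install(Runnable flushLogs, long maxWaitMs) {
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> shutdown(flushLogs, maxWaitMs), "log-shutdown-hook"));
  }

  private static boolean hasPendingWork() {
    return false;   // placeholder for the node's "still draining" check
  }
}
```

Because the flush runs before any waiting, even a node that hangs inside the wait window has already persisted its shutdown-phase logs.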

4. signature-verify INFO log

Instrument TransactionCapsule.validatePubSignature() around the validateSignature(Transaction, ...) call. When a single verification takes more than 10 ms, emit an INFO log with txId, signature count, permissionId, and elapsed ms:

long start = System.nanoTime();
try {
  if (!validateSignature(this.transaction, hash, accountStore, dynamicPropertiesStore)) {
    isVerified = false;
    throw new ValidateSignatureException("sig error");
  }
} finally {
  long costMs = (System.nanoTime() - start) / 1_000_000L;
  if (costMs > 10) {
    logger.info("slow sig verify: txId={}, sigCount={}, permissionId={}, cost={}ms",
        getTransactionId(), transaction.getSignatureCount(),
        transaction.getRawData().getContract(0).getPermissionId(), costMs);
  }
}
  • The 10 ms threshold is a constant for now; it can be promoted to config later.
  • Use nanoTime() to avoid millisecond-granularity sampling error.
  • Covered hot paths: block application, mempool admission, P2P broadcast validation.

5. LevelDB startup watchdog

In LevelDbDataSourceImpl.openDatabase(), wrap the JNI-blocking factory.open(dbPath.toFile(), dbOptions) call with a watchdog:

  • Before calling factory.open, schedule a watchdog task on a ScheduledExecutorService. If factory.open has not returned within 60 s, emit the first WARN, then repeat every 30 s.

  • WARN payload: dbName, dbPath, elapsed seconds, current MANIFEST filename and size (resolved by reading the CURRENT file to find the active MANIFEST, then File.length()).

  • Each WARN includes a remediation hint:

    Database open is taking unusually long. If MANIFEST is oversized, run:
    java -jar Toolkit.jar db archive -d <database-directory>
    See DbArchive for details.

  • Cancel the watchdog task as soon as factory.open returns, so no threads leak.

  • The watchdog is purely observational: it does not change factory.open semantics or the startup flow.


Configuration Changes

  • Add an explicit io.grpc logger entry in logback.xml so operators can tune gRPC log verbosity independently.
  • DB stats loggers default to DEBUG; operators can promote them on demand.
  • Slow-sig threshold (10 ms) and LevelDB watchdog timings (first WARN at 60 s, repeat every 30 s) ship as constants; whether to expose them as config will be decided based on feedback.
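The first two bullets might look like the following logback.xml fragment (the io.grpc logger name comes from the bridge routing above; the DB stats logger name is illustrative — use the actual FQN of DbStat):

```xml
<!-- Tune gRPC verbosity independently of the rest of the node -->
<logger name="io.grpc" level="INFO"/>
<!-- DB stats now log at DEBUG; set this to DEBUG to surface them on demand
     (logger name illustrative) -->
<logger name="org.tron.core.db.common.DbStat" level="INFO"/>
```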

No API / Protocol Changes

No impact on external APIs or the network protocol.


Testing Strategy

  1. gRPC bridge. Reproduce a TLS handshake failure or connection reset; verify the JUL-originated messages appear in tron.log and are no longer duplicated on stderr.
  2. DB stats downgrade. Compare INFO-level line counts during startup before/after — expect a significant reduction and clearer initialization output.
  3. Shutdown completeness. SIGTERM followed by a deliberately stalled process; verify shutdown-phase logs are complete and flushed in time.
  4. Slow-sig logging.
    • Unit test: craft a multi-sig transaction near totalSignNum; assert an INFO log fires when the cost exceeds 10 ms and does not fire below the threshold.
    • Load test: observe INFO frequency on a realistic workload; confirm it does not spam the log.
  5. LevelDB watchdog.
    • Unit test: mock factory.open to block >60 s; assert the WARN payload includes dbName, MANIFEST size, and the remediation hint.
    • Fast-path test: on a normal fast open, assert the watchdog task is cancelled and no WARN is emitted.

Performance Considerations

  • SLF4JBridgeHandler adds marginal overhead to JUL forwarding, but gRPC diagnostic logs are rare in normal operation — negligible impact.
  • DB stats downgrade is pure log-I/O reduction, no runtime cost.
  • Slow-sig instrumentation is two nanoTime() calls per verification — nanosecond-level overhead.
  • The watchdog is a single scheduled task that exists only during startup; zero runtime cost afterwards.

Scope of Impact

Logging initialization, DB stats log level, node shutdown lifecycle, transaction signature verification path, and LevelDB startup path.

Breaking Changes: None
Backward Compatibility: Fully compatible
Alternatives Considered: None