Improve logging: SLF4J bridge, less startup noise, fix shutdown log loss #6583

@halibobo1205

Description

Background

Java-tron's current logging and startup observability have five concrete problems that hurt operator diagnosis:

  1. gRPC logs bypass Logback and go to stderr. grpc-java uses java.util.logging (JUL) internally. The project already ships jcl-over-slf4j, but the JUL→SLF4J bridge is absent. As a result, gRPC diagnostic messages — connection resets, TLS handshake failures, etc. — never reach tron.log, forcing operators to watch stderr and log files simultaneously (see grpc-java#1577).
  2. DB stats flood startup at INFO. DbStat.statProperty() logs at INFO, burying critical initialization messages.
  3. Shutdown logs are incomplete on abnormal exits. TronLogShutdownHook waits up to 180s after SIGTERM; if the node hangs inside that window, critical shutdown-phase logs may never be flushed.
  4. Signature verification is invisible. ECDSA/SM2 signature recovery is a CPU hotspot. A single slow verification can stall block application or mempool admission, but today there is no per-call timing signal to locate it.
  5. LevelDB startup silently blocks on an oversized MANIFEST. In LevelDbDataSourceImpl.openDatabase(), factory.open() is a JNI blocking call. When the MANIFEST file is large, the startup hangs there for minutes (or longer) with no log output. Operators cannot tell "stuck" from "recovering," and are not pointed at the Toolkit's DbArchive tool.

Core Changes

1. JUL→SLF4J bridge

At the earliest point during node startup (FullNode.main / SolidityNode.main), reset JUL's default handlers and install the bridge so gRPC's JUL output is routed through SLF4J/Logback:

// build.gradle — add the bridge artifact
implementation 'org.slf4j:jul-to-slf4j:1.7.36'

// Remove default JUL handlers to avoid duplicate output, then install the bridge
java.util.logging.LogManager.getLogManager().reset();
org.slf4j.bridge.SLF4JBridgeHandler.install();

2. Downgrade DB stats log level

Change log.info(...) in DbStat.statProperty() and similar periodic stat methods to log.debug(...), cutting startup noise without losing diagnostic value on demand.

3. Shutdown log completeness

Introduce a volatile boolean shuttingDown flag in TronLogShutdownHook, shorten unnecessary wait intervals, and make flushing critical log buffers the first-priority step in the JVM shutdown hook.
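A minimal sketch of the intended ordering (class and method names are illustrative, not the actual TronLogShutdownHook; the pending-work check is a placeholder): flush first, then wait in short polls instead of one long sleep, with the volatile flag visible to in-flight threads.

```java
import java.util.concurrent.TimeUnit;

public class LogShutdownHookSketch {
  // visible to all threads so in-flight work can observe the shutdown
  private static volatile boolean shuttingDown = false;

  public static boolean isShuttingDown() {
    return shuttingDown;
  }

  // Hook body, extracted as a method so it can be exercised directly.
  public static void shutdown(Runnable flushLogs, long maxWaitMs) {
    shuttingDown = true;
    flushLogs.run();              // first-priority step: flush critical log buffers
    long deadline = System.currentTimeMillis() + maxWaitMs;
    while (System.currentTimeMillis() < deadline && hasPendingWork()) {
      try {
        TimeUnit.MILLISECONDS.sleep(100);   // short polls replace one long wait
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }

  public static void install(Runnable flushLogs, long maxWaitMs) {
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> shutdown(flushLogs, maxWaitMs), "log-shutdown-hook"));
  }

  private static boolean hasPendingWork() {
    return false;   // placeholder for the node's "still draining" check
  }
}
```

Because the flush runs before any waiting, even a node that hangs inside the wait window has already persisted its shutdown-phase logs.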

4. signature-verify INFO log

Instrument TransactionCapsule.validatePubSignature() around the validateSignature(Transaction, ...) call. When a single verification takes more than 10 ms, emit an INFO log with txId, signature count, permissionId, and elapsed ms:

long start = System.nanoTime();
try {
  if (!validateSignature(this.transaction, hash, accountStore, dynamicPropertiesStore)) {
    isVerified = false;
    throw new ValidateSignatureException("sig error");
  }
} finally {
  long costMs = (System.nanoTime() - start) / 1_000_000L;
  if (costMs > 10) {
    logger.info("slow sig verify: txId={}, sigCount={}, permissionId={}, cost={}ms",
        getTransactionId(), transaction.getSignatureCount(),
        transaction.getRawData().getContract(0).getPermissionId(), costMs);
  }
}
  • The 10 ms threshold is a constant for now; it can be promoted to config later.
  • Use nanoTime() to avoid millisecond-granularity sampling error.
  • Covered hot paths: block application, mempool admission, P2P broadcast validation.

5. LevelDB startup watchdog

In LevelDbDataSourceImpl.openDatabase(), wrap the JNI-blocking factory.open(dbPath.toFile(), dbOptions) call with a watchdog:

  • Before calling factory.open, schedule a watchdog task on a ScheduledExecutorService. If factory.open has not returned within 60 s, emit the first WARN, then repeat every 30 s.

  • WARN payload: dbName, dbPath, elapsed seconds, current MANIFEST filename and size (resolved by reading the CURRENT file to find the active MANIFEST, then File.length()).

  • Each WARN includes a remediation hint:

    Database open is taking unusually long. If MANIFEST is oversized, run:
    java -jar Toolkit.jar db archive -d <database-directory>
    See DbArchive for details.

  • Cancel the watchdog task as soon as factory.open returns, so no threads leak.

  • The watchdog is purely observational: it does not change factory.open semantics or the startup flow.


Configuration Changes

  • Add an explicit io.grpc logger entry in logback.xml so operators can tune gRPC log verbosity independently.
  • DB stats loggers default to DEBUG; operators can promote them on demand.
  • Slow-sig threshold (10 ms) and LevelDB watchdog timings (first WARN at 60 s, repeat every 30 s) ship as constants; whether to expose them as config will be decided based on feedback.
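The first two bullets might look like the following logback.xml fragment (the io.grpc logger name comes from the bridge routing above; the DB stats logger name is illustrative — use the actual FQN of DbStat):

```xml
<!-- Tune gRPC verbosity independently of the rest of the node -->
<logger name="io.grpc" level="INFO"/>
<!-- DB stats now log at DEBUG; set this to DEBUG to surface them on demand
     (logger name illustrative) -->
<logger name="org.tron.core.db.common.DbStat" level="INFO"/>
```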

No API / Protocol Changes

No impact on external APIs or the network protocol.


Testing Strategy

  1. gRPC bridge. Reproduce a TLS handshake failure or connection reset; verify the JUL-originated messages appear in tron.log and are no longer duplicated on stderr.
  2. DB stats downgrade. Compare INFO-level line counts during startup before/after — expect a significant reduction and clearer initialization output.
  3. Shutdown completeness. SIGTERM followed by a deliberately stalled process; verify shutdown-phase logs are complete and flushed in time.
  4. Slow-sig logging.
    • Unit test: craft a multi-sig transaction near totalSignNum; assert an INFO log fires when the cost exceeds 10 ms and does not fire below the threshold.
    • Load test: observe INFO frequency on a realistic workload; confirm it does not spam the log.
  5. LevelDB watchdog.
    • Unit test: mock factory.open to block >60 s; assert the WARN payload includes dbName, MANIFEST size, and the remediation hint.
    • Fast-path test: on a normal fast open, assert the watchdog task is cancelled and no WARN is emitted.

Performance Considerations

  • SLF4JBridgeHandler adds marginal overhead to JUL forwarding, but gRPC diagnostic logs are rare in normal operation — negligible impact.
  • DB stats downgrade is pure log-I/O reduction, no runtime cost.
  • Slow-sig instrumentation is two nanoTime() calls per verification — nanosecond-level overhead.
  • The watchdog is a single scheduled task that exists only during startup; zero runtime cost afterwards.

Scope of Impact

Logging initialization, DB stats log level, node shutdown lifecycle, transaction signature verification path, and LevelDB startup path.

Breaking Changes: None
Backward Compatibility: Fully compatible
Alternatives Considered: None