Background
The current Java-tron logging and startup observability has five concrete problems that hurt operator diagnosis:
- gRPC logs bypass Logback and go to stderr. grpc-java uses
java.util.logging (JUL) internally. The project already ships jcl-over-slf4j, but the JUL→SLF4J bridge is absent. As a result, gRPC diagnostic messages — connection resets, TLS handshake failures, etc. — never reach tron.log, forcing operators to watch stderr and log files simultaneously (see grpc-java#1577).
- DB stats flood startup at INFO. DbStat.statProperty() logs at INFO, burying critical initialization messages.
- Shutdown logs are incomplete on abnormal exits. TronLogShutdownHook waits up to 180s after SIGTERM; if the node hangs inside that window, critical shutdown-phase logs may never be flushed.
- Signature verification is invisible. ECDSA/SM2 signature recovery is a CPU hotspot. A single slow verification can stall block application or mempool admission, but today there is no per-call timing signal to locate it.
- LevelDB startup silently blocks on an oversized MANIFEST. In LevelDbDataSourceImpl.openDatabase(),
factory.open() is a JNI blocking call. When the MANIFEST file is large, the startup hangs there for minutes (or longer) with no log output. Operators cannot tell "stuck" from "recovering," and are not pointed at the Toolkit's DbArchive tool.
Core Changes
1. JUL→SLF4J bridge
At the earliest point during node startup (FullNode.main / SolidityNode.main), reset JUL's default handlers and install the bridge so gRPC's JUL output is routed through SLF4J/Logback:
// Remove default JUL handlers to avoid duplicate output
java.util.logging.LogManager.getLogManager().reset();
org.slf4j.bridge.SLF4JBridgeHandler.install();
// build.gradle
implementation 'org.slf4j:jul-to-slf4j:1.7.36'
2. Downgrade DB stats log level
Change log.info(...) in DbStat.statProperty() and similar periodic stat methods to log.debug(...), cutting startup noise without losing diagnostic value on demand.
3. Shutdown log completeness
Introduce a volatile boolean shuttingDown flag in TronLogShutdownHook, shorten unnecessary wait intervals, and make flushing critical log buffers the first-priority step in the JVM shutdown hook.
4. signature-verify INFO log
Instrument TransactionCapsule.validatePubSignature() around the validateSignature(Transaction, ...) call. When a single verification takes more than 10 ms, emit an INFO log with txId, signature count, permissionId, and elapsed ms:
long start = System.nanoTime();
try {
if (!validateSignature(this.transaction, hash, accountStore, dynamicPropertiesStore)) {
isVerified = false;
throw new ValidateSignatureException("sig error");
}
} finally {
long costMs = (System.nanoTime() - start) / 1_000_000L;
if (costMs > 10) {
logger.info("slow sig verify: txId={}, sigCount={}, permissionId={}, cost={}ms",
getTransactionId(), transaction.getSignatureCount(),
transaction.getRawData().getContract(0).getPermissionId(), costMs);
}
}
- The 10 ms threshold is a constant for now; it can be promoted to config later.
- Use
nanoTime() to avoid millisecond-granularity sampling error.
- Covered hot paths: block application, mempool admission, P2P broadcast validation.
5. LevelDB startup watchdog
In LevelDbDataSourceImpl.openDatabase(), wrap the JNI-blocking factory.open(dbPath.toFile(), dbOptions) call with a watchdog:
-
Before calling factory.open, submit a one-shot ScheduledExecutorService task. If factory.open has not returned within 60 s, emit the first WARN, then repeat every 30 s.
-
WARN payload: dbName, dbPath, elapsed seconds, current MANIFEST filename and size (resolved by reading the CURRENT file to find the active MANIFEST, then File.length()).
-
Each WARN includes a remediation hint:
Database open is taking unusually long. If MANIFEST is oversized, run:
java -jar Toolkit.jar db archive -d <database-directory>
See DbArchive for details.
-
Cancel the watchdog task as soon as factory.open returns, so no threads leak.
-
The watchdog is purely observational: it does not change factory.open semantics or the startup flow.
Configuration Changes
- Add an explicit
io.grpc logger entry in logback.xml so operators can tune gRPC log verbosity independently.
- DB stats loggers default to DEBUG; operators can promote them on demand.
- Slow-sig threshold (10 ms) and LevelDB watchdog timings (first WARN at 60 s, repeat every 30 s) ship as constants; whether to expose them as config will be decided based on feedback.
No API / Protocol Changes
No impact on external APIs or the network protocol.
Testing Strategy
- gRPC bridge. Reproduce a TLS handshake failure or connection reset; verify the JUL-originated messages appear in
tron.log and are no longer duplicated on stderr.
- DB stats downgrade. Compare INFO-level line counts during startup before/after — expect a significant reduction and clearer initialisation output.
- Shutdown completeness. SIGTERM followed by a deliberately stalled process; verify shutdown-phase logs are complete and flushed in time.
- Slow-sig logging.
- Unit test: craft a multi-sig transaction near
totalSignNum; assert an INFO log fires when the cost exceeds 10 ms and does not fire below the threshold.
- Load test: observe INFO frequency on a realistic workload; confirm it does not spam the log.
- LevelDB watchdog.
- Unit test: mock
factory.open to block >60 s; assert the WARN payload includes dbName, MANIFEST size, and the remediation hint.
- Fast-path test: on a normal fast open, assert the watchdog task is cancelled and no WARN is emitted.
Performance Considerations
SLF4JBridgeHandler adds marginal overhead to JUL forwarding, but gRPC diagnostic logs are rare in normal operation — negligible impact.
- DB stats downgrade is pure log-I/O reduction, no runtime cost.
- Slow-sig instrumentation is two
nanoTime() calls per verification — nanosecond-level overhead.
- The watchdog is a single scheduled task that exists only during startup; zero runtime cost afterwards.
Scope of Impact
Logging initialisation, DB stats log level, node shutdown lifecycle, transaction signature verification path, and LevelDB startup path.
Breaking Changes: None
Backward Compatibility: Fully compatible
Alternatives Considered: None
References
Background
The current Java-tron logging and startup observability has five concrete problems that hurt operator diagnosis:
java.util.logging(JUL) internally. The project already shipsjcl-over-slf4j, but the JUL→SLF4J bridge is absent. As a result, gRPC diagnostic messages — connection resets, TLS handshake failures, etc. — never reachtron.log, forcing operators to watch stderr and log files simultaneously (see grpc-java#1577).factory.open()is a JNI blocking call. When the MANIFEST file is large, the startup hangs there for minutes (or longer) with no log output. Operators cannot tell "stuck" from "recovering," and are not pointed at the Toolkit'sDbArchivetool.Core Changes
1. JUL→SLF4J bridge
At the earliest point during node startup (
FullNode.main/SolidityNode.main), reset JUL's default handlers and install the bridge so gRPC's JUL output is routed through SLF4J/Logback:2. Downgrade DB stats log level
Change
log.info(...)in DbStat.statProperty() and similar periodic stat methods tolog.debug(...), cutting startup noise without losing diagnostic value on demand.3. Shutdown log completeness
Introduce a
volatile boolean shuttingDownflag in TronLogShutdownHook, shorten unnecessary wait intervals, and make flushing critical log buffers the first-priority step in the JVM shutdown hook.4. signature-verify INFO log
Instrument TransactionCapsule.validatePubSignature() around the validateSignature(Transaction, ...) call. When a single verification takes more than 10 ms, emit an INFO log with txId, signature count, permissionId, and elapsed ms:
nanoTime()to avoid millisecond-granularity sampling error.5. LevelDB startup watchdog
In LevelDbDataSourceImpl.openDatabase(), wrap the JNI-blocking
factory.open(dbPath.toFile(), dbOptions)call with a watchdog:Before calling
factory.open, submit a one-shotScheduledExecutorServicetask. Iffactory.openhas not returned within 60 s, emit the first WARN, then repeat every 30 s.WARN payload: dbName, dbPath, elapsed seconds, current MANIFEST filename and size (resolved by reading the
CURRENTfile to find the active MANIFEST, thenFile.length()).Each WARN includes a remediation hint:
Cancel the watchdog task as soon as
factory.openreturns, so no threads leak.The watchdog is purely observational: it does not change
factory.opensemantics or the startup flow.Configuration Changes
io.grpclogger entry inlogback.xmlso operators can tune gRPC log verbosity independently.No API / Protocol Changes
No impact on external APIs or the network protocol.
Testing Strategy
tron.logand are no longer duplicated on stderr.totalSignNum; assert an INFO log fires when the cost exceeds 10 ms and does not fire below the threshold.factory.opento block >60 s; assert the WARN payload includes dbName, MANIFEST size, and the remediation hint.Performance Considerations
SLF4JBridgeHandleradds marginal overhead to JUL forwarding, but gRPC diagnostic logs are rare in normal operation — negligible impact.nanoTime()calls per verification — nanosecond-level overhead.Scope of Impact
Logging initialisation, DB stats log level, node shutdown lifecycle, transaction signature verification path, and LevelDB startup path.
Breaking Changes: None
Backward Compatibility: Fully compatible
Alternatives Considered: None
References