Description
RollingFile rotates files incorrectly if an event is logged shortly (in our case up to 10ms) after the rollover time and the application is restarted within the same rollover period on ext4 (Linux).
Configuration
Version: 2.24.1
Operating system: Debian 11 / Linux 5.10.0-32-amd64
JDK: 21.0.4 Corretto
<RollingFile name="app" fileName="app.log" filePattern="app-%d{yyyy-MM-dd}.log">
<EcsLayout serviceName="foobar"/><!-- layout doesn't matter -->
<Policies>
<TimeBasedTriggeringPolicy modulate="true" interval="1"/>
</Policies>
</RollingFile>
Reproduction
We've been into a deep rabbit hole to analyze these conditions, but here's what happens and how to reproduce:
- Let's say we use daily rotation and an event gets logged at
2024-10-07 00:00:000.000000000 (you can also use minute based rotations for easier reproduction)
TimeBasedTriggeringPolicy#isTriggeringEvent will return true, which results in a rotation
- The current
app.log file will rotate to app-2024-10-07.log
RollingFileManager#createFileAfterRollover will create a new app.log (in FileManager#createOutputStream)
- The new file apparently gets a
creationTime set, but that call is ignored on ext3 / ext4 filesystems.
- Instead, the created file will get its creation timestamp assigned by the OS, which happens to use
current_fs_time() on ext4 – current_fs_time() returns a cached timestamp that's only updated on kernel clock ticks (see getconf CLK_TCK)
- The default frequency for kernel clock ticks is 100 / second, so the cached timestamp from
current_fs_time() might lag behind up to 10 milliseconds
- We end up with a file creation time of
2024-10-06 23:59:59.997423036 (use stat app.log to confirm)
- After the night, we restart the application (e.g. due to a deployment)
TimeBasedTriggeringPolicy#initialize reads RollingFileManager#getFileTime (initalized by RollingFileManager#initialFileTime), which returns the above borked timestamp.
- It will also calculate the next rollover to be
2024-10-07 00:00:000.000000000, which in turn will trigger a new rollover right on application start.
- This "unplanned" rollover will move the current
app.log to app-2024-10-06.log, therefore leading to:
- Loss of the original
app-2024-10-06.log file contents
- Log events of
2024-10-07 00:00:00 until the restart being unexpectedly in app-2024-10-06.log instead of app.log (if still on the same day) or app-2024-10-07.log (later)
- Unexpected file rotation, causing
tail -f app.log to hang (our original symptom) – tail will (by default) follow the file descriptor on the rename to app-2024-10-06.log and new events will be written to app.log instead.
Description
RollingFilerotates files incorrectly if an event is logged shortly (in our case up to 10ms) after the rollover time and the application is restarted within the same rollover period on ext4 (Linux).Configuration
Version: 2.24.1
Operating system: Debian 11 / Linux 5.10.0-32-amd64
JDK: 21.0.4 Corretto
Reproduction
We've been into a deep rabbit hole to analyze these conditions, but here's what happens and how to reproduce:
2024-10-07 00:00:000.000000000(you can also use minute based rotations for easier reproduction)TimeBasedTriggeringPolicy#isTriggeringEventwill returntrue, which results in a rotationapp.logfile will rotate toapp-2024-10-07.logRollingFileManager#createFileAfterRolloverwill create a newapp.log(inFileManager#createOutputStream)creationTimeset, but that call is ignored onext3/ext4filesystems.current_fs_time()on ext4 –current_fs_time()returns a cached timestamp that's only updated on kernel clock ticks (seegetconf CLK_TCK)current_fs_time()might lag behind up to 10 milliseconds2024-10-06 23:59:59.997423036(usestat app.logto confirm)TimeBasedTriggeringPolicy#initializereadsRollingFileManager#getFileTime(initalized byRollingFileManager#initialFileTime), which returns the above borked timestamp.2024-10-07 00:00:000.000000000, which in turn will trigger a new rollover right on application start.app.logtoapp-2024-10-06.log, therefore leading to:app-2024-10-06.logfile contents2024-10-07 00:00:00until the restart being unexpectedly inapp-2024-10-06.loginstead ofapp.log(if still on the same day) orapp-2024-10-07.log(later)tail -f app.logto hang (our original symptom) –tailwill (by default) follow the file descriptor on the rename toapp-2024-10-06.logand new events will be written toapp.loginstead.