Accommodate syslog date formats #12
Description
Linux still follows the "syslog" date format from RFC3164. RFC5424 intended to obsolete that and implement a new timestamp format, but that transition has not occurred simply due to industry momentum (we're too lazy to change ;) ).
Issue #1 requests support for ISO 8601 format. That's valid. But again, see section 6.2.3 of RFC5424:
The TIMESTAMP field is a formalized timestamp derived from RFC3339. Whereas RFC3339 makes allowances for multiple syntaxes, this document imposes further restrictions.
And in the RFC3339 Abstract:
This document defines a date and time format for use in Internet protocols that is a profile of the ISO 8601 standard...
Rephrasing ... we use RFC3164, which is obsoleted by RFC5424, which uses a modified version of ISO 8601. So all three formats are valid. But since common systems don't use RFC5424 yet, we only really need RFC3164. ISO 8601 is valid for other applications. RFC5424 may be valid for rare environments, and should be implemented for forward-thinking applications.
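To make the difference between the formats concrete, here is a minimal Python sketch that parses one timestamp of each kind with the standard library. The sample strings are the style of example given in the RFCs, and the variable names are only for illustration. It assumes the default C/English locale for `%b`, and it is not taken from logmerge's code.

```python
from datetime import datetime

# RFC 3164 style: abbreviated month, day, time. No year and no time zone.
rfc3164 = datetime.strptime("Oct 11 22:14:15", "%b %d %H:%M:%S")

# RFC 5424 style: an RFC 3339 timestamp, which is a profile of ISO 8601.
# It carries the year, optional fractional seconds, and a UTC offset.
rfc5424 = datetime.fromisoformat("2003-08-24T05:14:15.000003-07:00")

# strptime() fills in 1900 when the format has no year directive, so a
# merge tool has to supply the real year from somewhere else.
print(rfc3164.year)         # 1900
print(rfc5424.utcoffset())  # offset parsed from the "-07:00" suffix
```

The missing year and time zone in the RFC3164 form are the main practical cost of staying on that format.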
Therefore, for this request, I believe logmerge should support RFC3164 directly in the parsing to find a valid date format.
I will post a PR on this.
Until then, the simplest command-line for Linux logfiles is:
./logmerge.py -r "^(.{15})" -f "%b %d %H:%M:%S" file1 file2 ...
or
./logmerge.py -r "^(...............)" -f "%b %d %H:%M:%S" file1 file2 ...
Those are "naive" regex patterns that just select the first 15 characters (the length of a timestamp such as "Oct 11 22:14:15") and match them to the formatting codes. That is the equivalent of saying "It doesn't matter what those first 15 characters are, I'm telling you what they mean" - there's no actual need to be more rigorous from the command-line. See below.
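As a rough illustration of what the naive approach amounts to, the sketch below captures a fixed-width prefix and hands it to `strptime()`. An RFC3164 timestamp such as `Oct 11 22:14:15` is 15 characters long, so the prefix width is 15. The names `NAIVE`, `FMT` and `naive_timestamp` are hypothetical, and how logmerge itself applies `-r` and `-f` is not shown here; this is a standalone approximation.

```python
import re
from datetime import datetime

NAIVE = re.compile(r"^(.{15})")  # any 15 leading characters, no validation
FMT = "%b %d %H:%M:%S"

def naive_timestamp(line):
    """Interpret the first 15 characters of a line as a syslog timestamp."""
    m = NAIVE.match(line)
    if m is None:
        return None  # line is shorter than 15 characters
    # Raises ValueError if those characters do not fit FMT.
    return datetime.strptime(m.group(1), FMT)

ts = naive_timestamp("Oct 11 22:14:15 myhost su: 'su root' failed")
print(ts.month, ts.day, ts.hour, ts.minute, ts.second)
```

This is fine when the caller already knows every line starts with a timestamp, which is the situation on the command line. A line that starts with anything else makes `strptime()` raise `ValueError`.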
For the code, I'm supporting this pattern:
./logmerge.py -r "(.{3} \d\d \d\d:\d\d:\d\d)" -f "%b %d %H:%M:%S" file1 file2 ...
The more specific regex can avoid errors, though I don't think any log files are going to deviate from that layout. The "naive" regex should not be used in the code because it matches any run of leading characters, and strptime() will then abort whenever the matched text is not actually in that format. The initial three characters are the localized month. I'm using three "any characters" because I'm not positive about how different localizations will translate to three "alpha characters". The RFC specifies a three-character month identifier. The %b directive will not work reliably in a fringe environment where the system parsing the log is in a different locale from the system that wrote the log.
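A minimal sketch of how the stricter pattern could be used in code follows. It makes one assumed change to the pattern above: RFC3164 pads single-digit days with a space (`Oct  2`), which `\d\d` would not match, so the day is written as `[ \d]\d`. The names `STRICT`, `FMT` and `find_timestamp` are hypothetical and this is not the PR's actual implementation.

```python
import re
from datetime import datetime

# Pattern from the text, with "[ \d]\d" for the day so that space-padded
# single-digit days are accepted (an assumption added for this sketch).
STRICT = re.compile(r"(.{3} [ \d]\d \d\d:\d\d:\d\d)")
FMT = "%b %d %H:%M:%S"

def find_timestamp(line):
    """Return a datetime for the first syslog-style timestamp, or None."""
    m = STRICT.search(line)
    if m is None:
        return None  # nothing shaped like a timestamp on this line
    try:
        return datetime.strptime(m.group(1), FMT)
    except ValueError:
        # The shape matched but the text did not parse, for example a
        # month name that %b does not recognize in the current locale.
        return None

print(find_timestamp("Oct 11 22:14:15 myhost sshd[42]: session opened"))
print(find_timestamp("Oct  2 03:04:05 myhost cron: job started"))
print(find_timestamp("no timestamp here"))
```

Returning `None` instead of letting `ValueError` propagate is one possible design. It lets a caller skip or pass through lines without a usable timestamp, and it also softens the locale problem: a month abbreviation written under another locale fails quietly rather than aborting the whole merge.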