in_tail: only rely on fstat() to detect file rotation#10280
david-garcia-garcia wants to merge 16 commits into fluent:master
Thanks for contributing this PR. This is a very sensitive change (this is actually one of the parts of the plugin I avoid touching :D). I'm wondering how we can extend testing to avoid regressions; I remember there are a couple of corner cases. Adding @leonardo-albertovich as extra eyes for this one.
    int64_t size_delta = st.st_size - file->size;
    if (size_delta != 0) {
        file->size = st.st_size;
This is the only place in this PR where the change is not restricted to the method scope and could have an impact outside the method. This file->size assignment was not here before (and it could be removed, as we only need size_delta to detect truncation). I introduced it only for consistency with the other implementations.
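To make the role of that assignment concrete, here is a minimal sketch, assuming the stored size is only ever compared against the latest fstat() result. The helper names `size_delta_since_last_check` and `demo_deltas` are hypothetical and are not part of in_tail; the sketch only shows that updating the stored size makes each size change get reported once rather than on every check.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not in_tail code): report how much the file size
 * changed since the previous check, and remember the new size. */
static int size_delta_since_last_check(int fd, int64_t *last_size,
                                       int64_t *delta)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        return -1;
    }

    *delta = (int64_t) st.st_size - *last_size;
    if (*delta != 0) {
        /* Without this update the same delta would show up on every check. */
        *last_size = (int64_t) st.st_size;
    }
    return 0;
}

/* Hypothetical demo on a local temporary file: grow it, check twice,
 * truncate it, check twice. Stores the four observed deltas in out[]. */
static int demo_deltas(int64_t out[4])
{
    char buf[100] = {0};
    int64_t last_size = 0;
    int ret = -1;
    int fd;
    FILE *fp = tmpfile();

    if (fp == NULL) {
        return -1;
    }
    fd = fileno(fp);

    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) goto done;
    if (size_delta_since_last_check(fd, &last_size, &out[0]) != 0) goto done;
    if (size_delta_since_last_check(fd, &last_size, &out[1]) != 0) goto done;
    if (ftruncate(fd, 10) != 0) goto done;
    if (size_delta_since_last_check(fd, &last_size, &out[2]) != 0) goto done;
    if (size_delta_since_last_check(fd, &last_size, &out[3]) != 0) goto done;
    ret = 0;

done:
    fclose(fp);
    return ret;
}
```

On a local filesystem, the first check sees the file grow, the second sees no change, the third sees a negative delta after the truncation, and the fourth sees no change again. This does not reproduce the NFS metadata cache; it only illustrates the bookkeeping.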
I had the feeling file_tail might be one of those things that was implemented early on and that everyone uses: the kind of thing you don't want to change if it's not broken. Reading logs from NFS might not be the most common use case, but as people move workloads to the cloud, shared remote storage will become more common. Plus, this bug only surfaces with some particular NFS configurations where file metadata is cached. This PR is actually the second approach to solving the issue (I had a fix deployed that worked, but it was too complex and overthought).
The current change proposal is very limited in scope and its impact can be easily grasped by reading the code changes. The only possible side effect is that, on NFS with a metadata cache, detecting truncation might be delayed until the metadata cache is updated. But this is in any case much better than having false truncation/rotation detections that lead to repeated log ingestion. I've been running a fork with this fix in production for some days now without issues.
As for what this PR tries to solve, I don't think there is a feasible way of testing it, as it relies on the stream offset and fstat() providing non-synchronized information, which is something you cannot easily produce artificially.
We will assign some time to review this early next week; moving it to the next milestone. Note that the DCO (sign-off) must be fixed.
Signed-off-by: deivid.garcia.garcia@gmail.com <deivid.garcia.garcia@gmail.com>
Signed-off-by: Thomas Devoogdt <thomas@devoogdt.com> Signed-off-by: deivid.garcia.garcia@gmail.com <deivid.garcia.garcia@gmail.com>
Not needed when using system libs. Signed-off-by: Thomas Devoogdt <thomas@devoogdt.com> Signed-off-by: deivid.garcia.garcia@gmail.com <deivid.garcia.garcia@gmail.com>
- added an install system libraries step
  To make it clear that this is part of the test.
- assert the linked system libs
  Check if the system lib is effectively linked as expected.
Signed-off-by: Thomas Devoogdt <thomas@devoogdt.com>
Signed-off-by: deivid.garcia.garcia@gmail.com <deivid.garcia.garcia@gmail.com>
00ca436 to 78090a5
# Conflicts:
#   .github/workflows/pr-compile-check.yaml
#   CMakeLists.txt
When reading logs from NFS, the results returned by fstat() might be outdated. This leads to looped ingestion: the plugin compares the current stream offset with the file size reported in the (stale) metadata, so it detects truncation over and over again even though the file has not been truncated.
This PR solves this by relying exclusively on the information provided by fstat() to decide whether or not to rotate. Before, the plugin compared the stream offset with the file size reported by fstat(); the offset could be larger than that size, because the fstat() result comes from the cache and takes some time to update.
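The difference between the two rules can be sketched with synthetic values. This is a simplified illustration under stated assumptions, not the plugin's code: `struct tail_state`, `truncated_by_offset` and `truncated_by_size` are hypothetical names, and the second function assumes that a negative difference between consecutive fstat() sizes is what signals truncation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a tailed file (not the plugin's struct). */
struct tail_state {
    int64_t offset; /* bytes already consumed from the stream */
    int64_t size;   /* size reported by the previous fstat() call */
};

/* Previous rule as described above: the stream offset is compared with the
 * size reported by fstat(). With stale metadata the offset can be ahead of
 * the reported size, which looks like a truncation. */
static bool truncated_by_offset(const struct tail_state *f, int64_t st_size)
{
    return f->offset > st_size;
}

/* Rule sketched from this PR: only consecutive fstat() sizes are compared.
 * Assumption: a shrinking size means the file was truncated. */
static bool truncated_by_size(struct tail_state *f, int64_t st_size)
{
    int64_t size_delta = st_size - f->size;

    if (size_delta != 0) {
        f->size = st_size;
    }
    return size_delta < 0;
}
```

For example, with 150 bytes already read while the cached metadata still reports 100 bytes, the offset-based rule flags a truncation even though the file only grew, whereas the size-based rule sees no change. Once the cache catches up to 150 the size-based rule still reports nothing, and it flags truncation only when the reported size later shrinks.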
Fixes #10276
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.