Skip to content

tail: fix race condition#3798

Merged
sylvestre merged 9 commits intouutils:mainfrom
jhscheer:tail_fix_3765
Aug 13, 2022
Merged

tail: fix race condition#3798
sylvestre merged 9 commits intouutils:mainfrom
jhscheer:tail_fix_3765

Conversation

@jhscheer
Copy link
Copy Markdown
Contributor

@jhscheer jhscheer commented Aug 8, 2022

fixes #3765

There exists a race condition (RC) that can occur if changes to a path
happen after the initial print loop in uu_tail(), but before the
path is added to the notify-Watcher thread in follow().

@niyaznigmatullin found this RC and identified it as the reason why "gnu/tests/tail-2/F-headers.sh" failed sometimes in the CI and he also came up with a fix for it.

The proposed fix manually handles possible events that could have been missed before.
However, this does not reduce the possibility for the RC to occur, and it uses a lot of duplicate code instead of utilizing the functionality of the Watcher thread together with the handle_event() function.

To minimize the window where the RC can occur, this PR moves starting the
Watcher thread and adding paths to it from follow() to the initial
print loop in uu_tail().

Additionally, to make sure the RC cannot happen in
"gnu/tests/tail-2/F-headers.sh", the error message that is used as a trigger
in this test, is delayed until the path is added to the Watcher thread.

The workarounds for "tests/tail-2/F-headers.sh" in build-gnu.sh should no longer be needed.

There exists a race condition (RC) that can occur if changes to a path
happen after the initial print loop in `uu_tail()`, but before the
path is added to the notify-Watcher thread in `follow()`.

To minimize the window where the RC can occur, this moves starting the
Watcher thread and adding paths to it from `follow()` to the initial
print loop in `uu_tail()`.

Additionally, to make sure the RC cannot happen in
"gnu/tests/tail-2/F-headers.sh", the error message that is used as a trigger
in this test, is delayed until the path is added to the Watcher thread.
Remove workarounds for "tests/tail-2/F-headers.sh" which are
(presumably) no longer needed because of the race condition fix.
Remove workarounds for "tests/tail-2/F-headers.sh" which are
(presumably) no longer needed because of the race condition fix.
@sylvestre
Copy link
Copy Markdown
Contributor

@niyaznigmatullin how do you feel about this? thanks

Comment thread src/uu/tail/src/tail.rs Outdated
@niyaznigmatullin
Copy link
Copy Markdown
Contributor

I like the change, but can we move the watcher addition before tail actually writes anything?

@niyaznigmatullin
Copy link
Copy Markdown
Contributor

niyaznigmatullin commented Aug 9, 2022

I think I found another race condition on detecting the orphans. Because it's detected after follow is run, if a file didn't exist and it's parent directory didn't exist, we print that we cannot access it. And then if it's created, we don't watch it, and it's not detecting that it's an orphan, because the directory could have been created by that time.

As I understand, the idea was, that we need to make sure for each file either it's being watched, or it's parent is watched or it's added to orphans. So the last is not true sometimes.

@jhscheer
Copy link
Copy Markdown
Contributor Author

jhscheer commented Aug 9, 2022

@niyaznigmatullin Thanks for the feedback. I think both of your points are valid. I'll implement them soon.

Move "adding paths to Watcher thread" to its own loop and run this loop
before the initial tail-print-loop in order to minimize the window for
race conditions.
@jhscheer
Copy link
Copy Markdown
Contributor Author

This should be good to go.

@niyaznigmatullin
Copy link
Copy Markdown
Contributor

I tested all tests with timeouts, everything passed but descriptor-vs-rename.sh. I've found the reason, it moves the file:
mv a b and then immediately writes echo y >> b, but when we handle event of renaming, we seek to the end of the new file, skipping the last written part, which never occurs in tail output.

I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: niyaznigmatullin@fd145ee

@jhscheer
Copy link
Copy Markdown
Contributor Author

I tested all tests with timeouts, everything passed

Nice, thanks for testing.

I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: niyaznigmatullin@fd145ee

That looks like a good idea, please open a follow-up PR.
(I would keep the ? for the File::open() call and remove the None case. Otherwise if there is an error, we will never know about it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GNU tests/tail-2/F-headers is failing often in the GNU CI

3 participants