Races between readdir/getdents64 and name access. #908

@verygreen

Currently borg reads the directory contents, then iterates over the names, calls lstat on each one, and calls a processing function based on the file type.
This is racy: between the lstat and the open (or whatever else happens) inside the Archive.process_* method, somebody might come in and replace the underlying file (atomically via rename, or via delete plus a new create).

If the scandir code in #905 is extended to its logical conclusion and skips lstat altogether, using the file type from the directory entry to decide which process method to call, the race window grows even larger, since significant time might pass between reading the directory and actually getting to some of the files later in the listing.
Additional races are possible even inside the process methods; some of them are documented in #906.

I suspect the correct way to deal with this would be to pass the inode number into the process_* method.
The very first thing such a method would then do is open the name presented (with the possible exception of process_symlink, since symlinks cannot be opened).
The code needs to be prepared to get ENOENT on open/readlink. If this happens, it should be handled as if the name was never in the readdir list (since we lost the race to access it anyway).
Once the open is done, fstat on the file descriptor will tell us whether the inode number has changed, and along with it the file type, in which case we might need to call a different processing function (if we can pass the open fd there, we avoid any additional racing entirely).
The rest of the processing function needs to keep using fd-based accesses, since on a live filesystem a file name might start pointing to a different inode at any moment, and assuming it always refers to the same inode is unsafe.
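
A minimal sketch of the open-first, fstat-second idea described above. The helper name `open_and_verify` and its return shape are made up for illustration and are not part of borg; it assumes a POSIX platform and does not handle symlinks (with O_NOFOLLOW, opening one fails with an OSError that the caller would have to route to a readlink-based path).

```python
import os


def open_and_verify(path, expected_ino):
    """Open `path`, then fstat the descriptor instead of trusting the name.

    Hypothetical helper, not borg API. Returns (fd, st, changed), where
    `changed` is True if the inode differs from the one seen during the
    directory scan. Returns None if the name vanished in the meantime.
    """
    # O_NOFOLLOW: do not silently follow a symlink that replaced the file.
    # O_NONBLOCK: do not hang if the name now refers to a FIFO.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0) | getattr(os, "O_NONBLOCK", 0)
    try:
        fd = os.open(path, flags)
    except FileNotFoundError:
        # ENOENT: we lost the race, treat the name as if it was never listed.
        return None
    # From here on the fd pins one inode, whatever happens to the name.
    st = os.fstat(fd)
    changed = st.st_ino != expected_ino
    return fd, st, changed
```

If `changed` is true, the caller would look at `st.st_mode` from the fstat result to pick the right processing function and hand it the already open fd, rather than re-resolving the name.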

An additional problem is present in the directory handling of the Archiver._process method.
Upon noticing that we are dealing with a directory, archive.process_directory is first called on the file name, and then we do readdir or scandir on it. But by the time we get to that, the directory might have disappeared (which is ok, I guess, we'll just have an entry for an empty dir), or it might have been replaced with another file type (also ok, I guess, we still get the empty dir in the archive), or it might have been replaced by another dir. This last case is more problematic, as we'd then proceed to insert the wrong dir contents into the archive, which might cause problems ranging from inconsistent data in the archive to security issues (imagine the dir seen by the first scan had permissive access, but the one whose contents we ended up reading was a lot more restrictive).
Finally, an even worse variation of this would be file replacement. Since iteration uses the full path every time (yuck), if I have
/path/dir1/file1
/path/dir2/file1
and I rename dir2 to dir1 at just the right time, I am backing up the wrong file1 while processing dir1, and I am none the wiser.
The fix here would be to move all the iterating into process_dir().
It would start as usual with open (O_DIRECTORY); once we have that fd, we are golden. We just need an fd-based directory read (getdents64 on the fd, or fdopendir + readdir) so that we read the directory behind the fd, not whatever some path happens to resolve to.
Also, when we process the entries, we need to use openat(2) to open each name relative to the directory file descriptor, so the entries cannot switch under us.
I guess the alternative here would be for _process to be called not with a full name but with a file descriptor, fstat it if needed, and pass it down to process_*.
For dir processing, it would iterate via the fd-based directory read, perform openat(2) on the results, and pass that fd down recursively instead of the name.
For platforms not implementing fd-based directory reads or openat, I guess it's possible to chdir into every subdir and do relative lookups, but returning to the previous level might get tricky then.
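
A rough sketch of the fd-passing alternative in Python, assuming a POSIX platform where `os.listdir` accepts a file descriptor and `os.open` supports `dir_fd` (which maps to openat(2)). The function name `walk_by_fd` is hypothetical and it only yields stat results; the real process_* dispatch is left out.

```python
import os
import stat


def walk_by_fd(dir_fd, rel_path=""):
    """Yield (relative_path, stat_result) for everything below `dir_fd`.

    Hypothetical sketch, not borg code. Every lookup is relative to an
    open directory descriptor, so renaming a parent directory cannot
    redirect us into a different tree halfway through.
    """
    # os.listdir(fd) reads the directory behind the fd, not behind a path.
    for name in sorted(os.listdir(dir_fd)):
        child = os.path.join(rel_path, name)
        try:
            # dir_fd= makes this an openat(2) relative to the directory fd.
            fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK,
                         dir_fd=dir_fd)
        except FileNotFoundError:
            continue  # lost the race, act as if the name was never listed
        except OSError:
            # Typically a symlink refused by O_NOFOLLOW (or no permission):
            # fall back to an lstat relative to the same directory fd.
            try:
                st = os.lstat(name, dir_fd=dir_fd)
            except FileNotFoundError:
                continue
            yield child, st
            continue
        try:
            st = os.fstat(fd)  # describes the inode we actually opened
            yield child, st
            if stat.S_ISDIR(st.st_mode):
                # Recurse with the fd, never with a re-resolved path.
                yield from walk_by_fd(fd, child)
        finally:
            os.close(fd)
```

The caller would open the backup root once, for example with `os.open(root, os.O_RDONLY | os.O_DIRECTORY)`, and pass that descriptor in. The relative path is carried along only for naming the archive entry, not for any filesystem access.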

Bounty (paid using borgbackup org funds): https://www.bountysource.com/issues/32968949-races-between-readdir-getdents64-and-name-access
