Order of traversing files in a directory #905

@verygreen

Description

Moving from #325 in a ticket of its own.

While looking around Archiver::_process I noticed that:

            if not dry_run:
                status = archive.process_dir(path, st)
            try:
                entries = os.listdir(path)
            except OSError as e:
                status = 'E'
                self.print_warning('%s: %s', path, e)
            else:
                for filename in sorted(entries):
                    entry_path = os.path.normpath(os.path.join(path, filename))
                    self._process(archive, cache, matcher, exclude_caches, exclude_if_present,
                                  keep_tag_files, skip_inodes, entry_path, restrict_dev,
                                  read_special=read_special, dry_run=dry_run)

I imagine the sorted(entries) just sorts entries alphabetically? If so, this is a pretty bad idea for large directories on spinning media on a bunch of filesystems. Affected filesystems include reiserfs, ext4, and probably many more, especially those with static inode allocation.
The problem at hand is that when you sort this list in an order different from inode order, every lstat you do seeks the disk to a different inode table block somewhere, which is really slow.
Not sorting the output is also not an option, since then you get some "random" underlying hash ordering that also might not correspond to creation order. The optimal strategy for filesystems with fixed inode tables is to sort by inode number (available in the readdir output as d_ino).
For filesystems with no static inode tables (like reiserfs) I imagine this would also have the desired effect, because the inode number is still a good proxy for creation time, which in turn correlates with on-disk location (files created together end up closer to the start of the disk, within a chunk).
For more advanced COW filesystems like zfs and btrfs I am less sure; layout might be all over the place depending on a bunch of factors. I assume sort order would not matter there all that much.
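To illustrate the suggested strategy, here is a minimal sketch (not borg's actual code) of listing a directory sorted by inode number instead of by name, using os.scandir, whose DirEntry.inode() returns the d_ino field from readdir without an extra stat call:

```python
import os

def entries_by_inode(path):
    """Return directory entry names sorted by inode number.

    DirEntry.inode() comes straight from the readdir() d_ino field,
    so building this list does not stat() each file; the subsequent
    lstat() calls then walk the inode table in ascending order.
    """
    with os.scandir(path) as it:
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()  # ascending inode number
    return [name for _, name in pairs]
```

In the _process loop above, this would replace sorted(entries) as the iteration order; note it changes archive entry order from alphabetical to inode order, which may matter elsewhere.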

A question was also raised whether this makes any practical difference. It certainly does for large directories (with millions of entries) that are not cached.
Here are some links to ext3 discussions from when the htree code became popular and this discussion first arose. You'll see that the performance impact is quite high:
https://bugzilla.kernel.org/show_bug.cgi?id=417
https://lkml.org/lkml/2008/2/18/263

There are tons more, and the same issue is present in other filesystems with hashed directory names. I know about this because I am a Linux filesystem developer in my day job (and we ran into this back when I was still working on reiserfs).

Additionally, looking outside of Linux (which is really well optimized in these areas), other systems' caches are in many cases a lot less optimal or even non-existent (e.g. on BSD, where every fs driver used to have to implement its own cache).
I got a big confirmation of this yesterday. I have a ~2M-file tree on my Linux server, all on rotating media; find / -perm 0 (to force stat calls) took 15 seconds there.
I also have the MacBook I am working on, with an SSD and about 2.5M files; the same find takes 2m50s. Quite a difference, thanks to many, many variables, I am sure.
