Description
Hello,
I ran into an issue today where bagit.py fails on a broken soft link and halts the bagging of an otherwise intact file system tree. I was running the script on a directory tree that included a soft link whose target was apparently meant to be created at a later date (e.g. .../foo.cfg pointing to .../config/populated_by_user/foo.cfg, similar to what the Apache web server does with config files). The execution environment in this case is a POSIX file system.
The problem in bagit.py appears to stem from the function _can_read, on (today's) Line 1362. The broken soft link in this case pointed at a non-existent directory, raising an error on Line 206.
I suggest that a broken soft link should not prevent a directory tree from being bagged. It may be better for _can_read to only report actual directories and files that are unreadable, possibly with broken links as a new third output.
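To make the suggestion concrete, here is a rough sketch of the kind of check I have in mind. This is not bagit.py's actual code: the function name `can_read_with_links` and the three-tuple return value are my own assumptions for illustration. It treats a path as a broken link when `os.path.islink` is true but `os.path.exists` (which follows the link) is false.

```python
import os


def can_read_with_links(dir_name):
    """Hypothetical variant of a readability check.

    Returns (unreadable_dirs, unreadable_files, broken_links), each a tuple
    of paths found under dir_name.
    """
    unreadable_dirs = []
    unreadable_files = []
    broken_links = []

    # If the top-level directory itself is unreadable, there is nothing to walk.
    if not os.access(dir_name, os.R_OK):
        return (dir_name,), (), ()

    for dirpath, dirnames, filenames in os.walk(dir_name):
        for name in dirnames:
            full_path = os.path.join(dirpath, name)
            if not os.access(full_path, os.R_OK):
                unreadable_dirs.append(full_path)
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # os.walk lists a dangling symlink among the file names.
            # islink() inspects the link itself; exists() follows it, so it
            # is False when the target is missing (or cannot be stat'ed).
            if os.path.islink(full_path) and not os.path.exists(full_path):
                broken_links.append(full_path)
            elif not os.access(full_path, os.R_OK):
                unreadable_files.append(full_path)

    return tuple(unreadable_dirs), tuple(unreadable_files), tuple(broken_links)
```

With something like this, the caller could refuse to bag only when the first two tuples are non-empty, and log or warn about the third.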
If helpful, there is a script that converts a file system walk (via os.walk) to DFXML, and it has an if-ladder that goes through all file-system-level file types, not just directories and regular files. See the walk_to_dfxml.py function filepath_to_fileobject, and all assignment statements matching name_type = (starting on Line 36 today). You may want _can_read to skip operating on other file types as well.
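For reference, the general shape of such an if-ladder can be sketched with the standard `os.lstat` and `stat` modules. This is my own minimal illustration, not the code from walk_to_dfxml.py: the function name `classify_path` and the string labels it returns are made up for this example.

```python
import os
import stat


def classify_path(path):
    """Return an illustrative label for the file-system-level type of path.

    Uses lstat() so that a symlink is classified as a link, not as whatever
    it points to (which may not exist).
    """
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    elif stat.S_ISDIR(mode):
        return "directory"
    elif stat.S_ISREG(mode):
        return "regular"
    elif stat.S_ISCHR(mode):
        return "char_device"
    elif stat.S_ISBLK(mode):
        return "block_device"
    elif stat.S_ISFIFO(mode):
        return "fifo"
    elif stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"
```

A readability check could then call `os.access` only on paths labeled "directory" or "regular" and skip or separately report everything else.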
--Alex