Sanitize paths during archive creation and extraction #7108
Changes from all commits
438cf2e
db96c0c
b7ce3b1
518c4fb
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -217,12 +217,66 @@ def dir_is_tagged(path, exclude_caches, exclude_if_present): | |
| return tag_names | ||
| -_safe_re = re.compile(r"^((\.\.)?/+)+") | ||
| def make_path_safe(path): | ||
| """ | ||
| Make path safe by making it relative and normalized. | ||
| | ||
| `path` is sanitized by making it relative, removing | ||
| consecutive slashes (e.g. '//'), removing '.' elements, | ||
| and removing trailing slashes. | ||
| | ||
| -def make_path_safe(path): | ||
| -"""Make path safe by making it relative and local""" | ||
| -return _safe_re.sub("", path) or "." | ||
| For reasons of security, a ValueError is raised should | ||
| `path` contain any '..' elements. | ||
| """ | ||
| path = path.lstrip("/") | ||
| if "\\" in path: # borg always wants slashes, never backslashes. | ||
| raise ValueError(f"unexpected backslash(es) in path {path!r}") | ||
| if path.startswith("../") or "/../" in path or path.endswith("/..") or path == "..": | ||
| raise ValueError(f"unexpected '..' element in path {path!r}") | ||
| path = os.path.normpath(path) | ||
| return path | ||
| _dotdot_re = re.compile(r"^(\.\./)+") | ||
| def remove_dotdot_prefixes(path): | ||
| """ | ||
| Remove '../'s at the beginning of `path`. Additionally, | ||
| the path is made relative. | ||
| | ||
| `path` is expected to be normalized already (e.g. via `os.path.normpath()`). | ||
| """ | ||
| path = path.lstrip("/") | ||
| path = _dotdot_re.sub("", path) | ||
| if path in ["", ".."]: | ||
| return "." | ||
| return path | ||
Comment on lines +251 to +254
Member
also, i am asking myself whether this is useful, it might completely change the path so it points somewhere else. maybe rather reject than modify-and-accept?
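The concern can be made concrete with a small sketch. It copies `remove_dotdot_prefixes` from the diff and resolves a path against a base directory before and after stripping. The base directory, the sample path, and the `resolve` helper are made up for illustration, and `posixpath` is used (an assumption) so the result does not depend on the platform.

```python
import posixpath
import re

_dotdot_re = re.compile(r"^(\.\./)+")


def remove_dotdot_prefixes(path):
    # Copied from the diff under review.
    path = path.lstrip("/")
    path = _dotdot_re.sub("", path)
    if path in ["", ".."]:
        return "."
    return path


def resolve(base, path):
    # Hypothetical helper: where `path` points when read relative to `base`.
    return posixpath.normpath(posixpath.join(base, path))


base = "/srv/backup/data"   # made-up working directory
given = "../../etc/passwd"  # made-up input path
stripped = remove_dotdot_prefixes(given)

print(resolve(base, given))     # the location the caller named
print(resolve(base, stripped))  # the location after modify-and-accept
```

The two printed locations differ, which is the "points somewhere else" effect: stripping the prefix silently re-roots the path under the base directory.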
Contributor
Author
Which kind of paths should be accepted/rejected? I assume we still want to allow users to specify absolute paths and simple relative paths like a/b/c or ./a/b/c. So, are you suggesting to refuse any ../some/path? Note that the regex starts with ^ and we only remove prefixes from the path.
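As a rough illustration of the accept/reject question, the sketch below runs the path kinds mentioned here through the new `make_path_safe` from the diff. `classify` is a hypothetical helper added only for the demonstration, and `posixpath.normpath` stands in for `os.path.normpath` (an assumption, to keep the behaviour platform-independent).

```python
import posixpath


def make_path_safe(path):
    # Follows the new version in the diff; posixpath replaces os.path here.
    path = path.lstrip("/")
    if "\\" in path:
        raise ValueError(f"unexpected backslash(es) in path {path!r}")
    if path.startswith("../") or "/../" in path or path.endswith("/..") or path == "..":
        raise ValueError(f"unexpected '..' element in path {path!r}")
    return posixpath.normpath(path)


def classify(path):
    # Hypothetical helper: return the sanitized path, or "rejected" on ValueError.
    try:
        return make_path_safe(path)
    except ValueError:
        return "rejected"


for p in ["/a/b/c", "a/b/c", "./a/b/c", "a//b/", "../some/path"]:
    print(f"{p!r:>16} -> {classify(p)!r}")
```

Under this version, absolute and simple relative paths all normalize to the same relative form, while anything containing a `..` element is refused rather than rewritten.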
Member
Guess we can approach this from 2 perspectives: a) what does your PR change compared to the existing code for archiving items the new b) what do we really want it to be usually borg recurses starting from the recursion roots (which we can normalize first) and then only pretty normal paths will be generated by the recursion. for this, we do not need special path processing per item. when fed with a paths list via stdin (and not using borg's recursor), borg does not have control over what's coming in from there, but borg is also not required to accept too crappy stuff and still make great sense from it (the admin or tool feeding that list into borg can be expected to provide reasonable paths), borg instead could skip invalid paths with an error msg. So:
Contributor
Author
I'm not sure I agree with it not being useful. When I create an archive for '.', I'd also expect that it contains '.' (user, permissions, attributes), so that I can extract it again, move it to its original location, and the permissions are correct again.
Contributor
Author
I, personally, wouldn't give this too much consideration. Seems unlikely to me that people will end up providing such crappy paths to Borg. Paths with './' or '//', sure, but something like "root/foo/../../../bar" seems rather unlikely to me.
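For reference, a minimal sketch of what happens to that example path under the two functions in this diff. Both functions follow the diff, except that `posixpath.normpath` is substituted for `os.path.normpath` (an assumption, for platform independence).

```python
import posixpath
import re

_dotdot_re = re.compile(r"^(\.\./)+")


def remove_dotdot_prefixes(path):
    # Copied from the diff; expects an already normalized path.
    path = path.lstrip("/")
    path = _dotdot_re.sub("", path)
    if path in ["", ".."]:
        return "."
    return path


def make_path_safe(path):
    # Follows the diff, with posixpath in place of os.path.
    path = path.lstrip("/")
    if "\\" in path:
        raise ValueError(f"unexpected backslash(es) in path {path!r}")
    if path.startswith("../") or "/../" in path or path.endswith("/..") or path == "..":
        raise ValueError(f"unexpected '..' element in path {path!r}")
    return posixpath.normpath(path)


path = "root/foo/../../../bar"
normalized = posixpath.normpath(path)  # collapses to a path with a leading '..'
print(normalized)
print(remove_dotdot_prefixes(normalized))  # the leading '..' is dropped

try:
    make_path_safe(path)
except ValueError as exc:
    print("rejected:", exc)
```

So the strip-based route quietly maps the path to a different location, while `make_path_safe` refuses it outright.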
Contributor
Author
Handling of '.' should perhaps be looked at separately. There are some cases where it isn't handled ideally: Empty line at end is interpreted as '.' somehow. Probably related to this:
Member
yeah, likely. about archiving "." directory making sense: yes, i somehow agree, but how would you ever extract that? borg usually expects nothing being in the way, so it usually rmdirs target and then extracts target. not sure whether that would work with target == ".".
Member
Guess the main issue remaining here is that the code removes arbitrary amounts of leading We won't remember whether it was The original code before your PR also had this issue, but guess if we clean it up, we should do it right. I guess the only context where this could make sense is if we want "no warnings / no errors" borg1 archive transfer and accept that information loss (preferring it over introducing some security issue into borg2 archives). So, in case this is only meant for For borg2 "borg create", I guess we rather want to reject if something starts with
Member
Hmm, thought more about this, and guess I have to correct myself:
| def assert_sanitized_path(path): | ||
| assert isinstance(path, str) | ||
| # `path` should have been sanitized earlier. Some features, | ||
| # like pattern matching rely on a sanitized path. As a | ||
| # precaution we check here again. | ||
| if make_path_safe(path) != path: | ||
| raise ValueError(f"path {path!r} is not sanitized") | ||
| return path | ||
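The check in `assert_sanitized_path` works because `make_path_safe` leaves an already sanitized path unchanged. A small sketch of which inputs pass, assuming the functions as shown in the diff (with `posixpath.normpath` substituted for `os.path.normpath`); `is_sanitized` is a hypothetical wrapper added only for this demonstration.

```python
import posixpath


def make_path_safe(path):
    # Follows the diff, with posixpath in place of os.path.
    path = path.lstrip("/")
    if "\\" in path:
        raise ValueError(f"unexpected backslash(es) in path {path!r}")
    if path.startswith("../") or "/../" in path or path.endswith("/..") or path == "..":
        raise ValueError(f"unexpected '..' element in path {path!r}")
    return posixpath.normpath(path)


def assert_sanitized_path(path):
    # Copied from the diff.
    assert isinstance(path, str)
    if make_path_safe(path) != path:
        raise ValueError(f"path {path!r} is not sanitized")
    return path


def is_sanitized(path):
    # Hypothetical wrapper: True if the check passes, False if it raises.
    try:
        assert_sanitized_path(path)
        return True
    except ValueError:
        return False


for p in ["a/b", ".", "a//b", "./a", "/a", "a/../b"]:
    print(f"{p!r:>10} sanitized: {is_sanitized(p)}")
```

Only paths that are already relative, normalized, and free of `..` elements pass; everything else raises `ValueError`, either from the comparison or from `make_path_safe` itself.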
| def to_sanitized_path(path): | ||
| assert isinstance(path, str) | ||
| # Legacy versions of Borg still allowed non-sanitized paths | ||
| # to be stored. So, we sanitize them when reading. | ||
| # | ||
| # Borg 2 ensures paths are safe before storing them. Thus, when | ||
| # support for reading Borg 1 archives is dropped, this should be | ||
| # changed to a simple check to verify paths aren't malicious. | ||
| # Namely, absolute paths and paths containing '..' elements must | ||
| # be rejected. | ||
| # | ||
| # Also checks for '..' elements in `path` for reasons of security. | ||
| return make_path_safe(path) | ||
| class HardLinkManager: | ||