@juergbi
Contributor

@juergbi juergbi commented Jul 18, 2025

Recent Python security fixes to the tarfile module (included in 3.13.4 but also backported to older branches) require link targets to pass the specified filter function as well. This can result in hardlinked files not being extracted when a base-dir is set.

This commit separates the filtering from the extraction to avoid this issue. It also passes filter="tar" explicitly, because Python 3.14+ will default to filter="data", which is too restrictive here.

python/cpython#135037

Fixes #2029.

@gtristan
Contributor

Looking at python/cpython#135037, it looks like CPython is messing with extracted links, calling os.path.realpath() (with a new "allow missing" flag)... for some purpose... perhaps to avoid creating hardlinked files outside of the destination directory...

Your commit itself looks sensible, and addresses the mentioned test case where we are testing tarball behavior. However, the behavioral change in Python is deep and a bit worrying.

Now that I'm seeing these os.path.realpath() calls popping up in the CPython code around "links", I wonder if this might affect symbolic links? Surely it would be an absurd Python bug if relative symbolic link targets ended up becoming absolute paths when extracted by TarFile, but it also looks like we don't have much coverage around symbolic links in tarball extraction in tests/sources/tar.py, so I'm not sure we would notice if such an absurd bug occurred.

@juergbi I think it is unlikely that Python broke symlinks with this. I'll leave it up to you to decide whether we need to add more symlink coverage on our side; otherwise let's just go ahead with this merge.

@juergbi
Contributor Author

juergbi commented Jul 21, 2025

> Now that I'm seeing these os.path.realpath() calls popping up in the CPython code around "links", I wonder if this might affect symbolic links? Surely it would be an absurd Python bug if relative symbolic link targets ended up becoming absolute paths when extracted by TarFile, but it also looks like we don't have much coverage around symbolic links in tarball extraction in tests/sources/tar.py, so I'm not sure we would notice if such an absurd bug occurred.

I've pushed a new test to verify that symlinks are supported and targets are not mangled.

The new test passes without code changes. It would fail on Python 3.14 without filter="tar" (as filter="data" will be the new default), so it's certainly nice to have that covered in the test suite.

@juergbi juergbi merged commit 3032c18 into master Jul 21, 2025
17 checks passed
@juergbi juergbi deleted the jbilleter/tar branch July 21, 2025 08:35

Development

Successfully merging this pull request may close these issues.

test_out_of_basedir_hardlinks test fails on Python 3.13.x