
@huonw (Contributor) commented Oct 5, 2022

This fixes a (small) bug in #128: the virtualenv may contain a symlink to the 'actual' path of the Python executable, while the python_path value used in the key may itself have been a symlink or a shim. This patch uses sys.executable as an estimate of the actual path, which will hopefully be good enough for the circumstances where bootstrap-cache-key is useful.
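A minimal sketch of the idea: ask the interpreter itself for sys.executable instead of trusting the path it was invoked through. This assumes `${python}` holds the interpreter path the bootstrap script picked (as in its `python_path=${python}` key field); the fallback to `python3` on PATH is only there so the snippet runs standalone.

```shell
# Assumption: ${python} is the interpreter chosen by the bootstrap script;
# fall back to python3 on PATH for this standalone illustration.
python="${python:-$(command -v python3)}"

# Ask the interpreter where its executable lives. This may differ from
# ${python}, e.g. when ${python} is a shim or, on some builds, a symlink.
python_executable_path="$("${python}" -c 'import sys; print(sys.executable)')"

echo "python_path=${python}"
echo "python_executable_path=${python_executable_path}"
```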

After this patch, running on my system shows (note the difference between python_path and python_executable_path):

```
os_name=Darwin arch=arm64 python_path=/opt/homebrew/bin/python3.9 python_executable_path=/opt/homebrew/opt/python@3.9/bin/python3.9 python_version=Python 3.9.14 pex_version=2.1.103 virtualenv_requirements_sha256=80b5a45ee3ee507e268799305d4e5e347c7e8346df6551b6a83df05396b3d941 pants_version=2.13.0
```
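The key is a space-separated list of `name=value` fields. Here is a rough POSIX-sh sketch of assembling such a line. The first three fields mirror the script's diff; the `python_executable_path` value is computed as sketched above. Joining with single spaces is inferred from the output, not taken from the script, and the remaining fields are left out.

```shell
# Stand-in for the interpreter the real script picks (assumption).
python="${python:-$(command -v python3)}"
python_executable_path="$("${python}" -c 'import sys; print(sys.executable)')"

# Collect the fields as positional parameters (portable to plain sh)...
set -- \
  "os_name=$(uname -s)" \
  "arch=$(uname -m)" \
  "python_path=${python}" \
  "python_executable_path=${python_executable_path}"

# ..."$*" joins them with the first character of IFS, a space by default.
cache_key="$*"
echo "${cache_key}"
```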

It looks like virtualenv does something more complicated, consulting sys._base_executable (if it exists).
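For comparison, here is a quick way to peek at that attribute for a given interpreter. This is only an inspection aid, not a reproduction of virtualenv's logic. sys._base_executable is a private CPython attribute, so the sketch falls back to sys.executable when it is missing; `${python}` defaulting to `python3` is again an assumption.

```shell
python="${python:-$(command -v python3)}"

# sys._base_executable is private and may be absent on some interpreters,
# so fall back to sys.executable rather than failing.
base_executable="$("${python}" -c 'import sys; print(getattr(sys, "_base_executable", sys.executable))')"

echo "base_executable=${base_executable}"
```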

I don't think it's worth reproducing that logic, and it's not possible to invoke virtualenv directly at this point (this needs to be able to run before any packages are installed). An alternative would be to stop using virtualenv and switch to the stdlib venv, which can be run as part of the cache-key process. I suspect the use of virtualenv may date from before pants required Python >= 3.3 (when venv was added)? Or maybe pants uses features beyond what python -m venv ... can offer?

@benjyw (Contributor) left a comment:

Do we want/need to call os.path.realpath() on that?
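A sketch of what that suggestion could look like in the script. It wraps sys.executable in os.path.realpath() so every symlink is resolved, under the same assumption about `${python}`. It illustrates the suggestion; it is not a copy of the merged change.

```shell
python="${python:-$(command -v python3)}"

# Resolve every symlink in sys.executable so the key records the file that a
# virtualenv's bin/python ultimately points at.
python_executable_path="$("${python}" -c 'import os, sys; print(os.path.realpath(sys.executable))')"

echo "python_executable_path=${python_executable_path}"
```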

@huonw (Contributor, Author) commented Oct 6, 2022

Ah, good catch. I think virtualenv does dereference symlinks, so I've made the update.

My test: inside a first virtualenv whose python executable is a symbolic link, sys.executable is the link itself. But when that virtualenv is used to create a second one, virtualenv resolves the symlinks: the second virtualenv's python points at the real interpreter, not at the first virtualenv's symlink.

```shell
# create one virtualenv
virtualenv first
. first/bin/activate

python -c 'import sys; print(sys.executable)'
# .../first/bin/python
file -h first/bin/python
# first/bin/python:       symbolic link to /Users/huon/.pyenv/versions/3.9.10/bin/python3.9

# use this virtualenv to create a second one
pip install virtualenv
virtualenv second
file -h second/bin/python
# second/bin/python:      symbolic link to /Users/huon/.pyenv/versions/3.9.10/bin/python3.9
```

(Mildly interestingly, use of python -m venv instead has different behaviour: the second virtualenv just symlinks to the first virtualenv's symlink. But that's not relevant here.)
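The difference between the two behaviours can be reproduced without virtualenv. This sketch, whose temporary file names are made up for illustration, builds a symlink to a symlink (the shape python -m venv produced). It then compares a single readlink hop with Python's os.path.realpath, which resolves the whole chain and so matches what virtualenv appears to record.

```shell
# Throwaway chain: first -> real, second -> first
# (mimicking one venv's symlink pointing at another venv's symlink).
tmp="$(mktemp -d)"
touch "$tmp/real"
ln -s "$tmp/real" "$tmp/first"
ln -s "$tmp/first" "$tmp/second"

# readlink follows exactly one hop: second -> .../first
one_hop="$(readlink "$tmp/second")"
# os.path.realpath follows the whole chain: second -> .../real
resolved="$(python3 -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$tmp/second")"

echo "one hop:  ${one_hop}"
echo "resolved: ${resolved}"
rm -rf "$tmp"
```

On macOS the resolved path may gain a `/private` prefix, because the temporary directory usually lives under the `/var` symlink.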

```diff
@@ -404,6 +408,7 @@ function run_bootstrap_tools {
"os_name=$(uname -s)"
"arch=$(uname -m)"
"python_path=${python}"
```
Contributor:

Should we omit python_path from the key now?

Contributor:

Or, now I'm wondering if realpath is actually a mistake: possibly we do want different virtualenvs that symlink to the same interpreter to have different cache keys?

Contributor (Author):

I think the realpath is required to ensure this key reliably prevents issues like pantsbuild/actions#5. That is: the cached directory includes a symlink to the system interpreter, so the key should change if the literal target of that symlink changes, and since virtualenv seems to flatten symlinks (rather than having symlinks to symlinks), the key should too.

As you say, maybe different virtualenvs that happen to link to the same interpreter should end up with different keys, and I think python_path may capture that. That said, I do also wonder if python_path is no longer necessary. 🤔
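A small simulation of why the resolved path matters. The stand-in "interpreters", the directory names, and the `key_field` helper are all hypothetical. The venv-style symlink path stays the same throughout, so python_path alone would not change. Once the symlink is repointed, as if the system interpreter had been replaced, the realpath-based field does change.

```shell
# Two stand-in "interpreters" and a venv-style symlink to one of them.
tmp="$(mktemp -d)"
mkdir "$tmp/py39" "$tmp/py310"
touch "$tmp/py39/python" "$tmp/py310/python"
ln -s "$tmp/py39/python" "$tmp/venv-python"

# Hypothetical helper: the key field derived from the fully resolved path.
key_field() {
  printf 'python_executable_path=%s\n' \
    "$(python3 -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$1")"
}

before="$(key_field "$tmp/venv-python")"

# Simulate the underlying interpreter changing: repoint the same link.
rm "$tmp/venv-python"
ln -s "$tmp/py310/python" "$tmp/venv-python"

after="$(key_field "$tmp/venv-python")"

echo "before: ${before}"
echo "after:  ${after}"
rm -rf "$tmp"
```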

Contributor:

I guess even the same venv could change over time, if someone gets really fiddly. There is no easy way to truly fingerprint a python interpreter's full state. But yeah, let's leave python_path in as an extra bulwark.

@benjyw merged commit 33124c3 into pantsbuild:gh-pages on Oct 7, 2022
@huonw deleted the bugfix/python-path-key branch on October 7, 2022, 04:24