Skip to content

Fall back to most recent known osx version for bootstrap binaries#6681

Merged
cosmicexplorer merged 5 commits intopantsbuild:masterfrom
cosmicexplorer:bootstrap-macos-binary-download-workaround
Oct 28, 2018
Merged

Fall back to most recent known osx version for bootstrap binaries#6681
cosmicexplorer merged 5 commits intopantsbuild:masterfrom
cosmicexplorer:bootstrap-macos-binary-download-workaround

Conversation

@cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Oct 25, 2018

Problem

See the end of #6591 -- while we fall back to the most recent OSX version to download binaries for use with e.g. NativeTool, this fix does not apply for the binaries we fetch in shell scripts to bootstrap the pants rust code, leading to the following on OSX Mojave (10.14) in the pants repo:

Downloading https://binaries.pantsbuild.org/bin/cmake/mac/10.14/3.9.5/cmake.tar.gz ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404

Solution

  • Append a main() method at the bottom of binary_util.py which can be invoked in the bootstrap phase by download_binary.sh.

Result

I can run ./pants goals on my OSX 10.14 laptop in the pants repo after an rm -rf ~/.cache/pants.

@cosmicexplorer
Copy link
Contributor Author

There's an open question of how we test this fix (and anything around the rust bootstrap scripts) in CI.

@jsirois
Copy link
Contributor

jsirois commented Oct 25, 2018

There's an open question of how we test this fix (and anything around the rust bootstrap scripts) in CI.

It's crude, but nuke the appropriate Travis caches, which include the Rust bootstrapping cache dirs.

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is too much code in the heredoc, should be lifted into python source code and possibly tested, but this quick POC works and leverages all existing BinaryUtil tests:

diff --git a/build-support/bin/download_binary.sh b/build-support/bin/download_binary.sh
index 268b29113..c226ec7c2 100755
--- a/build-support/bin/download_binary.sh
+++ b/build-support/bin/download_binary.sh
@@ -1,4 +1,4 @@
-#!/bin/bash -eu
+#!/bin/bash -e
 
 # Downloads the specified binary at the specified version from the specified binary host if it's not already present.
 # Uses a reimplementation of the BinaryUtils mechanism.
@@ -11,40 +11,42 @@ REPO_ROOT=$(cd $(dirname "${BASH_SOURCE[0]}") && cd ../.. && pwd -P)
 
 # Defines:
 # + CACHE_ROOT: The pants cache directory, ie: ~/.cache/pants.
-source "${REPO_ROOT}/build-support/common.sh"
+source "${REPO_ROOT}/build-support/pants_venv"
 
-if (( $# != 3 && $# != 4 )); then
+if (( $# != 2 && $# != 3 )); then
   die "$(cat << USAGE
 Usage: $0 host util_name version [filename]
 Example: $0 binaries.pantsbuild.org go 1.7.3 go.tar.gz
 USAGE
 )"
 fi
-readonly host="$1"
-readonly util_name="$2"
-readonly version="$3"
-readonly filename="${4:-${util_name}}"
-
-os=$("${REPO_ROOT}/build-support/bin/get_os.sh")
-readonly path="bin/${util_name}/${os}/${version}/${filename}"
-readonly cache_path="${CACHE_ROOT}/${path}"
-
-if [[ ! -f "${cache_path}" ]]; then
-  mkdir -p "$(dirname "${cache_path}")"
-
-  readonly binary_url="https://${host}/${path}"
-  echo >&2 "Downloading ${binary_url} ..."
-  curl --fail "${binary_url}" > "${cache_path}".tmp
-  mv "${cache_path}"{.tmp,}
-  if [[ "${filename}" == *tar.gz ]]; then
-    tar -C "$(dirname "${cache_path}")" -xzf "${cache_path}"
-  else
-    chmod 0755 "${cache_path}"
-  fi
-fi
-
-to_output="${cache_path}"
-if [[ "${filename}" == *tar.gz ]]; then
-  to_output="$(dirname "${to_output}")"
-fi
-echo "${to_output}"
+readonly util_name="$1"
+readonly version="$2"
+readonly filename="${3:-${util_name}}"
+
+activate_pants_venv 1>&2
+PYTHONPATH="${REPO_ROOT}/src/python" python << SCRIPT
+import os
+from functools import reduce
+
+from pants.binaries.binary_util import BinaryUtil
+from pants.option.global_options import GlobalOptionsRegistrar
+from pants.option.options_bootstrapper import OptionsBootstrapper
+from pants.subsystem.subsystem import Subsystem
+
+
+options_bootstrapper = OptionsBootstrapper()
+subsystems = (GlobalOptionsRegistrar, BinaryUtil.Factory)
+known_scope_infos = reduce(set.union, (ss.known_scope_infos() for ss in subsystems), set())
+options = options_bootstrapper.get_full_options(known_scope_infos)
+for subsystem in subsystems:
+  subsystem.register_options_on_scope(options)
+Subsystem.set_options(options)
+
+binary_util = BinaryUtil.Factory.create()
+path = binary_util.select_binary(supportdir='bin/${util_name}',
+                                 version='${version}',
+                                 name='${filename}')
+
+print(os.path.dirname(path) if path.endswith('.tar.gz') else path)
+SCRIPT
diff --git a/build-support/bin/native/cargo.sh b/build-support/bin/native/cargo.sh
index 2bf4dcfb5..69bd03493 100755
--- a/build-support/bin/native/cargo.sh
+++ b/build-support/bin/native/cargo.sh
@@ -16,11 +16,11 @@ download_binary="${REPO_ROOT}/build-support/bin/download_binary.sh"
 
 # The following is needed by grpcio-sys and we have no better way to hook its build.rs than this;
 # ie: wrapping cargo.
-cmakeroot="$("${download_binary}" "binaries.pantsbuild.org" "cmake" "3.9.5" "cmake.tar.gz")"
-goroot="$("${download_binary}" "binaries.pantsbuild.org" "go" "1.7.3" "go.tar.gz")/go"
+cmakeroot="$("${download_binary}" "cmake" "3.9.5" "cmake.tar.gz")"
+goroot="$("${download_binary}" "go" "1.7.3" "go.tar.gz")/go"
 
 # Code generation in the bazel_protos crate needs to be able to find protoc on the PATH.
-protoc="$("${download_binary}" "binaries.pantsbuild.org" "protobuf" "3.4.1" "protoc")"
+protoc="$("${download_binary}" "protobuf" "3.4.1" "protoc")"
 
 export GOROOT="${goroot}"
 export PATH="${cmakeroot}/bin:${goroot}/bin:${CARGO_HOME}/bin:$(dirname "${protoc}"):${PATH}"

@cosmicexplorer
Copy link
Contributor Author

Oh now this is much much better. I'll drop the python into a separate file and look at possible testing.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this 🙌


# Use this version as a fallback for platform-specific binaries to download if we are on a version
# of OSX that we don't explicitly have binaries for yet.
# TODO: Keep this up to date automatically somehow!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably link to a GitHub issue here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This is obviated with the new python approach but this is correct.

readonly binary_url="https://${host}/${path}"
echo >&2 "Downloading ${binary_url} ..."
curl --fail "${binary_url}" > "${cache_path}".tmp
readonly output_tmp_file="${cache_path}.tmp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah didn't know about readonly #FunctionalBashScripts 😎

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some downsides which actually make it incredibly hard to use readonly correctly because bash is a horrible piece of software -- see build-support/bin/check_shell.sh, which runs in the pre-commit hook. zsh has never had any of these issues.


curl_output_file_with_fail "${binary_url}" "$output_tmp_file" \
2> >(tee >&2 "$curl_err_tmp_file") \
|| if grep -q -F 'The requested URL returned error: 404' "$curl_err_tmp_file" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do -q and -F have long flags?

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do (I think --quiet/--silent and --fixed-strings?). They should absolutely have been used here, as in any script in this repo.

|| if grep -q -F 'The requested URL returned error: 404' "$curl_err_tmp_file" \
&& [[ -n "$maybe_alternate_os" ]]; then
echo >&2 "Failed initial curl, trying an alternate url..."
alternate_intermediate_path="$(make_download_path_for_os "$maybe_alternate_os")"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be readonly?

Suggested change
alternate_intermediate_path="$(make_download_path_for_os "$maybe_alternate_os")"
readonly alternate_intermediate_path="$(make_download_path_for_os "$maybe_alternate_os")"

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the things check_shell.sh disallows and leads to very strange correctness issues if you do. Like I said, bash is a horrible piece of software because it doesn't care about bugs like this (and there are 1300 versions in play at once).

path="$(make_download_path_for_os "$("${REPO_ROOT}/build-support/bin/get_os.sh")")"
readonly cache_path="${CACHE_ROOT}/${path}"

maybe_alternate_os="$([[ "$(uname)" == 'Darwin' ]] \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function only adds compatibility for macOS, right? It wouldn't help with Windows or some Linux distro? If so, consider renaming to something like maybe_macos to be more explicit.

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely a correct comment. In the commit I just pushed, we use BinaryUtil to avoid having to duplicate this regardless, but this is definitely an edit I would have made.

@cosmicexplorer
Copy link
Contributor Author

@Eric-Arellano I'll reply in full to your review because bash is strange and nasty but will be pushing a commit which uses @jsirois's approach in his earlier review which delegates to BinaryUtil, obviating some of the issues you pointed out.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This python code makes me much happier haha 🐍

path = binary_util.select(binary_request)

print(path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be here, or leftover from debugging? Maybe print a message before the path to explain what it’s for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invoked by a shell script to read the output! There could be a comment here -- I will add that to make it clear.



def main():
util_name, version, filename = tuple(sys.argv[1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argparse is much more robust and has the benefit of allowing —help. It’s fine for what we have now, but will make it harder to do things like extend this script down the road

Have you worked with it before? If not, I can give the code to get it working! (I’m on phone atm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used it once years ago and not long after I ended up fixing the coffeescript argument parsing because I had been thinking about argument parsing so much: jashkenas/coffeescript#3946. I would love the code to do so -- that feels like it's probably half of the point of making this a python script. No rush to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talk to your team-mates, in particular Yi: Fire.fire(a_method_with_the_obvious_signature)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argparse is one of my favorite parts of Python. It's so easy to make high quality scripts with it.

Below is a demo of how it works so that you can see things like optional arguments, which it looks like we don't actually use. Should be easy to convert to this actual program, which I'm happy to help with also.

import argparse

def create_parser():
  parser = argparse.create_parser(description="My script now has fancy CLI options!")
  parser.add_argument("util_name", help="...")
  parser.add_argument("-o", "--optional", help="...")
  return parser

def main():
  args = create_parser().parse_args()
  supportdir = 'bin/{}'.format(args.util_name)

If you don't like this line args = create_parser().parse_args(), you can have create_parser() instead be parse_args() and simply return the args. I only set it up this way to make create_parse() a pure function & keep side effects in main().

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bootstrap_binary_util.py was housed under src/python/pants/binaries/ - perhaps just at the bottom of binary_util.py - actually testing the script would be that much easier.



def main():
util_name, version, filename = tuple(sys.argv[1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talk to your team-mates, in particular Yi: Fire.fire(a_method_with_the_obvious_signature)

# BinaryRequest requires the `name` field to be provided without an extension, as it appends the
# archiver's extension if one is provided, so we have to remove it here.
filename = re.sub(
r'\.{}\Z'.format(re.escape(archiver_for_current_binary.extension)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look up \Z, which was fun, I did not know $ had a synonym or vice-versa - but, if you got an acrhiver back, you know you can just lop off the filename verbatim - perhaps filename[:-(len(extension)+1)] or filename.rsplit('.' + extension, 1)[0]?

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Oct 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults for multiline modes differ across languages and other languages such as emacs lisp have even more customized extensions for e.g. symbol boundaries so it's easier for me to be explicit than to wonder whether a stray $ is going to slurp a whole file in, or worse, make repeated regex applications start at increasingly more incorrect positions (realizing now that's also more commonly performed in elisp than elsewhere).

This particular regex replace, however, was definitely a 3-4 second solution -- either of what you suggested seem more appropriate and I will make that fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t fully follow this conversation, but if you go with regex consider a very brief comment explaining what Z means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry -- "being explicit" refers to using \Z compulsively instead of the more recognizable $ (although if we wish to be truly pedantic, filenames can have newlines, so this should really also have a TODO to replace all such filename manipulations with methods in util or something so we can contain the pedantry).

@cosmicexplorer
Copy link
Contributor Author

Moved the main method to the bottom of binary_util.py, then added an integration test.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argparse code looks great! I like the nargs part. If you haven’t already, run the script with —help 😎

@cosmicexplorer
Copy link
Contributor Author

I like the nargs part.

I would take credit but it's straight from https://docs.python.org/3/library/argparse.html#usage! Love python.

@cosmicexplorer cosmicexplorer merged commit f2e335a into pantsbuild:master Oct 28, 2018
@illicitonion illicitonion mentioned this pull request Nov 2, 2018
cosmicexplorer added a commit that referenced this pull request Nov 16, 2018
)

### Problem

See the end of #6591 -- while we fall back to the most recent OSX version to download binaries for use with e.g. NativeTool, this fix does not apply for the binaries we fetch in shell scripts to bootstrap the pants rust code, leading to the following on OSX Mojave (10.14) in the pants repo:

```
Downloading https://binaries.pantsbuild.org/bin/cmake/mac/10.14/3.9.5/cmake.tar.gz ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404
```

### Solution

- Append a `main()` method at the bottom of `binary_util.py` which can be invoked in the bootstrap phase by `download_binary.sh`.

### Result

I can run `./pants goals` on my OSX 10.14 laptop in the pants repo after an `rm -rf ~/.cache/pants`.
@cosmicexplorer cosmicexplorer changed the title fall back to most recent known osx version for bootstrap binaries Fall back to most recent known osx version for bootstrap binaries Nov 16, 2018
@benjyw
Copy link
Contributor

benjyw commented Dec 29, 2018

See #7009 for issue arising out of this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants