Add experimental SIG Build RBE + toolchain configuration #67
I'm experimenting with a toolchain based specifically on the SIG Build
dockerfiles. Because those dockerfiles were created to match the
pre-existing toolchains, the configuration is mostly the same, except
that I've removed all of the conditional options and instead specify
env parameters explicitly.
The platform is called "sigbuild-r2.9". My line of thinking is this:
internally, the DevInfra team uses only one RBE configuration at a time,
which is currently based on Python 3.9. The tensorflow/toolchains repo,
however, must retain previous toolchain configurations (not RBE
configurations) for future patch releases of TensorFlow. Therefore, much
like the containers are always tagged with the upcoming release version,
we can continuously update the toolchain for the upcoming (latest+1)
release to match however we're building at master HEAD. Once the release
branch is cut, we create a new config and use it in the dockerfiles
instead.
For instance, the next version of TF is 2.9. The dockerfiles therefore
reference the "@sigbuild-r2.9" toolchain, which is configured here to
use CUDA 11.2. Whenever the containers are updated, they get pushed to
the "latest" and "2.9" tags. When the branch is cut and the "next"
version of TF changes to 2.10, we'd create a new "@sigbuild-r2.10"
toolchain here (with, say, CUDA 11.4, though it doesn't need to change)
or promote an ongoing testing toolchain, and then update the toolchain
referenced in the dockerfiles. That way, the old 2.9 containers, which
are now static, still reference the 2.9 toolchain, which remains
available.
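To make the versioning scheme concrete, here is a minimal Python sketch of the naming and tagging rules described above, assuming versions only ever bump the minor component (2.9 to 2.10). `Release`, `toolchain_label`, `container_tags`, and `cut_branch` are hypothetical helpers for illustration only, not part of tensorflow/toolchains or any build tooling; the dict stands in for the set of toolchain configs kept in this repo, mapped to a CUDA version.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the toolchain versioning scheme; none of these
# names exist in tensorflow/toolchains.


@dataclass(frozen=True)
class Release:
    major: int
    minor: int

    def next(self) -> Release:
        # Assumption: the upcoming release only ever bumps the minor version.
        return Release(self.major, self.minor + 1)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def toolchain_label(release: Release) -> str:
    # Dockerfiles reference the toolchain named after the upcoming release.
    return f"@sigbuild-r{release}"


def container_tags(release: Release) -> list[str]:
    # Updated containers are pushed to "latest" and the upcoming version tag.
    return ["latest", str(release)]


def cut_branch(upcoming: Release, toolchains: dict[str, str], cuda: str) -> Release:
    """Add a toolchain for the next release; the old entry is kept, not replaced."""
    nxt = upcoming.next()
    toolchains[toolchain_label(nxt)] = cuda
    return nxt


if __name__ == "__main__":
    upcoming = Release(2, 9)
    toolchains = {toolchain_label(upcoming): "11.2"}  # label -> CUDA version
    print(toolchain_label(upcoming), container_tags(upcoming))
    upcoming = cut_branch(upcoming, toolchains, cuda="11.4")
    print(toolchain_label(upcoming), sorted(toolchains))
```

The key design point is that a branch cut only adds an entry and never overwrites one, which is what lets the now-static 2.9 containers keep resolving "@sigbuild-r2.9".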
That's the idea, anyway; I still need to do more testing to confirm that
it works correctly.