Description of the feature request:
Add options on one or more of the py_library rule itself, the python toolchain configuration, or the bazel command line, to generate python bytecode files as part of the build.
In addition to the basic on/off option, it might be useful to include options to
- Control the "optimization" level (for what that's worth)
- Leave the source .py file out of the build outputs - the .pyc-only use case.
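To make the "optimization" level concrete, here is a minimal sketch using the built-in compile() function, whose optimize argument has the same meaning as the one accepted by py_compile.compile. The sample function f and the build helper are made up for illustration; this is not a proposal for how a Bazel rule would expose the option.

```python
# Hypothetical sample module, used only to show the effect of each level.
source = '''
def f(x):
    """Docstring."""
    assert x > 0, "x must be positive"
    return x
'''


def build(optimize):
    """Compile the sample at the given optimization level and return f."""
    namespace = {}
    code = compile(source, "example.py", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["f"]


f0 = build(0)  # like running plain "python": asserts and docstrings kept
f1 = build(1)  # like "python -O": assert statements removed
f2 = build(2)  # like "python -OO": asserts and docstrings removed

print(f1(-1))      # the assert is gone, so the call returns normally
print(f0.__doc__)  # docstring still present at level 0
print(f2.__doc__)  # docstring stripped at level 2
```

Levels 1 and 2 change runtime behaviour (asserts no longer fire), which is one reason the level would probably need to be an explicit opt-in.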
What underlying problem are you trying to solve with this feature?
When python loads a module, it parses the source code to an AST, compiles that to bytecode, and then attempts to cache the bytecode on disk. Generating those bytecode files as part of the build process solves a few issues:
- To quote the documentation for the py_compile module:

  "Though not often needed, this function can be useful when installing modules for shared use, especially if some of the users may not have permission to write the byte-code cache files in the directory containing the source code."

  Particularly in the case where bazel is being used to build a tarball of code that includes (but may not be limited to) python, and might then be deployed somewhere that's read-only to most users, it would be useful to be able to include these precompiled bytecode files.
- The attempt to compile the bytecode files would fail on syntactically-invalid python code, which is probably a good thing for catching failures earlier on in the build process.
- Having .pyc files available improves application startup time. Especially for large python codebases, if some module is transitively imported from thousands of unit tests, currently each of those tests would end up re-parsing the python source file, which is a waste of time. Having the .pyc files is also helpful for improving startup times for "serverless" platforms.
- The .pyc files can be substantially smaller than the source files. For situations where application distribution size is important, e.g. "serverless" platforms, this can matter.
- Some people place value on the marginal degree of obfuscation and tamper resistance offered by .pyc-only distributions. While reverse-engineering the source from a .pyc file isn't hard, it's also not nothing.
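The early-failure point can be sketched with py_compile directly. This is only an illustration of the behaviour a build action could rely on, not a proposed implementation; the try_compile helper and the file names are made up for the example.

```python
import os
import py_compile
import tempfile


def try_compile(source_text):
    """Write source_text to a temp file and byte-compile it.

    Returns (ok, message); message is empty on success.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write(source_text)
        try:
            # doraise=True raises PyCompileError on a compile error
            # instead of just printing it to stderr.
            py_compile.compile(
                src, cfile=os.path.join(tmp, "mod.pyc"), doraise=True
            )
        except py_compile.PyCompileError as err:
            return False, err.msg
        return True, ""


good = try_compile("x = 1\n")
bad = try_compile("def broken(:\n")
print(good[0], bad[0])
```

A build action wrapping something like this would turn a syntax error into a build failure rather than a test-time or run-time failure.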
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
release 5.3.0
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
References/notes:
https://docs.python.org/3/library/py_compile.html
The default invalidation mode is TIMESTAMP; this is probably a bad idea. In a bazel build we'd probably want to use UNCHECKED_HASH. We would also need to ensure that the path name embedded in the .pyc file was appropriately relative.
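A minimal sketch of what that could look like with py_compile, assuming a made-up relative path "pkg/mod.py" as the embedded name. The compile_unchecked helper is hypothetical; the header layout read back at the end (4-byte magic number, then a 4-byte little-endian flags field) follows PEP 552.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def compile_unchecked(src, out, embedded_path):
    """Compile src to out as a hash-based pyc that is never revalidated."""
    return py_compile.compile(
        src,
        cfile=out,
        dfile=embedded_path,  # recorded as co_filename instead of src
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as f:
        f.write("VALUE = 42\n")
    out = compile_unchecked(src, os.path.join(tmp, "mod.pyc"), "pkg/mod.py")
    with open(out, "rb") as f:
        data = f.read()

magic = data[:4]
# Bit 0 set means hash-based; bit 1 (check_source) stays clear for UNCHECKED_HASH.
flags = int.from_bytes(data[4:8], "little")
# The marshalled code object follows the 16-byte header.
code = marshal.loads(data[16:])

print(flags, code.co_filename)
```

Because the temporary directory never appears in the embedded file name, the bytes depend only on the source, the interpreter, and the chosen relative path, which is the property a hermetic build would want.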
From PEP-3147:
"Python will still support pyc-only distributions, however it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. pyc file outside of __pycache__ will only be imported if the py source file is missing."
This means that in the case where the .py file is still being included, the output path would need to depend on the python interpreter version, since the file name under __pycache__ embeds an interpreter tag (e.g. mod.cpython-310.pyc). This probably would require an attribute to be added to py_runtime for that purpose.
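The two output locations can be compared with importlib.util.cache_from_source. This is a sketch under the assumption of a default interpreter configuration (no sys.pycache_prefix, no -O flag); "pkg/mod.py" is a made-up path.

```python
import importlib.util
import sys

source = "pkg/mod.py"

# Where the import system looks for cached bytecode when the source is
# present. The file name embeds the interpreter tag.
cached = importlib.util.cache_from_source(source)

# Where a .pyc must live for a pyc-only distribution: next to where the
# .py file would have been, with no interpreter tag in the name.
sourceless = source + "c"

print(sys.implementation.cache_tag)
print(cached)
print(sourceless)
```

The tag printed first is the piece of information a rule would need from the runtime in order to name the __pycache__ output without running the target interpreter at analysis time.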