rules/python: Option to generate .pyc files for py_library. #1761

@adam-azarchs

Description of the feature request:

Add options on one or more of the py_library rule itself, the python toolchain configuration, or the bazel command line, to generate python bytecode files as part of the build.

In addition to the basic on/off option, it might be useful to include options to:

  1. Control the "optimization" level (for what that's worth)
  2. Leave the source .py file out of the build outputs - the .pyc-only use case.
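To make the two options concrete, here is a minimal sketch using only the standard library, not a proposed rule implementation. The helper name `compile_pyc_only` and the module name `greeting_demo` are made up for illustration. It compiles a source file at optimization level 2 and writes the .pyc where the .py would have been, then imports it with no source present.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def compile_pyc_only(src: Path, out_dir: Path, optimize: int = 0) -> Path:
    """Write out_dir/<module>.pyc (hypothetical helper); the source is not copied."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cfile = out_dir / (src.stem + ".pyc")
    py_compile.compile(str(src), cfile=str(cfile), optimize=optimize, doraise=True)
    return cfile


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src", "greeting_demo.py")
    src.parent.mkdir()
    src.write_text('def greet():\n    """Say hello."""\n    return "hello"\n')

    out_dir = Path(tmp, "out")
    pyc_name = compile_pyc_only(src, out_dir, optimize=2).name

    # Only the .pyc is on sys.path, so this is a sourceless import.
    sys.path.insert(0, str(out_dir))
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("greeting_demo")
        result = mod.greet()
        doc = mod.greet.__doc__  # optimize=2 strips docstrings
    finally:
        sys.path.remove(str(out_dir))
        sys.modules.pop("greeting_demo", None)
```

At level 2 the docstring is gone from the loaded function, which is one of the few visible effects of the "optimization" setting.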

What underlying problem are you trying to solve with this feature?

When python loads a module, it parses the source code to an AST and compiles that to bytecode, which it then attempts to cache in a .pyc file. Generating those files as part of the build process solves a few issues:

  1. To quote the documentation for the py_compile module:

    Though not often needed, this function can be useful when installing modules for shared use, especially if some of the users may not have permission to write the byte-code cache files in the directory containing the source code.

    Particularly in the case where bazel is being used to build a tarball of code that includes (but may not be limited to) python, and might then be deployed somewhere that's read-only to most users, it would be useful to be able to include these precompiled bytecode files.

  2. The attempt to compile the bytecode files would fail on syntactically-invalid python code, which is probably a good thing for catching failures earlier on in the build process.

  3. Having .pyc files available improves application startup time. Especially for large python codebases, if some module is transitively imported from thousands of unit tests, currently each of those tests would end up re-parsing the python source file, which is a waste of time. Having the .pyc files is also helpful for improving startup times for "serverless" platforms.

  4. The .pyc files can be substantially smaller than the source files. For situations where application distribution size is important, e.g. "serverless" platforms, this can matter.

  5. Some people place value on the marginal degree of obfuscation and tamper resistance offered by .pyc-only distributions. While reverse-engineering the source from a .pyc file isn't hard, it's also not nothing.
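As a rough illustration of point 2, the sketch below byte-compiles a set of sources and collects the ones that fail, roughly what a build action could do. The function name `find_uncompilable` and the file names are assumptions for the example, not part of any existing rule.

```python
import py_compile
import tempfile
from pathlib import Path


def find_uncompilable(sources, out_dir):
    """Return {file name: exception type name} for sources that fail to compile."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for src in sources:
        src = Path(src)
        cfile = out_dir / (src.stem + ".pyc")
        try:
            # doraise=True turns compile errors into PyCompileError
            # instead of only printing them to stderr.
            py_compile.compile(str(src), cfile=str(cfile), doraise=True)
        except py_compile.PyCompileError as err:
            failures[src.name] = err.exc_type_name
    return failures


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "good.py")
    good.write_text("x = 1\n")
    bad = Path(tmp, "bad.py")
    bad.write_text("def broken(:\n    pass\n")
    failures = find_uncompilable([good, bad], Path(tmp, "out"))
```

A real build action would presumably exit non-zero when `failures` is non-empty, so the error surfaces at build time rather than at first import.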

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 5.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

References/notes:
https://docs.python.org/3/library/py_compile.html
The default invalidation mode is TIMESTAMP, which embeds the source file's modification time in the .pyc; this is probably a bad idea. In a bazel build we'd probably want to use UNCHECKED_HASH. We would also need to ensure that the embedded path name is appropriately relative.
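A small sketch of what that could look like with `py_compile` directly, assuming a hypothetical workspace-relative path `pkg/mod.py` to embed. It compiles with `UNCHECKED_HASH`, passes the relative path via `dfile`, and then reads the result back to confirm both. The header layout it relies on (flags in bytes 4 to 8, code object after 16 bytes) is the one used by CPython 3.7 and later.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "mod.py")
    src.write_text("VALUE = 42\n")
    cfile = Path(tmp, "mod.pyc")
    py_compile.compile(
        str(src),
        cfile=str(cfile),
        dfile="pkg/mod.py",  # hypothetical relative path recorded in the bytecode
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )
    data = cfile.read_bytes()

# Header: 4-byte magic, 4-byte flags, 8-byte source hash (hash-based pycs).
flags = int.from_bytes(data[4:8], "little")
is_hash_based = bool(flags & 0b01)
checks_source = bool(flags & 0b10)

# The marshalled code object follows the 16-byte header.
embedded_path = marshal.loads(data[16:]).co_filename
```

With this mode the interpreter trusts the .pyc without comparing it against the source, which matches a build system that already tracks its inputs.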

From PEP-3147:

Python will still support pyc-only distributions, however it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. pyc file outside of __pycache__ will only be imported if the py source file is missing.

This means that in the case where the .py file is still being included, the .pyc has to go in __pycache__, where its file name includes an interpreter-specific tag, so the output path would need to depend on the python interpreter version. This would probably require an attribute to be added to py_runtime for that purpose.
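For reference, the interpreter itself can report the path it expects. The sketch below uses `importlib.util.cache_from_source` with a hypothetical source path `pkg/mod.py`; the tag in the file name comes from `sys.implementation.cache_tag`, which is the kind of value a py_runtime attribute would presumably need to carry.

```python
import importlib.util
import os
import sys

# Hypothetical source location, relative to some package root.
source_path = os.path.join("pkg", "mod.py")

# Tag identifying the running interpreter; it changes with the
# implementation and version, e.g. between CPython minor releases.
tag = sys.implementation.cache_tag

# Where this interpreter looks for cached bytecode when the source is present.
default_pyc = importlib.util.cache_from_source(source_path)

# Optimized bytecode gets an extra ".opt-N" component in its name.
optimized_pyc = importlib.util.cache_from_source(source_path, optimization=2)

print(tag)
print(default_pyc)
print(optimized_pyc)
```

Since the build may not run under the same interpreter as the target runtime, the tag would have to come from the toolchain configuration rather than from whichever python executes the build action.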
