Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
253 changes: 253 additions & 0 deletions docs/2023-eb-eessi-uk-workshop/easybuild-terminology.md
Original file line number Diff line number Diff line change
@@ -1 +1,254 @@
# EasyBuild Terminology

Over the years, we have come up with some terminology specific to EasyBuild
to refer to particular components, which we use alongside established terminology relevant to the context
of building and installing software on HPC systems.

It is important to be familiar with these terms, so we will briefly cover them one by one.

---

### Framework

The EasyBuild *framework* consists of a set of Python modules organised in packages (``easybuild.framework``,
``easybuild.toolchains``, ``easybuild.tools``, etc.) that collectively form **the core of EasyBuild**,
and is developed in the [``easybuild-framework`` repository on GitHub](https://github.com/easybuilders/easybuild-framework).

It implements the **common functionality that you need when building software from source**,
and provides functions for unpacking source files, applying patch files, collecting the output produced
by shell commands that are being run and checking their exit code, generating environment module files, etc.

The EasyBuild framework does *not* implement any specific installation procedure, it only provides
the necessary functionality to facilitate this.


---

### Easyblocks

An *easyblock* is **a Python module that implements a specific software installation procedure**,
and can be viewed as a plugin to the [EasyBuild framework](#framework).

Easyblocks can be either *generic* or *software-specific*.

A **generic easyblock** implements an installation procedure that can be used for
multiple different software packages. Commonly used examples include the ``ConfigureMake`` easyblock
which implements the ubiquitous ``configure``/``make``/``make install`` procedure, and the
``PythonPackage`` easyblock that can be used to install a Python package.

A **software-specific easyblock** implements an installation procedure that is specific to a particular software package.
Infamous examples include the easyblocks we have for ``GCC``, ``OpenFOAM``, ``TensorFlow``, ``WRF``, ...

The installation procedure performed by an easyblock can be controlled by defining
[**easyconfig parameters**](#easyconfig-parameters) in an [easyconfig file](#easyconfig-files).

A collection of (generic and software-specific) easyblocks is developed by the EasyBuild community
in the [``easybuild-easyblocks`` repository on GitHub](https://github.com/easybuilders/easybuild-easyblocks).

---

### Easyconfig parameters

An **easyconfig parameter** specifies a particular aspect of a software installation that should be performed by
EasyBuild.

Some easyconfig parameters are **mandatory**. The following parameters *must* be defined in *every* easyconfig file:

* ``name`` and ``version``, which specify the name and version of the software to install;
* ``homepage`` and ``description``, which provide key metadata for the software;
* ``toolchain``, which specifies the compiler toolchain to use to install the software (see
``toolchains`` section);

Other easyconfig parameters are **optional**: they can be used to provide required information,
or to control specific aspects of the installation procedure performed by the easyblock.

Some commonly used optional easyconfig parameters include:

* ``easyblock``, which specifies the (generic) easyblock that EasyBuild should use for the installation;
* ``sources`` and ``source_urls``, which specify the list of source files and where to download them;
* ``dependencies`` and ``builddependencies``, which specify the list of (build) dependencies;
* ``configopts``, ``buildopts``, and ``installopts``, which specify options for the configuration/build/install commands, respectively;

If no value is specified for an optional easyconfig parameter, the corresponding default value will be used.

There are two groups of easyconfig parameters. *General* easyconfig parameters can be defined for any software
package, and (usually) control a specific aspect of the installation. *Custom* easyconfig parameters are
only supported by certain easyblocks, and only make sense for particular (types of) software.

---

### Easyconfig files

*Easyconfig files* (or *easyconfigs* for short), are **simple text files written in Python syntax
that specify what EasyBuild should install**.
Each easyconfig file defines the set of [**easyconfig parameters**](#easyconfig-parameters)
that collectively form a complete specification for a particular software installation.

The **filename** of an easyconfig file usually ends with the ``.eb`` extension.

In some contexts the filename is expected to correspond with the value of a handful of key
easyconfig parameters: ``name``, ``version``, ``toolchain``, and ``versionsuffix``. The general format for
the filename of an easyconfig file is: ``<name>-<version><toolchain><versionsuffix>.eb``,
where the toolchain part is omitted when the ``system`` toolchain is used, and the `<versionsuffix>` can be empty.

The filename of easyconfig files is particularly relevant when EasyBuild is *searching* for easyconfig files to
resolve dependencies, since it does this purely based on filenames: interpreting the contents of every (potential)
easyconfig file it encounters would be too time-consuming.

In the [``easybuild-easyconfigs`` repository on GitHub](https://github.com/easybuilders/easybuild-easyconfigs),
the EasyBuild community maintains a large (and growing) collection of easyconfig files, for a wide range of
(scientific) software.

---

### Easystack files

[**Easystack files**](https://docs.easybuild.io/easystack-files)
are a relatively new concept in EasyBuild, providing a way to define a *software stack*
that should be installed by EasyBuild.

They are written in [YAML syntax](https://yaml.org/), and include a list of *software specifications*
which correspond to a list of easyconfig files, with support for providing specific EasyBuild
configuration options for particular software packages, and including or excluding specific software
packages based on labels.

The support for using easystack files is currently marked as *experimental*, which means that the implementation is
considered to be incomplete, is subject to change in future EasyBuild releases, and may be prone to errors.

---

### Extensions

*Extensions* is the collective term used in EasyBuild for **additional software packages that can be installed
on top of another software package**. Common examples are *Perl modules*, *Python packages*, and *R libraries*.

As you can tell the common terminology here is a bit messy, so we came up with *extensions* as a unifying term.

Extensions can be installed in different ways:

* *stand-alone*, as a separate installation on top of one or more other installations;
* as a part of a *bundle* of extensions that collectively form a separate installation;
* or as an actual *extension* to a specific installation to yield a "batteries included"
type of installation (for example installing Python bindings along with a C++ library);

---

### Dependencies

A *dependency* is a common term in the context of software. It refers to **a software
package that is either strictly required by other software, or that can be leveraged to
enhance other software** (for example to support specific features).

There are three main types of software dependencies:

* a **build dependency** is only required when building/installing a software package;
once the software package is installed, it is no longer needed to *use* that software
(examples: `CMake`, `pkg-config`);
* a **run-time dependency** (often referred to simply as *dependency*) is a software package that is
required to *use* (or *run*) another software package (example: `Python`);
* a **link-time dependency** is somewhere in between a build and runtime dependency:
it is only needed when *linking* a software package; it can become either a build or runtime
dependency, depending on exactly how the software is installed (example: `OpenBLAS`);

The distinction between link-time and run-time dependencies is mostly irrelevant for this tutorial,
but we will discriminate between build and run-time dependencies.

---

### Toolchains

A *compiler toolchain* (or just *toolchain* for short) is a **set of [compilers](https://en.wikipedia.org/wiki/Compiler)**,
which are used to build software from source, together with a set of **additional libraries** that provide further core functionality.

We refer to the different parts of a toolchain as **toolchain components**.

The *compiler component* of a toolchain typically consists of [C](https://en.wikipedia.org/wiki/C_(programming_language)),
[C++](https://en.wikipedia.org/wiki/C%2B%2B), and [Fortran](https://en.wikipedia.org/wiki/Fortran)
compilers in the context of HPC, but additional compilers (for example,
a [CUDA](https://developer.nvidia.com/cuda-zone) compiler for
[GPGPU](https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units) software)
can also be included.

Additional toolchain components are usually special-purpose libraries:

* an MPI library to support distributed computations (for example, [Open MPI](https://www.open-mpi.org/));
* libraries providing efficient linear algebra routines ([BLAS](http://performance.netlib.org/blas/),
[LAPACK](http://performance.netlib.org/lapack/));
* a library supporting computing Fast Fourier Transformations (for example, [FFTW](http://fftw.org/));

A toolchain that includes all of these libraries is referred to as a **full toolchain**, while
a **subtoolchain** is a toolchain that is missing one or more of these libraries.
A **compiler-only toolchain** only consists of compilers (no additional libraries).

#### System toolchain

The **`system` toolchain** is a special case which corresponds to using the compilers and libraries
*provided by the operating system*, rather than using toolchain components that were installed using EasyBuild.

It is used sparingly, mostly to install software where no actual compilation is done,
or to build a set of toolchain compilers and its dependencies, since the versions of the system compilers
and libraries are beyond the control of EasyBuild, which could affect the reproducibility of the installation.

#### Common toolchains

The `foss` and `intel` toolchains are also known as the *common toolchains*,
because they are widely adopted by the EasyBuild community.

The `foss` toolchain consists of all open source components (hence the name:
"FOSS" stands for Free & Open Source Software): [GCC](https://gcc.gnu.org/), [Open MPI](https://www.open-mpi.org/), [FlexiBLAS](https://gitlab.mpi-magdeburg.mpg.de/software/flexiblas-release) with
[OpenBLAS](https://www.openblas.net/) as default backend,
[ScaLAPACK](https://www.netlib.org/scalapack/) and [FFTW](http://fftw.org/).

The `intel` toolchain consists of the [Intel C, C++ and Fortran compilers](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html) (on top of a `GCC` version
controlled through EasyBuild) alongside the [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) and [Intel MKL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html) libraries.

Roughly every 6 months a new version of these common toolchains is agreed upon
in the EasyBuild community, after extensive testing.

More information on these toolchains is available [in the EasyBuild documentation](https://docs.easybuild.io/common-toolchains).

---

### Modules

*Module* is a massively overloaded term in (scientific) software and IT in general
(kernel modules, Python modules, and so on).
In the context of EasyBuild, the term 'module' usually refers to an **environment module (file)**.

[Environment modules](https://en.wikipedia.org/wiki/Environment_Modules_(software)) is a well established concept
on HPC systems: it is a way to specify changes that should be made to one or more
[environment variables](https://en.wikipedia.org/wiki/Environment_variable) in a
[shell](https://en.wikipedia.org/wiki/Shell_(computing))-agnostic way.
*Environment module files* are written in either [Tcl](https://en.wikipedia.org/wiki/Tcl) or
[Lua](https://en.wikipedia.org/wiki/Lua_(programming_language)) syntax,
and specify which *environment variables* should be updated, and how (append,
prepend, set, unset, redefine, etc.) upon *loading* the environment module.
*Unloading* an environment module will restore the shell environment to its previous state,
by reverting the changes that were made to the environment when that environment module was loaded.

Environment module files are processed via a **modules tool**, of which there
are several conceptually similar yet slightly different implementations.
The most commonly used ones are the Tcl-based [Environment Modules](https://sourceforge.net/projects/modules/)
implementation, and [Lmod](https://lmod.readthedocs.io), a more recent (and more popular) Lua-based implementation
(which also supports module files written in Tcl syntax).

**Environment module files are automatically generated for each software installation** by EasyBuild,
and *loading* a module results in changes being made to the environment of the current shell
session such that the corresponding software installation can be used.

---

### Bringing it all together

The EasyBuild **framework** leverages **easyblocks** to automatically build and install
(scientific) software, potentially including additional **extensions**, using a particular compiler **toolchain**,
as specified in **easyconfig files** which each define a set of **easyconfig parameters**.

EasyBuild ensures that the specified **(build) dependencies** are in place,
and automatically generates a set of **(environment) modules** that facilitate access to the installed software.

An **easystack file** can be used to specify a collection of software to install with EasyBuild.

---

[*next: Installing EasyBuild*](easybuild-installation.md) - [*(back to overview page)*](index.md)