Software package management: modules
On most HPC clusters, software packages are not directly accessible. You need to load a module to gain access to a given software package.
Motivation for modules
To explain why modules are needed on HPC clusters, we first need to explain how the shell finds the executables for the commands we type in the terminal. The shell cannot search the entire filesystem for an executable whose name matches the command, as this would take far too much time. Instead, only a handful of directories are searched. These search paths are stored in the PATH environment variable. By default, this variable only contains the directories where the executables of most basic commands are located. We can display the content of this variable with the echo command.
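A minimal sketch, assuming a Bash shell on Linux: the first command prints PATH as a single colon-separated string, and the second uses tr to put each directory on its own line, which is easier to read when the list is long.

```shell
# Print the search path as a single colon-separated string
echo "$PATH"

# Print one directory per line by replacing every ':' with a newline
echo "$PATH" | tr ':' '\n'
```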
The search path is a list of directories separated by colons (:). Adding a directory at the front or at the end of this list makes the shell also search for executables in that directory. The directories are searched in the order in which they appear in the list, and the shell stops searching as soon as it finds a matching executable.
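The following Bash sketch illustrates both operations. The directories $HOME/mytools/bin and $HOME/othertools/bin are hypothetical placeholders, not directories that exist on NIC5. The Bash builtin type -a then lists every match for a command in the order the directories are searched; the first entry is the one the shell runs.

```shell
# Prepend: this (hypothetical) directory is searched before all the others
export PATH="$HOME/mytools/bin:$PATH"

# Append: this (hypothetical) directory is searched after all the others
export PATH="$PATH:$HOME/othertools/bin"

# List every match for "ls" in search order (Bash builtin);
# the first entry is the one the shell actually runs
type -a ls
```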
Now, consider an HPC cluster shared by a large number of users, each with their own specific needs. One user may need an old version of a software package while another needs a newer version. If the PATH variable contains the directories of both versions, with the directory of the newer version listed first, then, since the executables of the two versions have the same name, the user who needs the old version will end up using the new one. This might not be what this user wants, as they may rely on a feature that was removed from the newer version. Also, for reproducibility reasons, researchers tend to stick to one particular version for an entire research project.
A solution for the user who needs the old version might be to invoke the executable using its full path. However, this solution is not really user-friendly, as it requires typing a long command and remembering where the executable is located.
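The scenario above can be reproduced in a few lines of Bash. Everything here is hypothetical: the tool mytool and its two version directories are created in a temporary directory, and each "executable" is a tiny shell script that only prints which version it is. The sketch shows that the first match on PATH wins and that the full path is needed to reach the other version.

```shell
# Hypothetical setup: two versions of the same tool, "mytool",
# each in its own bin directory inside a temporary directory
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT   # remove the temporary directory when the shell exits
mkdir -p "$root/new_version/bin" "$root/old_version/bin"
printf '#!/bin/sh\necho "mytool new version"\n' > "$root/new_version/bin/mytool"
printf '#!/bin/sh\necho "mytool old version"\n' > "$root/old_version/bin/mytool"
chmod +x "$root/new_version/bin/mytool" "$root/old_version/bin/mytool"

# Both directories are on PATH, the new version first
export PATH="$root/new_version/bin:$root/old_version/bin:$PATH"

# The shell stops at the first match, so this runs the new version
mytool

# Reaching the old version requires spelling out its full path
"$root/old_version/bin/mytool"
```

Running it should print the new version's message first, then the old one's.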
Another problem is that, nowadays, most executable binaries are not statically linked, i.e., they load libraries at runtime. This means we have to make sure that the executable we want to use has access to the libraries on which it depends. As for executables, the library search paths are defined by an environment variable: LD_LIBRARY_PATH. Consequently, if we want to use a particular version of a software package, we have to set:
- in PATH, the path to the directory where our executable is located;
- in LD_LIBRARY_PATH, all the paths to the directories where the libraries our executable depends on are located.
In addition, we need to make sure that the dependencies of these dependencies are also available. For an executable with a complex dependency structure, this can quickly become unmanageable.
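The sketch below, assuming a Linux system with ldd available, first lists the shared libraries that the ls executable needs and where the dynamic loader finds them. It then shows what setting up a package by hand looks like; the /opt/software/... paths, mytool and libfoo are hypothetical.

```shell
# Show the shared libraries "ls" needs and where they are resolved
# (type -P is a Bash builtin that returns the path found through PATH)
ldd "$(type -P ls)"

# Manually making a (hypothetical) package usable: the directory of the
# executable in PATH, and one directory per library dependency in
# LD_LIBRARY_PATH, including the dependencies of those dependencies
export PATH="/opt/software/mytool/1.0/bin:$PATH"
export LD_LIBRARY_PATH="/opt/software/mytool/1.0/lib:/opt/software/libfoo/2.3/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} expansion only appends a colon and the previous value when LD_LIBRARY_PATH was already set, which avoids leaving a trailing empty entry in the list.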
The problems presented above highlight the reason why environment modules were created. Modules allow for the dynamic modification of a user environment. With modules, a user can get access to software and switch between versions with ease, letting the module system take care of the search paths and of the environment in general.
Finding modules
All available modules can be listed with the module av command, which, on most HPC clusters, produces quite a long list. You can navigate this list using the Up and Down arrow keys and exit the pager by pressing Q.
You can narrow the search for available modules by adding a keyword to the command. For example, to find modules related to Python:
$ module av Python
----------------------------- Releases (2021b) ------------------------------
IPython/7.26.0-GCCcore-11.2.0
KAT/2.4.2-foss-2021b-Python-3.9.6
Meep/1.22.0-foss-2021b-Python-3.9.6
Python/3.9.6-GCCcore-11.2.0-bare
Python/3.9.6-GCCcore-11.2.0 (D)
flatbuffers-python/2.0-GCCcore-11.2.0
pkgconfig/1.5.5-GCCcore-11.2.0-python
protobuf-python/3.17.3-GCCcore-11.2.0
The result lists all modules whose name contains Python (regardless of case), including two versions of Python itself: 3.9.6-GCCcore-11.2.0-bare and 3.9.6-GCCcore-11.2.0. The first is a minimal installation with no extra packages, while the second is a Python with quite a few packages installed.
When multiple modules have the same name, the default module is marked with a D. In the case of the Python module, 3.9.6-GCCcore-11.2.0 is the default.
Module naming scheme
Modules on NIC5 use the following naming scheme:
PACKAGE_NAME/PACKAGE_VERSION-TOOLCHAIN_NAME-TOOLCHAIN_VERSION
where
- PACKAGE_NAME: the name of the software package
- PACKAGE_VERSION: the version of the software package
- TOOLCHAIN_NAME: the name of the toolchain (compiler) used to compile the package
- TOOLCHAIN_VERSION: the version of the toolchain used to compile the package
Sometimes, as in the Python example, several variants of the same package are installed; in that case, a -SUFFIX is added at the end of the name (-bare in the Python example).
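To make the scheme concrete, this Bash sketch splits a module name into its four fields using parameter expansion. It is a naive illustration, not a tool provided on NIC5, and it assumes that PACKAGE_VERSION contains no dash.

```shell
# Split a module name following the
# PACKAGE_NAME/PACKAGE_VERSION-TOOLCHAIN_NAME-TOOLCHAIN_VERSION scheme
mod="Python/3.9.6-GCCcore-11.2.0"

name=${mod%%/*}                     # before the first '/': Python
rest=${mod#*/}                      # after the first '/': 3.9.6-GCCcore-11.2.0
version=${rest%%-*}                 # before the first '-': 3.9.6
toolchain=${rest#*-}                # after the first '-': GCCcore-11.2.0
toolchain_name=${toolchain%%-*}     # GCCcore
toolchain_version=${toolchain#*-}   # 11.2.0

echo "$name | $version | $toolchain_name | $toolchain_version"
```

For a name with a suffix, such as Python/3.9.6-GCCcore-11.2.0-bare, the same split leaves the suffix attached to the last field, giving 11.2.0-bare.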
This naming scheme originates from EasyBuild, the tool we use to install most of the software on NIC5, but it highlights an important fact: most software on an HPC system is installed from source. The main reason is that, to get maximum performance, software needs to be compiled with optimizations specific to the CPUs of NIC5.
Loading modules
Continuing with our Python example, we will now discuss how to load a module.
Right after logging in to NIC5, if we check the Python version and where it is installed, we can see that we have version 3.6.9 and that this Python is installed in /usr/bin.
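This check uses the same two commands that are run again after loading the module: python --version prints the version, and which python prints the location of the executable found through PATH. A minimal sketch:

```shell
# Which Python does the shell find before any module is loaded?
python --version   # version of the system Python
which python       # full path of the first "python" found along PATH
```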
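To gain access to another version of Python, we need to load a module with the module load command:
module load PACKAGE_NAME
where PACKAGE_NAME is the name of the software package we want to load.
Knowing that, we can load the Python module, using Python as the package name.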
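The load step itself is a single command. Since no version is given, the module system picks the version marked (D) in the module av output, i.e., Python/3.9.6-GCCcore-11.2.0.

```shell
# Load the default version of the Python module
module load Python
```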
Then, if we run the same commands as before to determine the Python version and where it is installed, we get
$ python --version
Python 3.9.6
$ which python
/opt/cecisw/arch/easybuild/2021b/software/Python/3.9.6-GCCcore-11.2.0/bin/python
We can see that we now have version 3.9.6 and that it is installed in a completely different location.
In our example, we did not specify the version of the module we wanted to load. As a result, the default module has been loaded (Python/3.9.6-GCCcore-11.2.0). If we want the "bare" variant of this module, which is not the default, we need to explicitly provide the version when loading the module.
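For example, to load the "bare" variant listed by module av, we give the full module name. The module listings in this section come from Lmod, which allows only one version of a given module name to be loaded at a time, so this should replace the default Python module if it is already loaded.

```shell
# Load a specific, non-default variant by giving its full name
module load Python/3.9.6-GCCcore-11.2.0-bare
```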
Listing loaded modules
Loaded modules are listed with the module list command. For example, continuing the example of the previous section, where we loaded the Python module:
$ module list
Currently Loaded Modules:
1) tis/2018.01 (S) 9) libreadline/8.1-GCCcore-11.2.0
2) releases/2021b (S) 10) Tcl/8.6.11-GCCcore-11.2.0
3) StdEnv (H) 11) SQLite/3.36-GCCcore-11.2.0
4) GCCcore/11.2.0 12) XZ/5.2.5-GCCcore-11.2.0
5) zlib/1.2.11-GCCcore-11.2.0 13) GMP/6.2.1-GCCcore-11.2.0
6) binutils/2.37-GCCcore-11.2.0 14) libffi/3.4.2-GCCcore-11.2.0
7) bzip2/1.0.8-GCCcore-11.2.0 15) OpenSSL/1.1
8) ncurses/6.2-GCCcore-11.2.0 16) Python/3.9.6-GCCcore-11.2.0
The first three modules in the list are loaded by default when you log in. All the other modules result from loading the Python module: as we can see, loading Python brought in more modules than just the Python module itself. These additional modules are dependencies, i.e., packages needed by Python at run time.
Unloading modules
To remove the Python module from our environment, we use the module unload command.
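As with module load, the module name is enough; the version can be omitted.

```shell
# Remove Python from the environment; its dependencies stay loaded
module unload Python
```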
Then, if we check the effect of the command by listing the currently loaded modules
$ module list
Currently Loaded Modules:
1) tis/2018.01 (S) 9) libreadline/8.1-GCCcore-11.2.0
2) releases/2021b (S) 10) Tcl/8.6.11-GCCcore-11.2.0
3) StdEnv (H) 11) SQLite/3.36-GCCcore-11.2.0
4) GCCcore/11.2.0 12) XZ/5.2.5-GCCcore-11.2.0
5) zlib/1.2.11-GCCcore-11.2.0 13) GMP/6.2.1-GCCcore-11.2.0
6) binutils/2.37-GCCcore-11.2.0 14) libffi/3.4.2-GCCcore-11.2.0
7) bzip2/1.0.8-GCCcore-11.2.0 15) OpenSSL/1.1
8) ncurses/6.2-GCCcore-11.2.0
we see that, indeed, the Python module is no longer loaded in the environment, but its dependencies are still loaded. This is by design. While the tool we use to install software on NIC5 can generate modules that unload their dependencies, this can have undesired side effects: if two modules share a dependency, unloading the first module would also unload the shared dependency and possibly break the second module.
To remove all loaded modules, we can use the module purge
command.
$ module purge
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) releases/2021b 2) tis/2018.01
The module system informed us that two modules were not unloaded. These two modules are "sticky" modules, i.e., modules that should always be present in the environment. You can force the unloading of these modules using the module --force purge command, but this is not recommended.
Summary
Command | Description |
---|---|
module av | List all available modules |
module av PACKAGE_NAME | List all available modules whose name matches PACKAGE_NAME |
module load PACKAGE_NAME | Load the module named PACKAGE_NAME |
module unload PACKAGE_NAME | Unload the module named PACKAGE_NAME |
module list | List loaded modules |
module purge | Unload all loaded modules (except sticky modules) |