Your first MPI application
The MPI Hello World
A good way to begin exploring MPI is through the classic Hello World program.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    /* Initialize the MPI execution environment. */
    MPI_Init(&argc, &argv);

    /* Get the total number of processes. */
    int num_ranks;
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

    /* Get the rank of the calling process. */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Get the name of the processor (node) this process runs on. */
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, "
           "rank %d out of %d ranks\n",
           processor_name, rank, num_ranks);

    /* Shut down the MPI environment. */
    MPI_Finalize();
    return 0;
}
The program illustrates the essential structure of every MPI application. The
program begins by including the MPI header file and then calling MPI_Init,
which initializes the MPI execution environment. This initialization step must
occur before any other MPI function is used: it sets up the infrastructure that
allows multiple processes to communicate and coordinate.
Once MPI is initialized, each process in the program can discover two key pieces
of information: the total number of processes that are participating in the
computation, and its own unique identifier, or rank, among them. These values
are retrieved using MPI_Comm_size and MPI_Comm_rank, respectively. Both
functions operate within a communicator, which is an abstraction representing a
group of processes that can communicate with one another. The communicator
MPI_COMM_WORLD is provided by default and encompasses all processes in the
program.
Each process then retrieves the name of the processor or compute node on which
it is running by calling MPI_Get_processor_name. This information is often
useful in parallel systems where processes may be distributed across many nodes
in a cluster. Finally, the program prints a message indicating which processor
and rank it represents and then calls MPI_Finalize to terminate the MPI
environment. After this point, no further MPI routines may be invoked.
MPI initialization
Every MPI program must begin by calling MPI_Init. This function sets up the
internal MPI environment, allocates necessary resources, and prepares the system
for communication among processes. It must be the first MPI routine called in a
program, and no other MPI calls may occur before it.
In C and C++, MPI_Init takes two arguments: pointers to the argc and argv
parameters of the main function. Passing them allows the MPI library to
inspect, and possibly remove, command-line arguments intended for the MPI
runtime rather than for the application; many implementations leave them
untouched.
The function returns an integer error code indicating success or failure. In
Fortran, the corresponding subroutine takes an optional final argument that
returns the error status.
Without calling MPI_Init, no MPI communication or coordination is possible, as
the runtime system has not yet been configured to support them.
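For instance, here is a minimal sketch that checks the integer return code of MPI_Init (the error message text is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    /* MPI_Init returns MPI_SUCCESS when the environment was set up correctly. */
    int ret = MPI_Init(&argc, &argv);
    if (ret != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed with error code %d\n", ret);
        return 1;
    }

    /* ... the rest of the MPI program goes here ... */

    MPI_Finalize();
    return 0;
}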
MPI finalization
At the end of every MPI program, the function MPI_Finalize must be called.
This function terminates the MPI environment, cleaning up any internal data
structures and closing communication channels. Once MPI_Finalize has been
executed, the program can no longer make any MPI calls, not even MPI_Init
again.
Before finalization, all communication involving the process should have been completed to ensure a clean exit. This step marks the proper conclusion of the parallel computation and allows the system to reclaim resources allocated for MPI.
Together, MPI_Init and MPI_Finalize are mandatory, defining the beginning
and end of the parallel section.
Querying the number of processes
| Argument | Meaning |
|---|---|
| comm | (in) A communicator |
| size | (out) Number of processes in the group of comm |
MPI programs often need to know how many processes are participating. The
routine MPI_Comm_size provides this information. It returns the number of
processes within a given communicator. When the communicator MPI_COMM_WORLD is
used, MPI_Comm_size returns the total number of processes that were started
when the program began.
This information allows programs to make decisions based on the available parallelism, for example dividing data among processes or assigning tasks dynamically. Because communicators are flexible abstractions, this routine can also be used in more advanced cases to query the size of a subgroup of processes.
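As an illustration, the following sketch uses the value returned by MPI_Comm_size to split a fixed number of elements as evenly as possible among the processes (the problem size N and the variable names are arbitrary):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int num_ranks, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split N elements as evenly as possible across all ranks. */
    const int N = 1000;             /* arbitrary problem size */
    int chunk = N / num_ranks;      /* base number of elements per rank */
    int remainder = N % num_ranks;  /* the first 'remainder' ranks get one extra */
    int local_n = chunk + (rank < remainder ? 1 : 0);

    printf("Rank %d of %d handles %d elements\n", rank, num_ranks, local_n);

    MPI_Finalize();
    return 0;
}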
Querying the process rank
| Argument | Meaning |
|---|---|
| comm | (in) A communicator |
| rank | (out) Rank of the calling process in group of comm |
While MPI_Comm_size reveals the total number of processes, MPI_Comm_rank
tells each process who it is within the group. Every process in a communicator
is assigned a unique integer rank, ranging from 0 to the number of processes minus one. When a
communicator such as MPI_COMM_WORLD is used, the rank identifies the process
among all others in the program.
Ranks are fundamental to MPI communication. They serve as unique addresses: when a process wants to send a message, it specifies the rank of the destination process; when it receives data, it can check which rank the message came from.
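For example, a common pattern is to branch on the rank so that only one process performs a particular task, such as printing a summary; a minimal sketch follows (message passing itself is introduced later):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int num_ranks, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Only the process with rank 0 executes this branch. */
        printf("Running with %d processes\n", num_ranks);
    }

    MPI_Finalize();
    return 0;
}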
Querying the Hardware Name
| Argument | Meaning |
|---|---|
| name | (out) A unique specifier for the actual (as opposed to virtual) node |
| resultlen | (out) Length (in characters) of result returned in name |
The function MPI_Get_processor_name allows a process to retrieve the name of
the processor or compute node on which it is running. Although the term
"processor" is used, in most modern systems this function returns the hostname
of the node. This can be helpful for debugging, performance testing, or simply
verifying how the workload is distributed across a cluster.
Compiling an MPI Application with OpenMPI
On most high-performance computing clusters, MPI implementations such as OpenMPI are provided as preinstalled software modules. To use OpenMPI, one typically loads the corresponding environment module using a command like module load OpenMPI.
Compilation is handled using compiler wrappers provided by the MPI
implementation. These wrappers (mpicc for C, mpicxx for C++, and mpifort
for Fortran) are not compilers themselves but scripts that invoke the underlying
compiler (such as GCC) with the appropriate include paths and linker options.
This ensures that MPI headers and libraries are correctly linked without
requiring the user to specify them manually. You can inspect what flags a
wrapper adds by typing mpicc -show.
$ mpicc -show
gcc -I/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/include \
-L/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/lib \
-L/opt/cecisw/arch/easybuild/2021b/software/hwloc/2.5.0-GCCcore-11.2.0/lib \
-L/opt/cecisw/arch/easybuild/2021b/software/libevent/2.1.12-GCCcore-11.2.0/lib \
-Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/lib \
-Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/hwloc/2.5.0-GCCcore-11.2.0/lib \
-Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/libevent/2.1.12-GCCcore-11.2.0/lib \
-Wl,--enable-new-dtags -lmpi
Running an MPI Application with mpirun
Once the program has been compiled, it can be launched using the mpirun
command. The syntax is straightforward:
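mpirun -np NPROCESSES ./executable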
This command starts NPROCESSES instances of the specified executable and
connects them using MPI. On clusters managed by SLURM or another resource
manager, mpirun typically integrates with the job scheduler to launch
processes across multiple nodes. If used on a login node, it is important to
explicitly specify the number of processes; otherwise, all available cores may
be used, potentially overloading the system.
The command mpiexec serves the same function and is defined by the MPI
standard, though mpirun remains more common in practice.
Running an MPI Application with srun
In systems using the SLURM job scheduler, MPI programs can also be launched
using SLURM’s built-in command srun. This command integrates directly with
SLURM’s resource allocation mechanism. For example:
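srun -n NPROCESSES ./executable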
This command starts the specified number of MPI processes as part of the current SLURM job allocation. A minimal SLURM batch script might look like this:
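#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=01:00
#SBATCH --output=mpi_job.out

# Illustrative script: adjust the resource values, output file, and executable name.
module load OpenMPI
srun ./executable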
The script can be submitted to the queue with sbatch. The same script could also
use mpirun instead of srun if desired.
Running the MPI Hello World Example
After writing the C Hello World program, the next step is to compile it. On the login node, load OpenMPI and use the wrapper compiler:
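$ module load OpenMPI
$ mpicc -o mpi_hello mpi_hello.c

(The source file is assumed here to be named mpi_hello.c; the resulting executable mpi_hello is the one referenced in the batch script below.)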
Then, create a simple SLURM batch file named mpi_hello.job:
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=01:00
#SBATCH --output=mpi_hello.out
module load OpenMPI
srun ./mpi_hello
Finally, submit the job with:
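$ sbatch mpi_hello.job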
When the job finishes, view the output with:
$ cat mpi_hello.out
Hello world from processor nic5-w004, rank 1 out of 4 ranks
Hello world from processor nic5-w004, rank 3 out of 4 ranks
Hello world from processor nic5-w004, rank 2 out of 4 ranks
Hello world from processor nic5-w004, rank 0 out of 4 ranks
You will see four Hello World lines, one from each process, possibly in an unexpected order. This non-deterministic order is a hallmark of parallel execution: each process runs independently, and without synchronization, there is no guarantee of which process prints first.
Experimenting with SLURM options
SLURM provides flexible options to control how MPI processes are distributed
across nodes. The --ntasks option sets the total number of MPI processes,
while --nodes requests a minimum number of nodes to allocate. The
--ntasks-per-node option controls how many processes run on each node.
Adjusting these options allows fine control over process placement, which can
influence performance.
For example, if we run the following job script
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --nodes=4
#SBATCH --time=01:00
#SBATCH --output=mpi_hello_world.out
module load OpenMPI
srun ./mpi_hello_world
it will produce output in which the 4 processes are distributed across 4 nodes:
$ cat mpi_hello_world.out
Hello world from processor nic5-w036, rank 0 out of 4 ranks
Hello world from processor nic5-w041, rank 2 out of 4 ranks
Hello world from processor nic5-w044, rank 3 out of 4 ranks
Hello world from processor nic5-w039, rank 1 out of 4 ranks
On the other hand, if we run this job script
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00
#SBATCH --output=mpi_hello_world.out
module load OpenMPI
srun ./mpi_hello_world
it will produce output with 8 processes in total, two on each of the 4 nodes:
$ cat mpi_hello_world.out | sort
Hello world from processor nic5-w039, rank 0 out of 8 ranks
Hello world from processor nic5-w039, rank 1 out of 8 ranks
Hello world from processor nic5-w041, rank 2 out of 8 ranks
Hello world from processor nic5-w041, rank 3 out of 8 ranks
Hello world from processor nic5-w044, rank 4 out of 8 ranks
Hello world from processor nic5-w044, rank 5 out of 8 ranks
Hello world from processor nic5-w045, rank 6 out of 8 ranks
Hello world from processor nic5-w045, rank 7 out of 8 ranks
Early exit of an MPI application
| Argument | Meaning |
|---|---|
| comm | (in) Communicator of tasks to abort |
| errorcode | (in) Error code to return to invoking environment |
MPI provides the function MPI_Abort to forcefully terminate an application.
Theoretically, this call aborts all processes associated with a given
communicator, allowing selective termination. In practice, however, most
implementations treat MPI_Abort as a fatal error that ends the entire MPI job.
Once invoked, the MPI environment becomes invalid, and no further MPI routines
should be called.
MPI_Abort is primarily used in error handling, where it ensures that all
processes exit immediately if a serious problem is detected.
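As an illustration, here is a sketch in which rank 0 aborts the whole job when a required input file cannot be opened (the file name and error code are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Hypothetical check: terminate every process if the input is missing. */
        FILE* f = fopen("input.dat", "r");
        if (f == NULL) {
            fprintf(stderr, "rank 0: cannot open input.dat, aborting\n");
            MPI_Abort(MPI_COMM_WORLD, 1);  /* ends the entire MPI job */
        }
        fclose(f);
    }

    MPI_Finalize();
    return 0;
}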
Measuring wallclock time
Finally, MPI offers a convenient way to measure elapsed time through the
function MPI_Wtime. This function returns a double-precision floating-point
number representing the wallclock time in seconds on the calling process. To
time a section of code, one can record the time before and after its execution:
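#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    double t_start = MPI_Wtime();

    /* Placeholder for the section of code being timed. */
    double sum = 0.0;
    for (long i = 1; i <= 100000000; i++) {
        sum += 1.0 / (double)i;
    }

    double t_end = MPI_Wtime();
    printf("Elapsed time: %f seconds (sum = %f)\n", t_end - t_start, sum);

    MPI_Finalize();
    return 0;
}

The loop above is only placeholder work; in practice the two calls to MPI_Wtime bracket whatever section of the program is being measured.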