
Your first MPI application

The MPI Hello World

A good way to begin exploring MPI is through the classic Hello World program.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int num_ranks;
  MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  printf("Hello world from processor %s,"
         "rank %d out of %d ranks\n",
         processor_name, rank, num_ranks);

  MPI_Finalize();

  return 0;
}

The program illustrates the essential structure of every MPI application. It begins by including the MPI header file and then calling MPI_Init, which initializes the MPI execution environment. This initialization step must occur before any other MPI function is used: it sets up the infrastructure that allows multiple processes to communicate and coordinate.

Once MPI is initialized, each process in the program can discover two key pieces of information: the total number of processes that are participating in the computation, and its own unique identifier, or rank, among them. These values are retrieved using MPI_Comm_size and MPI_Comm_rank, respectively. Both functions operate within a communicator, which is an abstraction representing a group of processes that can communicate with one another. The communicator MPI_COMM_WORLD is provided by default and encompasses all processes in the program.

Each process then retrieves the name of the processor or compute node on which it is running by calling MPI_Get_processor_name. This information is often useful in parallel systems where processes may be distributed across many nodes in a cluster. Finally, the program prints a message indicating which processor and rank it represents and then calls MPI_Finalize to terminate the MPI environment. After this point, no further MPI routines may be invoked.

MPI initialization

int MPI_Init(int *argc, char ***argv)

Every MPI program must begin by calling MPI_Init. This function sets up the internal MPI environment, allocates necessary resources, and prepares the system for communication among processes. It must be the first MPI routine called in a program, and no other MPI calls may occur before it.

In C and C++, MPI_Init takes two arguments: pointers to the argc and argv parameters of the main function. Most MPI implementations make little use of these arguments, but passing them allows the library to examine the command line and remove any MPI-specific options. The function returns an integer error code indicating success or failure. In Fortran, the corresponding subroutine takes an optional final argument that returns the error status.

Without calling MPI_Init, no MPI communication or coordination is possible, as the runtime system has not yet been configured to support them.
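
Since MPI_Init reports success or failure through its return value, the call can be checked against MPI_SUCCESS. The following minimal sketch (not part of the original example; note that the default error handler usually aborts the job on failure anyway) illustrates the pattern:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  /* MPI_Init returns MPI_SUCCESS when initialization succeeds */
  int err = MPI_Init(&argc, &argv);
  if (err != MPI_SUCCESS) {
    fprintf(stderr, "MPI initialization failed with error code %d\n", err);
    return 1;
  }

  /* ... rest of the program ... */

  MPI_Finalize();
  return 0;
}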

MPI finalization

int MPI_Finalize(void)

At the end of every MPI program, the function MPI_Finalize must be called. This function terminates the MPI environment, cleaning up any internal data structures and closing communication channels. Once MPI_Finalize has been executed, the program can no longer make any MPI calls, not even MPI_Init again.

Before finalization, all communication involving the process should have been completed to ensure a clean exit. This step marks the proper conclusion of the parallel computation and allows the system to reclaim resources allocated for MPI.

Together, MPI_Init and MPI_Finalize are mandatory in every MPI program, marking the beginning and end of the parallel section.

Querying the number of processes

int MPI_Comm_size(MPI_Comm comm, int *size)
Argument Meaning
comm (in) A communicator
size (out) Number of processes in the group of comm

MPI programs often need to know how many processes are participating. The routine MPI_Comm_size provides this information. It returns the number of processes within a given communicator. When the communicator MPI_COMM_WORLD is used, MPI_Comm_size returns the total number of processes that were started when the program began.

This information allows programs to make decisions based on the available parallelism, for example when dividing data among processes or assigning tasks dynamically. Because communicators are flexible abstractions, this routine can also be used in more advanced cases to query the size of subgroups of processes.
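
As a brief sketch of the first use case (assuming num_ranks and rank have been obtained as in the Hello World example, and with a hypothetical element count N), a block distribution of the data could look like this:

/* Sketch: block-partition N elements among num_ranks processes.
   num_ranks and rank are assumed to have been queried as in the Hello World example. */
long N = 1000000;                          /* hypothetical total number of elements */
long base = N / num_ranks;                 /* minimum number of elements per process */
long rest = N % num_ranks;                 /* leftover elements */
long my_count = base + (rank < rest ? 1 : 0);              /* first 'rest' ranks get one extra */
long my_start = rank * base + (rank < rest ? rank : rest); /* first element owned by this rank */

printf("Rank %d handles elements [%ld, %ld)\n", rank, my_start, my_start + my_count);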

Querying the process rank

int MPI_Comm_rank(MPI_Comm comm, int *rank)
Argument Meaning
comm (in) A communicator
rank (out) Rank of the calling process in group of comm

While MPI_Comm_size reveals the total number of processes, MPI_Comm_rank tells each process who it is within the group. Every process in a communicator is assigned a unique integer rank, ranging from 0 to size - 1, where size is the number of processes in the communicator. When a communicator such as MPI_COMM_WORLD is used, the rank identifies the process among all others in the program.

Ranks are fundamental to MPI communication. They serve as unique addresses: when a process wants to send a message, it specifies the rank of the destination process; when it receives data, it can check which rank the message came from.
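
As a minimal sketch of this addressing (the point-to-point routines MPI_Send and MPI_Recv are standard MPI but are not otherwise covered in this first example; at least two processes are assumed), rank 0 sends an integer to rank 1, and rank 1 reads the sender's rank from the receive status:

/* Sketch: ranks serve as message addresses (assumes at least 2 processes). */
int value = 42;
if (rank == 0) {
  /* send one int to the process with rank 1, using message tag 0 */
  MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
  MPI_Status status;
  MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
  printf("Rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
}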

Querying the hardware name

int MPI_Get_processor_name(char *name, int *resultlen)
Argument Meaning
name (out) A unique specifier for the actual (as opposed to virtual) node
resultlen (out) Length (in characters) of result returned in name

The function MPI_Get_processor_name allows a process to retrieve the name of the processor or compute node on which it is running. Although the term "processor" is used, in most modern systems this function returns the hostname of the node. This can be helpful for debugging, performance testing, or simply verifying how the workload is distributed across a cluster.

Compiling an MPI Application with OpenMPI

On most high-performance computing clusters, MPI implementations such as OpenMPI are provided as preinstalled software modules. To use OpenMPI, one typically loads the corresponding environment module using a command like module load OpenMPI.

Compilation is handled using compiler wrappers provided by the MPI implementation. These wrappers, mpicc for C, mpicxx for C++, and mpifort for Fortran, are not compilers themselves but scripts that invoke the underlying compiler (such as GCC) with the appropriate include paths and linker options. This ensures that MPI headers and libraries are correctly linked without requiring the user to specify them manually. You can inspect what flags a wrapper adds by typing mpicc -show.

 $ mpicc -show
gcc -I/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/include \
    -L/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/lib  \
    -L/opt/cecisw/arch/easybuild/2021b/software/hwloc/2.5.0-GCCcore-11.2.0/lib  \
    -L/opt/cecisw/arch/easybuild/2021b/software/libevent/2.1.12-GCCcore-11.2.0/lib  \
    -Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/OpenMPI/4.1.2-GCC-11.2.0/lib  \
    -Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/hwloc/2.5.0-GCCcore-11.2.0/lib  \
    -Wl,-rpath -Wl,/opt/cecisw/arch/easybuild/2021b/software/libevent/2.1.12-GCCcore-11.2.0/lib  \
    -Wl,--enable-new-dtags -lmpi

Running an MPI Application with mpirun

Once the program has been compiled, it can be launched using the mpirun command. The syntax is straightforward:

mpirun -np NPROCESSES EXECUTABLE

This command starts NPROCESSES instances of the specified executable and connects them using MPI. On clusters managed by SLURM or another resource manager, mpirun typically integrates with the job scheduler to launch processes across multiple nodes. If used on a login node, it is important to explicitly specify the number of processes; otherwise, all available cores may be used, potentially overloading the system.

The command mpiexec serves the same function and is defined by the MPI standard, though mpirun remains more common in practice.

Running an MPI Application with srun

In systems using the SLURM job scheduler, MPI programs can also be launched using SLURM’s built-in command srun. This command integrates directly with SLURM’s resource allocation mechanism. For example:

srun --ntasks=NPROCESSES EXECUTABLE

This command starts the specified number of MPI processes as part of the current SLURM job allocation. A minimal SLURM batch script might look like this:

#!/bin/bash 
#SBATCH --ntasks=4 
#SBATCH --time=01:00

module load OpenMPI
srun EXECUTABLE

The script can be submitted to the queue with sbatch. The same script could also use mpirun instead of srun if desired.

Running the MPI Hello World Example

After writing the C Hello World program, the next step is to compile it. On the login node, load OpenMPI and use the wrapper compiler:

module load OpenMPI 
mpicc -o mpi_hello mpi_hello.c

Then, create a simple SLURM batch file named mpi_hello.job:

#!/bin/bash 
#SBATCH --ntasks=4
#SBATCH --time=01:00 
#SBATCH --output=mpi_hello.out

module load OpenMPI
srun ./mpi_hello

Finally, submit the job with:

sbatch mpi_hello.job

When the job finishes, view the output with:

 $ cat mpi_hello.out
Hello world from processor nic5-w004, rank 1 out of 4 ranks
Hello world from processor nic5-w004, rank 3 out of 4 ranks
Hello world from processor nic5-w004, rank 2 out of 4 ranks
Hello world from processor nic5-w004, rank 0 out of 4 ranks

You will see four Hello World lines, one from each process, possibly in an unexpected order. This non-deterministic order is a hallmark of parallel execution: each process runs independently, and without synchronization, there is no guarantee of which process prints first.

Experimenting with SLURM options

SLURM provides flexible options to control how MPI processes are distributed across nodes. The --ntasks option sets the total number of MPI processes, while --nodes requests a minimum number of nodes to allocate. The --ntasks-per-node option controls how many processes run on each node. Adjusting these options allows fine control over process placement, which can influence performance.

For example, if we run the following job script

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --nodes=4 
#SBATCH --time=01:00
#SBATCH --output=mpi_hello.out

module load OpenMPI
srun ./mpi_hello

it will produce output in which the 4 processes are distributed across 4 different nodes:

 $ cat mpi_hello.out
Hello world from processor nic5-w036, rank 0 out of 4 ranks
Hello world from processor nic5-w041, rank 2 out of 4 ranks
Hello world from processor nic5-w044, rank 3 out of 4 ranks
Hello world from processor nic5-w039, rank 1 out of 4 ranks

On the other hand, if we run this job script

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00
#SBATCH --output=mpi_hello.out

module load OpenMPI
srun ./mpi_hello

it will produce output with 8 processes in total, two on each of the 4 allocated nodes:

 $ cat mpi_hello.out | sort
Hello world from processor nic5-w039, rank 0 out of 8 ranks
Hello world from processor nic5-w039, rank 1 out of 8 ranks
Hello world from processor nic5-w041, rank 2 out of 8 ranks
Hello world from processor nic5-w041, rank 3 out of 8 ranks
Hello world from processor nic5-w044, rank 4 out of 8 ranks
Hello world from processor nic5-w044, rank 5 out of 8 ranks
Hello world from processor nic5-w045, rank 6 out of 8 ranks
Hello world from processor nic5-w045, rank 7 out of 8 ranks

Early exit of an MPI application

int MPI_Abort(MPI_Comm comm, int errorcode)
Argument Meaning
comm (in) Communicator of tasks to abort
errorcode (in) Error code to return to invoking environment

MPI provides the function MPI_Abort to forcefully terminate an application. Theoretically, this call aborts all processes associated with a given communicator, allowing selective termination. In practice, however, most implementations treat MPI_Abort as a fatal error that ends the entire MPI job. Once invoked, the MPI environment becomes invalid, and no further MPI routines should be called.

MPI_Abort is primarily used in error handling, where it ensures that all processes exit immediately if a serious problem is detected.
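
A typical error-handling pattern might look like the following sketch (the input file name is purely hypothetical): rank 0 checks a precondition and, if it fails, terminates the whole job.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    FILE *f = fopen("input.dat", "r");  /* hypothetical input file */
    if (f == NULL) {
      fprintf(stderr, "Rank 0: cannot open input.dat, aborting the job\n");
      MPI_Abort(MPI_COMM_WORLD, 1);     /* terminates all processes with exit code 1 */
    }
    fclose(f);
  }

  MPI_Finalize();
  return 0;
}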

Measuring wallclock time

double MPI_Wtime(void)

Finally, MPI offers a convenient way to measure elapsed time through the function MPI_Wtime. This function returns a double-precision floating-point number representing the wallclock time in seconds on the calling process. To time a section of code, one can record the time before and after its execution:

double start = MPI_Wtime(); 
/* computation */ 
double end = MPI_Wtime();

printf("Elapsed time: %g seconds\n", end - start);