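This chapter describes the perfcatch utility, which profiles the performance of an MPI program. It covers how to use perfcatch, an example MPI_PROFILING_STATS results file, the MPI performance profiling environment variables, and the supported profiled MPI functions.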
For information on additional profiling tools, see “Using Profiling Tools with MPI Applications” in Chapter 5.
The perfcatch utility runs an MPI program with a wrapper profiling library that prints MPI call profiling information to a summary file when the MPI program completes. By default, this MPI profiling result file is called MPI_PROFILING_STATS (see “MPI_PROFILING_STATS Results File Example”). It is created in the current working directory of the MPI process with rank 0.
The syntax of the perfcatch utility is as follows:
perfcatch [-v | -vofed | -i] cmd args
The perfcatch utility accepts the following options:
| Option | Description |
| No option | Supports MPT |
| -v | Supports Voltaire MPI |
| -vofed | Supports Voltaire OFED MPI |
| -i | Supports Intel MPI |
To use perfcatch with an SGI Message Passing Toolkit MPI program, insert the perfcatch command in front of the executable name. Here are some examples:
mpirun -np 64 perfcatch a.out arg1
mpirun host1 32, host2 64 perfcatch a.out arg1
To use perfcatch with Intel MPI, add the -i option. An example is as follows:
mpiexec -np 64 perfcatch -i a.out arg1
For more information, see the perfcatch(1) man page.
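The placement of perfcatch on the command line can be sketched in Python. This is only an illustration: the helper `perfcatch_command` and the flavor labels (`mpt`, `voltaire`, `voltaire-ofed`, `intel`) are hypothetical names invented for this sketch, and only the option strings themselves come from the options table above.

```python
# Hypothetical mapping from an MPI implementation label (names chosen for this
# sketch) to the perfcatch option listed in the options table.
PERFCATCH_OPTIONS = {
    "mpt": None,                # no option: SGI MPT
    "voltaire": "-v",           # Voltaire MPI
    "voltaire-ofed": "-vofed",  # Voltaire OFED MPI
    "intel": "-i",              # Intel MPI
}


def perfcatch_command(launcher, program, flavor="mpt"):
    """Return an argv list: launcher, then perfcatch [option], then program."""
    if flavor not in PERFCATCH_OPTIONS:
        raise ValueError(f"unknown MPI flavor: {flavor!r}")
    option = PERFCATCH_OPTIONS[flavor]
    wrapper = ["perfcatch"] + ([option] if option else [])
    # perfcatch goes directly in front of the executable name.
    return list(launcher) + wrapper + list(program)


print(" ".join(perfcatch_command(["mpirun", "-np", "64"], ["a.out", "arg1"])))
print(" ".join(perfcatch_command(["mpiexec", "-np", "64"], ["a.out", "arg1"], "intel")))
```

The two printed lines reproduce the MPT and Intel MPI examples shown earlier. The resulting list could be passed to a process launcher on a system where the MPI implementation and perfcatch are installed.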
The MPI profiling result file has a summary statistics section followed by a rank-by-rank profiling information section. The summary statistics section reports some overall statistics, including the percent time each rank spent in MPI functions, and the MPI process that spent the least and the most time in MPI functions. Similar reports are made about system time usage.
The rank-by-rank profiling information section lists every profiled MPI function called by a particular MPI process. The number of calls and the total time consumed by these calls are reported. Some functions report additional information, such as average data counts and communication peer lists.
An example MPI_PROFILING_STATS results file is as follows:
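For post-processing, the per-rank call counts and times can be pulled out of the results file with a short script. The following is a minimal sketch, assuming the line layout seen in the example file in this chapter (`Activity on process rank N` followed by lines such as `comm_rank calls: 1 time: 6.50005e-07 s ...`). The function name `parse_rank_activity` is hypothetical, and lines carrying additional information are ignored because their exact format is not documented here.

```python
import re

# Patterns are assumptions based on the example results file in this chapter.
RANK_RE = re.compile(r"^\s*Activity on process rank\s+(\d+)")
CALL_RE = re.compile(r"^\s*(\w+)\s+calls:\s*(\d+)\s+time:\s*([0-9.eE+-]+)\s*s\b")


def parse_rank_activity(text):
    """Return {rank: {function_name: (call_count, total_seconds)}}."""
    results = {}
    current = None
    for line in text.splitlines():
        match = RANK_RE.match(line)
        if match:
            current = int(match.group(1))
            results[current] = {}
            continue
        match = CALL_RE.match(line)
        if match and current is not None:
            name, calls, seconds = match.groups()
            results[current][name] = (int(calls), float(seconds))
    return results


sample = """\
Activity on process rank 0
Single-copy checking was not enabled.
comm_rank calls: 1 time: 6.50005e-07 s 6.50005e-07 s/call
Activity on process rank 1
Single-copy checking was not enabled.
comm_rank calls: 1 time: 5.00004e-07 s 5.00004e-07 s/call
"""

for rank, calls in parse_rank_activity(sample).items():
    print(rank, calls)
```

A dictionary keyed by rank makes it easy to compare the same function across processes, for example to spot a rank that spends far longer in one call than its peers.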
============================================================
PERFCATCHER version 22
(C) Copyright SGI. This library may only be used
on SGI hardware platforms. See LICENSE file for
details.
============================================================
MPI program profiling information
Job profile recorded Wed Jan 17 13:05:24 2007
Program command line: /home/estes01/michel/sastest/mpi_hello_linux
Total MPI processes 2
Total MPI job time, avg per rank 0.0054768 sec
Profiled job time, avg per rank 0.0054768 sec
Percent job time profiled, avg per rank 100%
Total user time, avg per rank 0.001 sec
Percent user time, avg per rank 18.2588%
Total system time, avg per rank 0.0045 sec
Percent system time, avg per rank 82.1648%
Time in all profiled MPI routines, avg per rank 5.75004e-07 sec
Percent time in profiled MPI routines, avg per rank 0.0104989%
Rank-by-Rank Summary Statistics
-------------------------------
Rank-by-Rank: Percent in Profiled MPI routines
Rank:Percent
0:0.0112245% 1:0.00968502%
Least: Rank 1 0.00968502%
Most: Rank 0 0.0112245%
Load Imbalance: 0.000771%
Rank-by-Rank: User Time
Rank:Percent
0:17.2683% 1:19.3699%
Least: Rank 0 17.2683%
Most: Rank 1 19.3699%
Rank-by-Rank: System Time
Rank:Percent
0:86.3416% 1:77.4796%
Least: Rank 1 77.4796%
Most: Rank 0 86.3416%
Notes
-----
Wtime resolution is 5e-08 sec
Rank-by-Rank MPI Profiling Results
----------------------------------
Activity on process rank 0
Single-copy checking was not enabled.
comm_rank calls: 1 time: 6.50005e-07 s 6.50005e-07 s/call
Activity on process rank 1
Single-copy checking was not enabled.
comm_rank calls: 1 time: 5.00004e-07 s 5.00004e-07 s/call
------------------------------------------------
recv profile
cnt/sec for all remote ranks
local ANY_SOURCE 0 1
rank
------------------------------------------------
recv wait for data profile
cnt/sec for all remote ranks
local 0 1
rank
------------------------------------------------
recv wait for data profile
cnt/sec for all remote ranks
local 0 1
rank
------------------------------------------------
send profile
cnt/sec for all destination ranks
src 0 1
rank
------------------------------------------------
ssend profile
cnt/sec for all destination ranks
src 0 1
rank
------------------------------------------------
ibsend profile
cnt/sec for all destination ranks
src 0 1
rank
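The rank-by-rank summary statistics in the example above lend themselves to simple parsing. The sketch below assumes the layout shown in that example: a `Rank-by-Rank: <title>` line, one or more lines of `rank:percent%` pairs, and then a `Least:` line. The function name `parse_rank_percentages` is hypothetical, and a real results file might differ in whitespace or wrap the pairs over several lines.

```python
import re

# Patterns are assumptions based on the example results file above.
SECTION_RE = re.compile(r"^\s*Rank-by-Rank:\s*(.+?)\s*$")
PAIR_RE = re.compile(r"(\d+):([0-9.eE+-]+)%")


def parse_rank_percentages(text):
    """Return {section_title: {rank: percent}} for each summary section."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = sections.setdefault(match.group(1), {})
            continue
        if line.lstrip().startswith("Least:"):
            current = None  # the rank:percent pairs for this section are done
            continue
        if current is not None:
            for rank, percent in PAIR_RE.findall(line):
                current[int(rank)] = float(percent)
    return sections


sample = """\
Rank-by-Rank: Percent in Profiled MPI routines
Rank:Percent
0:0.0112245% 1:0.00968502%
Least: Rank 1 0.00968502%
Most: Rank 0 0.0112245%
Load Imbalance: 0.000771%
Rank-by-Rank: User Time
Rank:Percent
0:17.2683% 1:19.3699%
Least: Rank 0 17.2683%
Most: Rank 1 19.3699%
Rank-by-Rank: System Time
Rank:Percent
0:86.3416% 1:77.4796%
Least: Rank 1 77.4796%
Most: Rank 0 86.3416%
"""

for title, by_rank in parse_rank_percentages(sample).items():
    busiest = max(by_rank, key=by_rank.get)
    print(f"{title}: highest is rank {busiest} at {by_rank[busiest]}%")
```

Recomputing the highest and lowest ranks from the parsed pairs provides a cross-check against the `Least:` and `Most:` lines that the file already reports.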
The MPI performance profiling environment variables are as follows:
| Variable | Description |
| MPI_PROFILE_AT_INIT | Activates MPI profiling immediately, that is, at the start of MPI program execution. |
| MPI_PROFILING_STATS_FILE | Specifies the file where MPI profiling results are written. If not specified, the file MPI_PROFILING_STATS is written. |
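As an illustration, a launch script might prepare these variables before starting the job. The sketch below is hedged in two ways: the helper name `profiling_env` and the file name `run42.stats` are hypothetical, and the value `"1"` for MPI_PROFILE_AT_INIT is an assumption, because the table above does not say which value activates the setting.

```python
import os


def profiling_env(stats_file=None, profile_at_init=False, base=None):
    """Return a copy of the environment with the profiling variables set."""
    env = dict(os.environ if base is None else base)
    if stats_file is not None:
        # Redirect the results away from the default MPI_PROFILING_STATS file.
        env["MPI_PROFILING_STATS_FILE"] = stats_file
    if profile_at_init:
        # Assumption: a non-empty value such as "1" turns the setting on.
        env["MPI_PROFILE_AT_INIT"] = "1"
    return env


env = profiling_env("run42.stats", profile_at_init=True, base={})
print(env)

# On a system with MPT and perfcatch installed, the job could then be started
# along these lines (not executed here):
#   import subprocess
#   subprocess.run(["mpirun", "-np", "64", "perfcatch", "a.out", "arg1"],
#                  env=profiling_env("run42.stats"), check=True)
```

Building the environment as a separate dictionary leaves the calling process untouched, which helps when several profiled runs each need their own results file.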
The supported profiled MPI functions are as follows:
| Note: Some functions might not be implemented in all languages, as indicated below. |
| Languages | Function |
| C Fortran | mpi_allgather |
| C Fortran | mpi_allgatherv |
| C Fortran | mpi_allreduce |
| C Fortran | mpi_alltoall |
| C Fortran | mpi_alltoallv |
| C Fortran | mpi_barrier |
| C Fortran | mpi_bcast |
| C Fortran | mpi_comm_create |
| C Fortran | mpi_comm_free |
| C Fortran | mpi_comm_group |
| C Fortran | mpi_comm_rank |
| C Fortran | mpi_finalize |
| C Fortran | mpi_gather |
| C Fortran | mpi_gatherv |
| C | mpi_get_count |
| C Fortran | mpi_group_difference |
| C Fortran | mpi_group_excl |
| C Fortran | mpi_group_free |
| C Fortran | mpi_group_incl |
| C Fortran | mpi_group_intersection |
| C Fortran | mpi_group_range_excl |
| C Fortran | mpi_group_range_incl |
| C Fortran | mpi_group_union |
| C | mpi_ibsend |
| C Fortran | mpi_init |
| C | mpi_init_thread |
| C Fortran | mpi_irecv |
| C Fortran | mpi_isend |
| C | mpi_probe |
| C Fortran | mpi_recv |
| C Fortran | mpi_reduce |
| C Fortran | mpi_scatter |
| C Fortran | mpi_scatterv |
| C Fortran | mpi_send |
| C Fortran | mpi_sendrecv |
| C Fortran | mpi_ssend |
| C Fortran | mpi_test |
| C Fortran | mpi_testany |
| C Fortran | mpi_wait |