Chapter 2. Tutorial for C Users

This chapter provides a tutorial that shows you how to gather and analyze performance data in a C program, using SpeedShop tools. The tutorial covers these topics:


Note: Because of inherent differences between systems and because of concurrent processes that may be running on your system, your experiment will produce different results from the one in this tutorial. However, the basic structure of the results should be the same.


Tutorial Overview

This tutorial uses a sample program called generic. There are three versions of the program:

  • generic directory : contains files for the n32-bit ABI

  • generico32 directory: contains files for the (old) 32-bit ABI

  • generic64 directory: contains files for the 64-bit ABI

When you work with the tutorial, choose the version of generic most appropriate for your system. A good guideline is to choose the version that corresponds to the way you expect to develop your programs.

This tutorial was written and tested using the version of generic in the generic directory.

Contents of the generic Program

The generic program was designed as a test and demonstration application. It contains code to run scenarios that each test a different area of SpeedShop. The version of generic used in this tutorial performs scenarios that:

  • Build a linked list of structures

  • Use a lot of user time

  • Scan a directory and run the stat command on each file

  • Perform file I/O

  • Generate a number of floating-point exceptions

  • Load and call a routine in a DSO

Output from the generic Program

Output from the program looks like the following:

0:00:00.000 ======== (27173)             Begin script Fri  06 Feb 1998
15:03:31.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.002 ======== (27173)        start of linklist Fri  06 Feb 1998
15:03:31.
        linklist completed.
0:00:00.003 ======== (27173)         start of usrtime Fri  06 Feb 1998
15:03:31.
        usertime completed.
0:00:25.572 ======== (27173)         start of cvttrap Fri  06 Feb 1998
15:03:57.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:25.806 ======== (27173)         start of dirstat Fri  06 Feb 1998
15:03:57.
        dirstat of /usr/include completed, 304 files.
0:00:26.618 ======== (27173) start of iofile -- stdio Fri  06 Feb 1998
15:03:58.
        stdio iofile on /unix completed, 7307988 chars.
0:00:26.864 ======== (27173)        start of fpetraps Fri  06 Feb 1998
15:03:58.
        fpetraps completed.
0:00:26.865 ======== (27173)          start of libdso Fri  06 Feb 1998
15:03:58.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:27.972 ======== (27173)               End script Fri  06 Feb 1998
15:03:59.

Tutorial Setup

Copy the program to a directory where you have write permission and compile it so that you can use it in the tutorial.

  1. Change to the /usr/demos/SpeedShop directory.

  2. Copy the appropriate generic directory and its contents to a directory where you have write permission:

    % cp -r generic your_dir

  3. Change to the directory you just created:

    % cd your_dir/generic

  4. Compile the program, by entering:

    % make all

    This provides an executable for the experiment.

Analyzing Performance Data

This section explains how to run the following experiments on the generic program, generate the experiment's results, and interpret the results:

  • usertime. As a first cut at optimization, this may be the most useful experiment. It breaks down a program into its functions and returns the CPU time used in each. See “A usertime Experiment”.

  • pcsamp. This experiment uses a different method to return the CPU time. See “A pcsamp Experiment”.

  • dsc_hwc. This experiment counts the number of times a required data item was not in secondary data cache. If the data item is not in secondary data cache, it must be fetched from memory, which requires more time. See “A Hardware Counter Experiment”.

  • bbcounts. This experiment counts basic block usage and estimates a linear time. It also maps out a complete call graph. See “A Basic Block Experiment”.

  • fpe. This experiment counts the number of floating-point exceptions in each function. See “An fpe Trace”.

You can follow the tutorial from start to finish, or you can choose the experiment you want to perform.

A usertime Experiment

This section explains how to perform a usertime experiment. The usertime experiment allows you to gather data on the amount of CPU time spent in each function in your program.


Note: Due to statistical sampling of the call stack, not all functions may appear in the experiment output.


For more information on usertime, see “usertime Experiment” in Chapter 4.

Performing a usertime Experiment

From the command line, enter the following:

% ssrun -usertime generic

This command starts the experiment. Output from generic and from ssrun is printed to stdout, as shown in the following example. A data file is also generated. The name consists of the process name (generic), the experiment type (usertime), and the experiment ID. In this example, the file name is generic.usertime.m10981.

0:00:00.000 ======== (10981)             Begin script Mon  02 Feb 1998
11:05:02.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.002 ======== (10981)        start of linklist Mon  02 Feb 1998
11:05:02.
        linklist completed.
0:00:00.003 ======== (10981)         start of usrtime Mon  02 Feb 1998
11:05:02.
        usertime completed.
0:00:22.948 ======== (10981)         start of cvttrap Mon  02 Feb 1998
11:05:25.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:23.156 ======== (10981)         start of dirstat Mon  02 Feb 1998
11:05:25.
        dirstat of /usr/include completed, 304 files.
0:00:23.937 ======== (10981) start of iofile -- stdio Mon  02 Feb 1998
11:05:26.
        stdio iofile on /unix completed, 7307988 chars.
0:00:24.777 ======== (10981)        start of fpetraps Mon  02 Feb 1998
11:05:27.
        fpetraps completed.
0:00:24.777 ======== (10981)          start of libdso Mon  02 Feb 1998
11:05:27.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:25.866 ======== (10981)               End script Mon  02 Feb 1998
11:05:28.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > usertime.results

The prof command prints results to stdout.


Note: Because of line width restrictions, the DSO, file name, and line number information at the end of each line is wrapped to the next line in the following sample output.


-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Feb  2 11:07:15 1998
   prof generic.usertime.m10981
                 generic (n32): Target program
                      usertime: Experiment name
                         ut:cu: Marching orders
                 R4400 / R4000: CPU / FPU
                             1: Number of CPUs
                           175: Clock frequency (MHz.)
  Experiment notes--
          From file generic.usertime.m10981:
        Caliper point 0 at target begin, PID 10981
                   /usr/demos/SpeedShop/progs.etc/linpack.demos/c/generic
        Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical callstack sampling data (usertime)--
                           809: Total Samples
                             0: Samples with incomplete traceback
                        24.270: Accumulated Time (secs.)
                          30.0: Sample interval (msecs.)
-------------------------------------------------------------------------
Function list, in descending order by exclusive time
-------------------------------------------------------------------------
[index]  excl.secs excl.%   cum.%  incl.secs incl.%   samples function (dso:file,line)

   [4]     22.770  93.8%   93.8%     22.770  93.8%        759  anneal
(generic: generic.c, 1573)
   [6]      1.020   4.2%   98.0%      1.020   4.2%         34  slaveusrtime 
(dlslave.so: dlslave.c, 22)
   [9]      0.210   0.9%   98.9%      0.210   0.9%          7  cvttrap 
(generic: generic.c, 317)
  [12]      0.120   0.5%   99.4%      0.120   0.5%          4  __read
(libc.so.1: read.s, 20)
  [14]      0.090   0.4%   99.8%      0.090   0.4%          3  _xstat
(libc.so.1: xstat.s, 12)
  [10]      0.030   0.1%   99.9%      0.180   0.7%          6  iofile
(generic: generic.c, 464)
  [11]      0.030   0.1%  100.0%      0.150   0.6%          5  fread 
(libc.so.1: fread.c, 34)
   [1]      0.000   0.0%  100.0%     24.270 100.0%        809  __start 
(generic: crt1text.s, 101)
   [2]      0.000   0.0%  100.0%     24.270 100.0%        809  main 
(generic: generic.c, 101)
   [3]      0.000   0.0%  100.0%     24.270 100.0%        809  Scriptstring 
(generic: generic.c, 184)
   [5]      0.000   0.0%  100.0%     22.770  93.8%        759  usrtime 
(generic: generic.c, 1377)
  [15]      0.000   0.0%  100.0%      0.090   0.4%          3  dirstat 
(generic: generic.c, 348)
  [16]      0.000   0.0%  100.0%      0.090   0.4%          3  _stat 
(libc.so.1: stat.c, 31)
  [13]      0.000   0.0%  100.0%      0.120   0.5%          4  _read 
(libc.so.1: readSCI.c, 27)
   [7]      0.000   0.0%  100.0%      1.020   4.2%         34  libdso
(generic: generic.c, 619)
   [8]      0.000   0.0%  100.0%      1.020   4.2%         34  dlslave_routine 
(dlslave.so: dlslave.c, 7)

Analyzing the Report

The report shows information for each function. The meanings of the column headings are described below:

  • The index column assigns a reference number to each function.

  • The excl.secs column shows how much time, in seconds, was spent in the function itself (exclusive time). For example, less than one hundredth of a second was spent in __start, but 0.03 of a second was spent in fread. (Routines that begin with an underscore, such as __start, are internal routines that you cannot change.)

  • The excl.% column shows the percentage of a program's total time that was spent in the function. The anneal function consumed 93.8% of the program's time.

  • The cum.% column shows the percentage of the complete program time that has executed in the routines listed so far.

  • The incl.secs column shows how much time, in seconds, was spent in the function and descendents of the function. For example, 0.21 seconds were spent in cvttrap and the functions that were called by it.

  • The incl.% column shows the cumulative percentage of inclusive time spent in each function and its descendents. For example, 93.8% of the time was spent in anneal and all the functions that were called through it.

  • The samples column shows how many samples were taken when the process was executing in the function and in all of the function's descendants.

  • The procedure (dso:file,line) columns list the function name, its DSO name, its file name, and its line number. For example, the top line reports statistics for the function anneal, the DSO generic, in the file generic.c, at line 1573.

A pcsamp Experiment

This section explains how to perform a pcsamp experiment. The pcsamp experiment allows you to gather information on actual CPU time for each function in your program. For more information on pcsamp, see “pcsamp Experiment” in Chapter 4.

From the command line, enter the following:

% ssrun -fpcsamp generic

This starts the experiment. The f prefix is added to pcsamp for this program because the program runs quickly and does not gather much data using the default pcsamp experiment name; adding the f prefix results in more data samples. Output from generic and from ssrun is printed to stdout, as shown in the following example.

A data file is also generated. The name consists of the process name (generic), the experiment type (fpcsamp), and the experiment ID. In this example, the file name is generic.fpcsamp.m11140.

0:00:00.000 ======== (11140)             Begin script Mon  02 Feb 1998
10:58:41.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.003 ======== (11140)        start of linklist Mon  02 Feb 1998
10:58:41.
        linklist completed.
0:00:00.004 ======== (11140)         start of usrtime Mon  02 Feb 1998
10:58:41.
        usertime completed.
0:00:22.437 ======== (11140)         start of cvttrap Mon  02 Feb 1998
10:59:03.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:22.638 ======== (11140)         start of dirstat Mon  02 Feb 1998
10:59:03.
        dirstat of /usr/include completed, 304 files.
0:00:23.407 ======== (11140) start of iofile -- stdio Mon  02 Feb 1998
10:59:04.
        stdio iofile on /unix completed, 7307988 chars.
0:00:23.750 ======== (11140)        start of fpetraps Mon  02 Feb 1998
10:59:04.
        fpetraps completed.
0:00:23.751 ======== (11140)          start of libdso Mon  02 Feb 1998
10:59:04.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:24.778 ======== (11140)               End script Mon  02 Feb 1998
10:59:05.

Generating a Report

To generate a report on the data collected, and to redirect the output to a file, enter the following:

% prof your_output_file_name > pcsamp.results

Output similar to the following is generated:

------------------------------------------------------------------------
SpeedShop profile listing generated Mon Feb  2 11:01:36 1998
   prof generic.fpcsamp.m11140
                 generic (n32): Target program
                       fpcsamp: Experiment name
                pc,2,1000,0:cu: Marching orders
                 R4400 / R4000: CPU / FPU
                             1: Number of CPUs
                           175: Clock frequency (MHz.)
  Experiment notes--
          From file generic.fpcsamp.m11140:
        Caliper point 0 at target begin, PID 11140
                /usr/demos/SpeedShop/linpack.demos/c/generic
        Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical PC sampling data (fpcsamp)--
                         23828: Total samples
                        23.828: Accumulated time (secs.)
                           1.0: Time per sample (msecs.)
                             2: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
 [index]      secs    %    cum.%   samples  function (dso: file, line)

     [1]    22.279  93.5%  93.5%     22279  anneal (generic: generic.c, 1573)
     [2]     0.975   4.1%  97.6%       975  slaveusrtime (dlslave.so: dlslave.c, 22)
     [3]     0.201   0.8%  98.4%       201  __read (libc.so.1: read.s, 20)
     [4]     0.198   0.8%  99.3%       198  cvttrap (generic: generic.c, 317)
     [5]     0.121   0.5%  99.8%       121  _xstat (libc.so.1: xstat.s, 12)
     [6]     0.010   0.0%  99.8%        10  __open (libc.so.1: open.s, 23)
     [7]     0.010   0.0%  99.9%        10  __write (libc.so.1: write.s, 20)
     [8]     0.010   0.0%  99.9%        10  __sigfillset (libc.so.1: sigfillset.c, 11)
     [9]     0.010   0.0%  99.9%        10  _ecvt_r (libc.so.1: ecvt.c, 70)
    [10]     0.003   0.0% 100.0%         3  fread (libc.so.1: fread.c, 34)
    [11]     0.003   0.0% 100.0%         3  dirstat (generic: generic.c, 348)
    [12]     0.002   0.0% 100.0%         2  _doprnt (libc.so.1: doprnt.c, 285)
    [13]     0.001   0.0% 100.0%         1  memcpy (libc.so.1: bcopy.s, 329)
    [14]     0.001   0.0% 100.0%         1  _readdir (libc.so.1: readdir.c, 135)
    [15]     0.001   0.0% 100.0%         1  _read (libc.so.1: readSCI.c, 27)
    [16]     0.001   0.0% 100.0%         1  __sinf (libm.so: fsin.c, 93)
             0.002   0.0% 100.0%         2  **OTHER** (includes excluded DSOs, rld, etc.)

            23.828 100.0% 100.0%     23828  TOTAL

Analyzing the Report

The report has the following columns:

  • The [index] column assigns a reference number to each function.

  • The secs column shows the amount of CPU time, in seconds, that was spent in the function.

  • The % column shows the percentage of the total program time that was spent in the function.

  • The cum.% column shows the percentage of the complete program time in functions that have been listed so far.

  • The samples column shows how many samples were taken when the process was executing in the function.

  • The function (dso: file, line) columns list the function, its DSO name, its file name, and its line number.

A Hardware Counter Experiment


Note: This experiment can be performed only on systems that have built-in hardware counters (machines with the R10000, R12000, R14000, or R16000 class of CPU).

This section takes you through the steps to perform a hardware counter experiment. There are a number of hardware counter experiments, but this tutorial describes the steps involved in performing the dsc_hwc experiment. This experiment captures information about secondary data cache misses. For more information on hardware counter experiments, see “SpeedShop Hardware Counter Experiments” in Chapter 4.

Performing a Hardware Counter Experiment

From the command line, enter:

% ssrun -dsc_hwc generic

This starts the experiment. Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (dsc_hwc), and the experiment ID. In this example, the file name is generic.dsc_hwc.m294398.

Generating a Report

To generate a report on the data collected and redirect the output to a file, enter the following:

% profyour_output_file_name > dsc_hwc.results

The report should look similar to the following listing:

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Feb  2 11:11:44 1998
   prof generic.dsc_hwc.m294398
                 generic (n32): Target program
                       dsc_hwc: Experiment name
                 hwc,26,131:cu: Marching orders
               R10000 / R10010: CPU / FPU
                            16: Number of CPUs
                           195: Clock frequency (MHz.)
  Experiment notes--
          From file generic.dsc_hwc.m294398:
        Caliper point 0 at target begin, PID 294398
               /usr/demos/SpeedShop/linpack.demos/c/generic
        Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of R10K perf. counter overflow PC sampling data (dsc_hwc)--
                             6: Total samples
       Sec cache D misses (26): Counter name (number)
                           131: Counter overflow value
                           786: Total counts
-------------------------------------------------------------------------
Function list, in descending order by counts
-------------------------------------------------------------------------
 [index]        counts     %   cum.%   samples  function (dso: file, line)

     [1]           131  16.7%  16.7%     1  init2da (generic: generic.c, 1430)
     [2]           131  16.7%  33.3%     1  genLog (generic: generic.c, 1686)
     [3]           131  16.7%  50.0%     1  _write (libc.so.1: writeSCI.c, 27)
                   393  50.0% 100.0%     3  **OTHER** (includes excluded DSOs, rld, etc.)

                   786 100.0% 100.0%         6  TOTAL

Analyzing the Report

The information immediately preceding the function list displays the following:

  • The Total samples is the number of times the program counter was sampled. It is sampled once for each overflow, or once each time the hardware counter exceeds the specified value.

  • The Counter name (number) indicates the hardware counter used in the experiment. In this case, hardware counter 26 counts the number of times a value required in a calculation was not available in secondary cache. For a complete list of the hardware counters and their numbers, see Table 4-3.

  • The Counter overflow value is the number at which the hardware counter overflows or exceeds its preset value. In this case, the value is 131, which is the default. The fdsc_hwc experiment runs the same hardware counter experiment with the preset value of 29. You can change the overflow value by setting the _SPEEDSHOP_HWC_COUNTER_OVERFLOW environment variable to a value larger than 0, the _SPEEDSHOP_HWC_COUNTER_NUMBER environment variable to 26, and running the prof_hwc experiment instead of dsc_hwc. See “_hwc Hardware Counter Experiments ” in Chapter 4 to learn how to choose a counter overflow value.

  • The Total counts is the total number of times a value was not in secondary cache when needed. This value is determined by multiplying the total number of samples by the overflow value; extra counts that do not cause an overflow are not recorded.

The function list has the following columns:

  • The index column assigns a reference number to each function.

  • The counts column shows the number of times a data item was not in secondary cache when needed for a calculation during the execution of the function. As with Total counts, a function's counts value is determined by multiplying its samples value by the overflow value.

  • The % column shows the percentage of the program's overflows that occurred in the function.

  • The cum.% column shows the percentage of the program's overflows that occurred in the functions listed so far. A function might have a low number in its % column but a high value in its cum.% column if it executed late in the program.

  • The samples column shows the number of times the program counter was sampled during execution of the function. A sample is taken for each overflow of the hardware counter.

  • The function (dso: file, line) columns list the function name, the DSO, the file name, and line number of the function.

A Basic Block Experiment

This section takes you through the steps to perform an bbcounts experiment. The times returned represent an idealized computation. This experiment ignores potential floating-point interlocks and memory latency time (cache misses and memory bus contention). The times returned will always be lower than the times for an actual run. For more information on the bbcounts experiment, see “bbcounts Experiment” in Chapter 4.

Performing a bbcounts Experiment

From the command line, enter

% ssrun -bbcounts generic

This starts the experiment. First the executable, rld, and the DSOs are instrumented. This entails making copies of the libraries and executables, giving the copies an extension that depends on the ABI, and inserting information into the copies. The extension is .pixbb for the executable, .pix32 for all old 32-bit libraries, .pixn32 for all n32 libraries, and .pix64 for all 64-bit libraries.

Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (bbcounts), and the experiment ID. In this example, the file name is generic.bbcounts.m10966, and the following is written to stdout:

johmar % ssrun -bbcounts generic
        instrumenting /lib32/rld
        instrumenting /usr/lib32/libssrt.so
        instrumenting /usr/lib32/libss.so
        instrumenting /usr/lib32/libm.so
        instrumenting /usr/lib32/libc.so.1
0:00:00.000 ======== (76717)             Begin script Tue  16 Oct 2001  05:16:50.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.010 ======== (76717)        start of linklist Tue  16 Oct 2001  05:16:50.
        linklist completed.
0:00:00.016 ======== (76717)         start of usrtime Tue  16 Oct 2001  05:16:50.
        usertime completed.
0:00:19.208 ======== (76717)         start of cvttrap Tue  16 Oct 2001  05:17:09.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:19.451 ======== (76717)         start of dirstat Tue  16 Oct 2001  05:17:10.
        dirstat of /usr/include completed, 306 files.
0:00:19.497 ======== (76717) start of iofile -- stdio Tue  16 Oct 2001  05:17:10.
        stdio iofile on /unix completed, 7063616 chars.
0:00:19.878 ======== (76717)        start of fpetraps Tue  16 Oct 2001  05:17:10.
        fpetraps completed.
0:00:19.880 ======== (76717)          start of libdso Tue  16 Oct 2001  05:17:10.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:21.274 ======== (76717)               End script Tue  16 Oct 2001  05:17:11.

The output statements beginning with “instrumenting declares that ssrun is instrumenting first the libraries and then the generic executable itself.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > bbcounts.results

This command redirects output to a file called bbcounts.results. The file contains results that look similar to the following partial listing. The number of functions and their names may also vary.

SpeedShop profile listing generated Tue Oct 16 05:41:34 2001
   prof generic.bbcounts.m76777
                 generic (n32): Target program
                      bbcounts: Experiment name
                         it:cu: Marching orders
                 R5000 / R5000: CPU / FPU
                             1: Number of CPUs
                           180: Clock frequency (MHz.)
  Experiment notes--
	  From file generic.bbcounts.m76777:
	Caliper point 0 at target begin, PID 76777
			/d3/jcarter/pixie/generic/generic
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of ideal time data (bbcounts)--
                    2063894891: Total number of instructions executed
                    3049155927: Total computed cycles
                        16.940: Total computed execution time (secs.)
                         1.477: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
 [index]   excl.secs   excl.%     cum.%        cycles  instructions    calls  function  (dso:file, line)

     [1]      16.096    95.0%     95.0%    2897320027    1971220024        1  anneal (generic.pixbb: generic.c, 1570)
     [2]       0.694     4.1%     99.1%     125000842      75000733        1  slaveusrtime (dlslave.so: dlslave.c, 22)
     [3]       0.139     0.8%     99.9%      25000068      16000056        1  cvttrap (generic.pixbb: generic.c, 318)
     [4]       0.002     0.0%    100.0%        347029        323780     1277  general_find_symbol (rld: rld.c, 2038)
     [5]       0.002     0.0%    100.0%        327878        298004     2992  resolve_relocations (rld: rld.c, 2636)
     [6]       0.001     0.0%    100.0%        146994        146994     1292  elfhash (rld: obj.c, 1184)
     [7]       0.001     0.0%    100.0%         98839         98839     4303  obj_dynsym_got (rld: objfcn.c, 46)
     [8]       0.000     0.0%    100.0%         87622         87622     1651  strcmp (rld: strcmp.s, 34)
     [9]       0.000     0.0%    100.0%         85790         76476        2  fix_all_defineds (rld: rld.c, 3419)
    [10]       0.000     0.0%    100.0%         76318         73786     1280  resolve_symbol (rld: rld.c, 1828)
    [11]       0.000     0.0%    100.0%         72295         71052     1247  resolving (rld: rld.c, 1499)
    [12]       0.000     0.0%    100.0%         71134         58125        1  init2da (generic.pixbb: generic.c, 1427)
    [13]       0.000     0.0%    100.0%         59676         54889      432  fread (libc.so.1: fread.c, 27)
    [14]       0.000     0.0%    100.0%         48345         44185       53  _doprnt (libc.so.1: doprnt.c, 227)
    [15]       0.000     0.0%    100.0%         48000         35200     1600  _drand48 (libc.so.1: drand48.c, 116)
    [16]       0.000     0.0%    100.0%         38952         14346       18  offtime (libc.so.1: time_comm.c, 179)
    [17]       0.000     0.0%    100.0%         38783         27864      628  __sinf (libm.so: fsin.c, 97)
.
.
.

Analyzing the Report

The columns in the report provide the following information:

  • The index column assigns a reference number to each function.

  • The excl.secs column shows the minimum number of seconds that might be spent in the function under ideal conditions. For example, 21.453 seconds is optimal for the anneal function, the way it is currently written. The pcsamp experiment actually timed this function at 22.279 seconds.

  • The excl.% column shows how much of the program's total time was spent in the function.

  • The cum.% column shows the cumulative percentage of time spent in the functions listed so far.

  • The cycles column shows the total number of machine cycles used by the function. For example, 3,754,320,037 CPU clock cycles were spent in the anneal function.

  • The instructions column shows the total number of instructions executed by a function. For example, the anneal function executed 1,971,220,024 instructions.

  • The calls column shows the total number of calls made to the function. For example, there was just one call to the anneal function.

  • The function (dso:file, line) columns list the function, its DSO name, its file name, and the line number. For example, indexed line one in the preceding report presents statistics for the function anneal in the generic executable. The function's source is found in the file generic.c at IDE number 1573.

An fpe Trace

This section takes you through the steps to perform a floating-point exception (fpe) trace, which identifies functions in which floating-point exceptions have occurred. For more information on the fpe trace, see “Floating-Point Exception Trace” in Chapter 4.

Performing an fpe Trace

From the command line, enter:

% ssrun -fpe generic

Output from generic and from ssrun is printed to stdout. A data file is created with a name generated by concatenating the process name (generic), the experiment type (fpe), and the experiment ID. In this example, the file name is generic.fpe.m12213.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > fpe.results

The report should look similar to the following partial listing:

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Feb  2 13:26:33 1998
   prof generic.fpe.m12213
                 generic (n32): Target program
                           fpe: Experiment name
                        fpe:cu: Marching orders
                 R4400 / R4000: CPU / FPU
                             1: Number of CPUs
                           175: Clock frequency (MHz.)
  Experiment notes--
          From file generic.fpe.m12213:
        Caliper point 0 at target begin, PID 12213
           /usr/demos/SpeedShop/linpack.demos/c/generic
        Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of FPE callstack tracing data (fpe)--
                             4: Total FPEs
                             0: Samples with incomplete traceback
-------------------------------------------------------------------------
Function list, in descending order by exclusive FPEs
-------------------------------------------------------------------------
 [index]  excl.FPEs excl.& cum.% incl.FPEs incl.%  function  (dso: file, line)

   [1]         4 100.0%  100.0%        4 100.0%  fpetraps (generic: generic.c, 405)
   [2]         0   0.0%  100.0%        4 100.0%  __start (generic: crt1text.s, 101)
   [3]         0   0.0%  100.0%        4 100.0%  main (generic: generic.c, 101)
   [4]         0   0.0%  100.0%        4 100.0%  Scriptstring (generic: generic.c, 184)

Analyzing the Report

The report shows information for each function:

  • The index column assigns a reference number to each function.

  • The excl.FPEs column shows how many floating-point exceptions were found in the function. Four floating-point exceptions were found in fpetraps.

  • The excl.% column shows the percentage of the total number of floating-point exceptions that were found in the function.

  • The cum.% column shows the percentage of exclusive floating-point exceptions in the functions that have been listed so far. The list is sorted by the number of floating-point exceptions, with the most in the top line and the least in the bottom line. Because all of the exceptions are in the first function listed in this example, all entries in this column are 100%.

  • The incl.FPEs column shows how many floating-point exceptions were generated by the function and the functions it called.

  • The incl.% column shows the percentage of the program's total number of floating-point exceptions in this function and the functions it called. Because fpetraps is called through all of the other functions, they are all listed as 100%.

  • The function (dso:file, line) columns list the routine name, its DSO name, its file name, and its line number.