Chapter 7. In-lining and Interprocedural Analysis

This chapter provides additional information about the PCA command-line options and in-lining pragmas that you can use to in-line functions and perform interprocedural analysis.

In-lining is the process of replacing a function reference with the text of the function. This process eliminates the overhead of the function call, and can assist other optimizations by making relationships between function arguments, returned values, and the surrounding code easier to find.

Interprocedural analysis is the process of inspecting called functions for information on relationships between arguments, returned values, and global data. This process can provide many of the benefits of in-lining without replacing the function reference.
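As a minimal illustration of the in-lining definition above, the following sketch shows the same loop before and after a call is replaced by hand with the text of the called function. The names scale, sum_scaled_call, and sum_scaled_inlined are hypothetical and this is not PCA output; it only mirrors the kind of substitution that in-lining performs.

```c
/* Hypothetical helper: the function that would be in-lined. */
static double scale(double x, double factor)
{
    return x * factor;
}

/* Before in-lining: one function call per loop iteration. */
double sum_scaled_call(const double *v, int n, double factor)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += scale(v[i], factor);
    return sum;
}

/* After in-lining by hand: the call is replaced with the function text,
   so the relationship between v[i], factor, and sum is visible in the loop. */
double sum_scaled_inlined(const double *v, int n, double factor)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += v[i] * factor;
    return sum;
}
```

Both versions compute the same value; the second has no call overhead inside the loop and exposes the loop body to further optimization.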

Table 7-1 lists the in-lining options.

Table 7-1. In-lining Options

In-lining Purpose                      Long Name                Short Name     Default Value
Specify routine to in-line             inline[=name[,name...]]  inl[=names]    off
Create preprocessed library            inline_create=lib.klib   incr=lib.klib  off
Define in-linable routines             inline_from_files=list   inff=list      current source file
Specify library from which to in-line  inline_from_libs=list    infl=list      off
Specify call nest level                inline_depth[=n]         ind[=n]        ind=2
Specify for loop-nest level            inline_looplevel[=n]     inll[=n]       inll=2
Specify manual control                 inline_manual            inm            off

Table 7-2 lists the IPA options.

Table 7-2. Interprocedural Analysis Options

IPA Purpose                           Long Name             Short Name      Default Value
Specify routine to analyze            ipa[=name[,name...]]  ipa[=names]     off
Create preprocessed library           ipa_create=lib.klib   ipacr=lib.klib  off
Define routines for IPA               ipa_from_files=list   ipaff=list      current source file
Specify library from which to do IPA  ipa_from_libs=list    ipafl=list      off
Specify for loop-nest level for IPA   ipa_looplevel[=n]     ipall[=n]       ipall=2
Specify manual control                ipa_manual            ipam            off

The rest of this chapter covers the in-lining and interprocedural analysis command-line options and pragmas, related command-line options, examples of their use, and information on program constructs that inhibit in-lining. In-lining and interprocedural analysis are symmetrical from the command-line standpoint: you use related sets of options and pragmas for them. (Many statements about in-lining apply to both in-lining and interprocedural analysis.)

In-lining and IPA Command-Line Options

In-lining has two phases:

  1. Define the universe of in-linable routines.

  2. Select which routines in that universe to in-line or analyze.

The from_files and from_libs options define the universe of in-linable routines. The inline, ipa, and looplevel options select which of the available routines are to be in-lined/analyzed. The create options set up collections of routines for inclusion in later PCA runs.

The subsections that follow define the syntax for in-lining and interprocedural analysis command-line options. The short forms of their names appear in square brackets ([ ]).

The inline_from and ipa_from Options

The inline_from and ipa_from options take the following form:

-inline_from_files=list            [-inff=list]
-inline_from_libraries=list        [-infl=list]
-ipa_from_files=list               [-ipaff=list]
-ipa_from_libraries=list           [-ipafl=list]

where list is one or more of the following:

  • source file name

  • library file name

  • directory

Separate the items in the list with commas. Do not use shell wildcard characters in the list of files and directories. The default is the current source file. Different types of files are distinguished by their extensions. For example:

-inline_from_files=xj.c,yy.c,../mrtn

looks for routines in the C source files xj.c and yy.c, and in C source files in the directory ../mrtn. (Including the directory ../mrtn is equivalent to the UNIX® notation ../mrtn/*.c.) All source files that contain C preprocessor directives must be preprocessed by the cc compiler before being in-lined or analyzed.

The from_libraries versions of these options take as their arguments lists of function libraries and directories containing such libraries.

PCA recognizes the type of file by its extension, or lack of one (see Table 7-3 for the file types).

Table 7-3. File Types

File Extension  Type of File
.c              C source file
.klib           Library from inline/ipa_create
other           Directory

Two special abbreviations are:

dash (-)

A dash specifies the current source file (as listed on the command line, or specified in a -input=file command-line option).

period (.)

A period specifies the current working directory.
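The classification rules above (extension-based file types plus the two abbreviations) can be sketched as a small C function. This is an illustrative sketch, not PCA's implementation: the enum, the function names, and the choice to test the abbreviations before the extensions are all assumptions.

```c
#include <string.h>

/* Hypothetical categories, following the file-type table and the
   two special abbreviations. */
enum list_item_kind {
    ITEM_CURRENT_SOURCE,  /* "-"     : the current source file          */
    ITEM_CURRENT_DIR,     /* "."     : the current working directory    */
    ITEM_C_SOURCE,        /* *.c     : C source file                    */
    ITEM_KLIB,            /* *.klib  : library from inline/ipa_create   */
    ITEM_DIRECTORY        /* anything else is treated as a directory    */
};

/* Return nonzero if name is longer than suffix and ends with it. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* Classify one item of a from_files or from_libraries list. */
enum list_item_kind classify_item(const char *name)
{
    if (strcmp(name, "-") == 0) return ITEM_CURRENT_SOURCE;
    if (strcmp(name, ".") == 0) return ITEM_CURRENT_DIR;
    if (has_suffix(name, ".klib")) return ITEM_KLIB;
    if (has_suffix(name, ".c")) return ITEM_C_SOURCE;
    return ITEM_DIRECTORY;
}
```

For example, classify_item("xj.c") yields ITEM_C_SOURCE and classify_item("../mrtn") yields ITEM_DIRECTORY, matching the -inline_from_files example earlier in this section.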

Specifying a nonexistent file or directory is a command-line error.

If you specify multiple from_files and from_libraries options, their lists are concatenated to get a bigger universe.

Routine name references are resolved by a search in the order that files appear in from_files and from_libraries options on the command line. Libraries are searched in their original lexical order. Multiple from_files and from_libraries lists are searched in the order in which they appear on the command line.
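The search order described above amounts to a first-match lookup over the routines in the order they were encountered. The sketch below assumes a hypothetical flat array of entries that has already been built in command-line order; the struct and function names are illustrative and are not part of PCA.

```c
#include <string.h>

/* Hypothetical record of one routine found while scanning the
   from_files and from_libraries lists in command-line order. */
struct routine_entry {
    const char *name;    /* routine name                      */
    const char *origin;  /* file or library it was found in   */
};

/* Return the first entry whose name matches, or NULL if none does.
   entries must already be in search order, so the earliest match wins. */
const struct routine_entry *resolve_routine(const struct routine_entry *entries,
                                            int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++)
        if (strcmp(entries[i].name, name) == 0)
            return &entries[i];
    return NULL;
}
```

With this model, a routine defined both in an earlier source file and in a later library resolves to the copy from the source file.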

Creating and Using Libraries

To create a preprocessed library, use the following syntax:

-inline_create=library_name.klib      [-incr=lib_name.klib]
-ipa_create=library_name.klib         [-ipacr=lib_name.klib]

To specify a library file to in-line from, use:

-inline_from_libraries=list           [-infl=list]
-ipa_from_libraries=list              [-ipafl=list]

The default source for routines to put into the library is the current source file. If you specify inline_from (ipa_from), the routines in the listed files are the ones put into the library. This provides a way to combine or expand libraries: include the old libraries and any new files in an inline_from (ipa_from) option.

Routines are included in libraries in the order in which they appear in the input file(s). This order guarantees that if multiple routines with the same name are in the same source file, the one chosen for in-lining will be the one you expect from the algorithm under inline_from, described previously.

A library created with inline_create will work for in-lining or IPA, since it is just partially reduced source code. However, a library made with ipa_create may not appear in an -inline_from=list. Such use is flagged with a warning message.

If no library name is given, the name used is file.klib, where file is the input file name with any trailing .c stripped off.
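The default-name rule above can be sketched as follows. The function name default_klib_name and its buffer-based interface are assumptions made for illustration; the sketch strips only a trailing .c, as the text describes, and leaves any other extension in place.

```c
#include <stdio.h>
#include <string.h>

/* Build the default library name described above: the input file name
   with any trailing ".c" removed, followed by ".klib".
   Returns 0 on success, -1 if out is too small to hold the result. */
int default_klib_name(const char *input, char *out, size_t out_size)
{
    size_t len = strlen(input);
    int written;

    if (len >= 2 && strcmp(input + len - 2, ".c") == 0)
        len -= 2;                       /* strip the trailing .c */

    written = snprintf(out, out_size, "%.*s.klib", (int)len, input);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Under this sketch, an input file named subfil.c produces the library name subfil.klib.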

When creating a library, only one create option may be given. That is, only one library may be created per PCA run. If the library file existed prior to running PCA, it is overwritten. When you specify this option on the command line, no transformed code file will be generated. See the previous description of the from_libraries options for information on using libraries created with these options.

If you don't specify an inline (ipa) option, the default is to include all the functions in the source file in the library, if possible. See “Conditions That Inhibit In-lining” later in this chapter for a list of conditions that can prevent a function from being in-lined.

An example of creating a library and in-lining from it is included in the section of examples later in this chapter.

Naming Specific Routines

To specify the names of particular routines to in-line, use:

-inline[=name[,name...]]          [-inl=name,...]
-ipa[=name[,name...]]             [-ipa=name,...]

The default is all routines in the function universe, as defined by any inline_from (ipa_from) options, subject to the looplevel setting.

In-lining and IPA are off by default, that is, if no in-lining (IPA) options are specified and no in-lining (IPA) directives are found in the source code, no in-lining (IPA) is performed.

If you omit inline (ipa) from the command line, automatic selection of routines to in-line is disabled. You can manually select functions to in-line (analyze) with the -inline_manual (-ipa_manual) options and the inline and ipa pragmas.

If you specify inline (ipa) on the command line without a list of routine names, then all routines in the in-lining (IPA) universe are eligible, subject to the looplevel value.

If you specify inline (ipa) on the command line with a list of routine names, then only the listed routines are eligible, subject to the looplevel value.
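The three cases above (option omitted, option given without names, option given with names) can be summarized in a short C sketch. The struct select_opt and the function eligible_by_name are hypothetical models of the parsed option, not PCA interfaces, and the looplevel test is deliberately left out because it is applied separately.

```c
#include <string.h>

/* Hypothetical model of the -inline (or -ipa) option after parsing. */
struct select_opt {
    int present;               /* nonzero if -inline (-ipa) was given     */
    const char *const *names;  /* routine names listed after '=', or NULL */
    int name_count;            /* number of entries in names              */
};

/* Nonzero if routine is eligible for automatic selection by name.
   The looplevel test is not modeled here. */
int eligible_by_name(const struct select_opt *opt, const char *routine)
{
    int i;
    if (!opt->present)
        return 0;              /* option omitted: no automatic selection */
    if (opt->name_count == 0)
        return 1;              /* no list: every routine in the universe */
    for (i = 0; i < opt->name_count; i++)
        if (strcmp(opt->names[i], routine) == 0)
            return 1;          /* routine was listed by name */
    return 0;                  /* a list was given and routine is not in it */
}
```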

for Loop Level

To set a minimum for loop nest level for function call expansion, use:

-inline_looplevel[=n]          [-inll[=n]]
-ipa_looplevel[=n]             [-ipall[=n]]

Use the looplevel option to limit in-lining and interprocedural analysis to just functions that are referenced in nested loops, where the reduced function call overhead or enhanced optimization will be multiplied.

The argument is defined from the most deeply nested leaf of the call tree.

The default, 2, restricts in-lining (interprocedural analysis) to the best-seeming candidate routines.

For example:

main
{
  ...
   a();  ------>  a() {...}
  ...
 for (..) {
   for (..) {
    b();  --------->  b() {
   }                    for (..) {
 }                        for (..) {
}                           c();  -------> c() {...}
                          }
                        }
                      }

The call to b is inside a doubly nested loop and is more profitable to expand than the call to a. The call to c is quadruply nested, so in-lining c yields the biggest gain of the three.

The argument is defined from the most deeply nested function reference:

-inline_looplevel=1


Only the functions referenced in the most deeply nested call site(s) may be expanded (function c in the previous example). If more than one function call is at the same loop-nest level, all of them are selected when that level is included.

-inline_looplevel=2


Only function calls at the most deeply nested level and one loop less deeply nested may be expanded.

-inline_looplevel=3


Level 3 is required to in-line function b, since its call is two loops less nested than the call to function c.

A value of 3 or greater causes c to be in-lined into b, then the new b to be in-lined into the main program.

-inline_looplevel (or -inline_looplevel=large number)


A large number permits in-lining at any nesting level. The calling tree written to the listing file with -listoptions=c includes the nesting depth level of each call in each program unit and the aggregate nesting depth (the sum of the nesting depths for each call site, starting from the main program). Use this information to identify the best functions for in-lining.
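The level numbering described above can be expressed as simple arithmetic on aggregate nesting depths. The sketch below assumes that each level corresponds to one loop of nesting, as the descriptions of levels 1 through 3 suggest; the function names call_level and passes_looplevel are hypothetical.

```c
/* Level of a call site under the looplevel convention: the most deeply
   nested call has level 1, a call one loop less deeply nested has
   level 2, and so on. depth and max_depth are aggregate loop-nest
   depths counted from the main program. */
int call_level(int depth, int max_depth)
{
    return max_depth - depth + 1;
}

/* Nonzero if a call at the given aggregate depth is selected when the
   looplevel option is set to n. */
int passes_looplevel(int depth, int max_depth, int n)
{
    return call_level(depth, max_depth) <= n;
}
```

In the earlier example, the call to c has aggregate depth 4 and the call to b has depth 2, so call_level gives 1 for c and 3 for b, which agrees with the statement that level 3 is required to in-line b.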

A function that passes the looplevel test is in-lined everywhere it is used, even places that are not in deeply nested loops. If some, but not all, invocations of a function are to be expanded, use the inline and ipa pragmas just before each function call that is to be expanded (see the next section).

Because in-lining increases the size of the code, the extra paging and cache contention can actually slow down a program. Restricting in-lining to functions used in for loops multiplies the benefits of eliminating function call overhead for a given amount of code space expansion. (If in-lining appears to slow an application, investigate the problem using IPA, which has little effect on code space and the number of temporary variables.)

Manual Control

To instruct PCA to recognize the #pragma [no]inline and #pragma [no]ipa directives, use these options:

-inline_manual             [-inm]
-ipa_manual                [-ipam]

This allows manual control over which functions are in-lined/analyzed at which call sites (see the following section, “In-lining Pragmas”).

The default is to ignore these pragmas. To enable these pragmas, include -inline_manual (-ipa_manual) on the command line.

Since #pragma [no]inline and #pragma [no]ipa are not affected by the looplevel command-line options, you can use them either with or without the command-line control.

In-lining Pragmas

The inline/ipa pragmas tell PCA to in-line (or perform interprocedural analysis on) the named functions. The syntax is:

#pragma [no]inline [here][routine][global] [(name[,name...])]
#pragma [no]ipa [here][routine][global] [(name[,name...])]

These pragmas tell PCA whether or not to in-line/analyze the named functions. These pragmas combine next-line, entire routine, and global (entire program) scope. If you omit these optional elements, all functions referenced on the next line of code that are in the in-lining/analyzing universe are in-lined on that one line.

These pragmas are disabled by default. Enable them with the -inline_manual and -ipa_manual command-line options. They are independent of the other in-lining and IPA command-line options, and you can use them instead of, or in addition to, command-line controlled in-lining.

Keywords: here, routine, and global

The keywords here, routine, and global are described below.

here 

If you include the scope keyword here, or if you don't specify any scope, the pragma applies only to the next statement.

routine 

If you include the scope keyword routine, the pragma applies to the rest of the routine, or until a corresponding no appears. (Or, if the first pragma was a noinline (noipa), until the corresponding inline (ipa) pragma.)

global 

If you include the scope keyword global, or if the pragma appears before any lines of source code, the pragma applies to the entire file, or until toggled with the corresponding no pragma. (Or, if the first pragma was a noinline (noipa), until the corresponding inline (ipa) pragma.) Typically, global pragmas appear only at the top of the source file. The same routine name may not appear in both global in-lining and global IPA lists, either by pragmas or the inline (ipa) options.

These keywords must appear in lowercase, as function names are case sensitive. The optional names are function names. If any functions are named in the directive, it applies only to them. If no function names are given, the pragma applies to all functions. The parentheses around the function names are not required if the list of function names is empty.

If a #pragma inline or #pragma ipa names a routine not in the universe, a warning message is issued, and the pragma is ignored.
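The following sketch shows how the pragma forms described above might be placed in a source file. The routines dot3, norm1, and combine are hypothetical and are not taken from the examples in this chapter. The pragmas are intended to take effect only when PCA is run with -inline_manual; other C compilers typically ignore pragmas they do not recognize, so the code still compiles and runs elsewhere.

```c
/* Hypothetical routines used only to illustrate pragma placement. */
static double dot3(const double *x, const double *y)
{
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
}

static double norm1(const double *x)
{
    double s = 0.0;
    int i;
    for (i = 0; i < 3; i++)
        s += (x[i] < 0.0) ? -x[i] : x[i];
    return s;
}

double combine(const double *x, const double *y)
{
    double d, n;

    /* No scope keyword: the pragma applies to the next statement only. */
#pragma inline (dot3)
    d = dot3(x, y);

    /* routine scope: norm1 is not expanded for the rest of this routine. */
#pragma noinline routine (norm1)
    n = norm1(x) + norm1(y);

    return d + n;
}
```

The pragmas change only how PCA transforms the code; the value returned by combine is the same whether or not the calls are expanded.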

Listing File Additions

You can print the calling tree with the -listoptions=c option.

-listoptions=c

The optional calling tree includes the loop-nest depth level of each function call. The metric uses the convention of the -inline_looplevel and -ipa_looplevel options. The farthest-out leaf is 1, and higher values trace back to the main program.

In-lining/IPA Examples

The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because PCA undergoes constant enhancement, the code that your version of PCA produces may not be identical to the code in these examples. The temporary variable names, in particular, can change without substantially altering the transformed code.

Unless otherwise noted, the following examples were run with the -optimize and -scalaropt options set to:

-o=0 -so=0

to show the in-lining more clearly. If you specify nonzero values, the functions are first in-lined or analyzed, and then the concurrentization/dusty-deck transformations (see Chapter 3, “PCA Command-Line Options”) are applied. In some cases, C preprocessor additions or code modifications were removed to make the examples simpler.

In-lining Example: Same Source File

The following example demonstrates in-lining both with -inline=matm (only the function matm will be in-lined), and with -inline (both functions are in-lined). The PCA output includes optimized versions of both functions, in addition to the expanded main program. An example source file follows:

void example_8_4_1 ()
{
    int i, n;
    double a[200][200], b[200][200], c[200][200];
    double cksum, matm();
    setup (b, 200);
    setup (c, 200);
    for (n=25; n<200; n=n+25) {
        cksum = matm (n, a, b, c);
        printf ("For N=  %d   checksum= %q \n", n, cksum);
    }
}
void setup (double e[200][200], int n)
{
    int i, j;
    for (i=0; i<n; i++)
        for (j=0; j<n; j++)
            e[i][j] = ((i + 7*j) % 10) / 10.0;
    return;
}
double matm (int n, double a[200][200], double b[200][200], double c[200][200])
{
    int i, j, k;
    for (i=0; i<n; i++)
        for (j=0; j<n; j++) {
            a[i][j] = 0.0;
            for (k=0; k<n; k++)
                a[i][j] = a[i][j] + b[i][k]*c[k][j];
        }
    return (a[3][5]);
}

This is the main program generated by -inline=matm:

void example_8_4_1(  )
{
    int i, n;
    double a[200][200];
    double b[200][200];
    double c[200][200];
    double cksum;
    double matm( );
    setup( b, 200 );
    setup( c, 200 );
    for ( n = 25; n<=199; n+=25 ) {
        cksum = matm( n, a, b, c );
        printf( "For N=  %d   checksum= %q \n", n, cksum );
    }
}
void setup( double e[][200], int n )
{
    int i;
    int j;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            e[i][j] = ((i + j * 7) % 10) / 10.0;
        }
    }
    return ;
}
double matm( int n, double a[][200], double b[][200], double c[][200] )
{
    int i;
    int j;
    int k;
    double _Kdd1;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            a[i][j] = 0.0;
            _Kdd1 = a[i][j];
            for ( k = 0; k<n; k++ ) {
                _Kdd1 +=  b[i][k] * c[k][j];
            }
            a[i][j] = _Kdd1;
        }
    }
    return a[3][5];
}

This is the output generated by -inline:

void example_8_4_1(  )
{
    int i;
    int n;
    double a[200][200];
    double b[200][200];
    double c[200][200];
    double cksum;
    double matm( );
    setup( b, 200 );
    setup( c, 200 );
    for ( n = 25; n<=199; n+=25 ) {
        cksum = matm( n, a, b, c );
        printf( "For N=  %d   checksum= %q \n", n, cksum );
    }
}
void setup( double e[][200], int n )
{
    int i;
    int j;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            e[i][j] = ((i + j * 7) % 10) / 10.0;
        }
    }
    return ;
}
double matm( int n, double a[][200], double b[][200], double c[][200] )
{
    int i;
    int j;
    int k;
    double _Kdd1;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            a[i][j] = 0.0;
            _Kdd1 = a[i][j];
            for ( k = 0; k<n; k++ ) {
                _Kdd1 +=  b[i][k] * c[k][j];
            }
            a[i][j] = _Kdd1;
        }
    }
    return a[3][5];
}

In-lining Example with a Library

The next example demonstrates the creation of a library and in-lining functions from it, a two-step process.

First step: Create the library.

The file subfil.c contains these two functions:

extern double sin (double);
#pragma no side effects (sin)
void mkcoef (double coef[], int n)
{
    int i;
    for (i=0; i<n; i++)
        coef[i] = 1.0 / (i + 1);
}
double yval (double x, double coef[], int n)
{
    double sum;
    int i;
    sum = 0.0;
    for (i=0; i<n; i++)
        sum = sum + coef[i] * sin ((i + 1) * x);
    return (sum);
}

Run the file through the C preprocessor to create the file subfil.cpp:

/usr/lib/cpp subfil.c > subfil.cpp

Then execute the PCA command:

/usr/lib/pca  -inline_create=subfil.klib   -list=subfil.L subfil.cpp

This creates a library file with the two functions, and a listing file subfil.L, which contains only a list of routines and whether or not each was saved in the library:

function mkcoef -- saved
function yval -- saved

Second step: Inline the functions into a calling program.

The file sqwv.c contains the main program:

void example_8_4_2 ()
{
    double coef[15], y[2000], yval();
    int i;
    mkcoef (coef, 15);
    for (i=0; i<2000; i++)
        y[i] = yval ((i + 1) * 0.001 * 3.14159, coef, 15);
    for (i=0; i<2000; i=i+10)
        printf ("%f %f %f %f %f %f %f %f %f %f \n", y[i], y[i+1],
                y[i+2], y[i+3], y[i+4], y[i+5], y[i+6], y[i+7], y[i+8],
                y[i+9]);
}

Run the commands:

/usr/lib/cpp sqwv.c > sqwv.cpp
/usr/lib/pca -infl=subfil.klib -o=0 -d=0 sqwv.cpp \
-cmp=sqwv.cmp

This puts the following into the file sqwv.cmp:

void example_8_4_2(  )
{
    double coef[15];
    double y[2000];
    double yval( );
    int i;
    double _Kdd1[129];
#pragma padding(_Kdd1)
#pragma storage order(y, _Kdd1, coef)
    mkcoef( coef, 15 );
    for ( i = 0; i<=1999; i++ ) {
        y[i] = yval( (i + 1) * 0.001 * 3.14159, coef, 15 );
    }
    for ( i = 0; i<=1999; i+=10 ) {
        printf( "%f %f %f %f %f %f %f %f %f %f \n", y[i], y[i+1], y[i+2], y[i+3], y[i+4], y[i+5], y[i+6], y[i+7], y[i+8], y[i+9]
             );
    }
}

In the previous example, all other optimizations were turned off to show the expansion more clearly. If you specify nonzero values for the -optimize, -scalaropt, and -roundoff options, PCA first in-lines the routines, then performs the optimizations in the usual manner.

IPA Example

In the following example, the variables n and np1 have a simple relationship. This relationship is hidden behind a function call, however, so PCA normally will not try to concurrentize the loop in the main program.

When you specify the -ipa=rxgfs command-line option, PCA will inspect the named function for information on the relationship of its arguments and returned value and the surrounding code. The assumed dependence is lifted, and the loop can be safely concurrentized.

If a function cannot be in-lined (this simple one can be), or if you don't want to in-line it, it can often still be analyzed for its effects on the calling routine.

The next example looks like this:

void example_8_4_3 ()
{
    int np1, i, m, n;
    int a[100][100];
    np1 = rxgfs(n);
    for (i=0; i<m; i++) {
        a[i][n] = a[i-1][np1];
    }
}
int rxgfs(int n)
{
    return (n+1);
}

When run with the default values for -optimize and -scalaropt, the example becomes (the function is not shown):

void example_8_4_3(  )
{
    int np1;
    int i;
    int m;
    int n;
    int a[100][100];
    np1 = rxgfs( n );
    for ( i = 0; i<m; i++ ) {
        a[i][n] = a[i-1][np1];
    }
}
int rxgfs( int n )
{
    return (n + 1);
}

Notes on In-lining and IPA

You may perform either in-lining or interprocedural analysis in a PCA run. If you want to in-line some routines and use IPA for others, you must do this in two PCA runs.

  • Routines to be in-lined must pass all the criteria (-inline, -inline_looplevel) to be in-lined. (See the following section for the exception to this rule.)

  • The #pragma [no]inline and #pragma [no]ipa directives, when enabled, override the in-lining/IPA command-line options.

  • A #pragma inline global directive without a function name list instructs PCA to in-line every function it can regardless of the -inline and -inline_looplevel settings.

  • A #pragma noinline global directive instructs PCA not to in-line anything, regardless of the -inline and -inline_looplevel settings.

No in-lining or interprocedural analysis will be performed if the primary source file is stdin. (See the description of the -input command-line option in Chapter 3, “PCA Command-Line Options” for more information on specifying the primary source file.)

When you specify a library with -inline_from_libraries, routines may be taken from that library for in-lining into the source code. No attempt is made to in-line routines from the source file into routines from the library.

For example, if the main program calls function bb, which is in the library, and bb calls function dd, which is in the source file, then bb can be in-lined into the main program, but PCA will not attempt to in-line dd into the text from library routine bb.

A library created with -inline_create will work for in-lining or IPA, since it is just partially reduced source code, but a library made with -ipa_create may not appear in a -inline_from_libs=list. Such use is flagged with a warning message.

In-lining and interprocedural analysis are slow, memory-intensive activities. Using -inline_looplevel without a value (in-line all available functions everywhere they are used) for a large set of in-linable routines for a large source file can absorb significant system resources. For most programs, specifying a small value for -inline_looplevel and/or a small number of routines with -inline= will provide most of the benefits of in-lining. (Specifying a small value also applies to the corresponding IPA options.)

Conditions That Inhibit In-lining

This section lists conditions that inhibit the in-lining of functions, whether from a library or source file. (See the preceding section for notes on the use of the in-lining command-line options and pragmas.) Conditions that inhibit in-lining include:

  • unresolved name conflicts (which usually indicate an incorrect program)

  • a function that is too long (> 600 lines)