This chapter provides additional information about the PCA command-line options and pragmas that you can use to in-line functions and to perform interprocedural analysis (IPA).
In-lining is the process of replacing a function reference with the body of the called function. This eliminates the overhead of the function call and can assist other optimizations by making relationships between function arguments, returned values, and the surrounding code easier to find.
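To make the idea concrete, the sketch below shows a hypothetical helper, `scale`, called inside a loop, and the same loop written by hand as it would look after the call is replaced by the function body. It illustrates the transformation only; it is not PCA output, which uses its own formatting and temporary names.

```c
/* Hypothetical helper: a small function called once per loop iteration. */
static double scale(double x, double factor)
{
    return x * factor;
}

/* Before in-lining: the loop pays for one call to scale() per element. */
double sum_scaled_call(const double *v, int n, double factor)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += scale(v[i], factor);
    return sum;
}

/* After in-lining (written by hand): the call is replaced by the body of
   scale(), so there is no call overhead and the multiplication by the
   loop-invariant factor is visible to later optimizations. */
double sum_scaled_inlined(const double *v, int n, double factor)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i] * factor;
    return sum;
}
```

Both functions return the same sum; the second simply has no call inside the loop.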
Interprocedural analysis is the process of inspecting called functions for information on relationships between arguments, returned values, and global data. This process can provide many of the benefits of in-lining without replacing the function reference.
Table 7-1 lists the in-lining options.
Table 7-1. In-lining Options

Purpose | Long Name | Short Name | Default Value |
|---|---|---|---|
Specify routine to in-line | inline[=name[,name...]] | inl[=names] | off |
Create preprocessed library | inline_create=lib.klib | incr=lib.klib | off |
Define in-linable routines | inline_from_files=list | inff=list | current source file |
Specify library from which to in-line | inline_from_libs=list | infl=list | off |
Specify call nest level | inline_depth[=n] | ind[=n] | ind=2 |
Specify for-loop nest level | inline_looplevel[=n] | inll[=n] | inll=2 |
Specify manual control | inline_manual | inm | off |
Table 7-2 lists the IPA options.
Table 7-2. Interprocedural Analysis Options
Purpose | Long Name | Short Name | Default Value |
|---|---|---|---|
Specify routine to analyze | ipa[=name[,name...]] | ipa[=names] | off |
Create preprocessed library | ipa_create=lib.klib | ipacr=lib.klib | off |
Define routines for IPA | ipa_from_files=list | ipaff=list | current source file |
Specify library from which to do IPA | ipa_from_libs=list | ipafl=list | off |
Specify for-loop nest level for IPA | ipa_looplevel[=n] | ipall[=n] | ipall=2 |
Specify manual control | ipa_manual | ipam | off |
The rest of this chapter covers the in-lining and interprocedural analysis command-line options and pragmas, related command-line options, examples of their use, and information on program constructs that inhibit in-lining. In-lining and interprocedural analysis are symmetrical from the command-line standpoint: you use parallel sets of options and pragmas for them. (In many places in this chapter, statements about in-lining apply to interprocedural analysis as well.)
In-lining has two phases:
Define the universe of in-linable routines.
Select which routines in that universe to in-line or analyze.
The from_files and from_libs options define the universe of in-linable routines. The inline, ipa, and looplevel options select which of the available routines are to be in-lined/analyzed. The create options set up collections of routines for inclusion in later PCA runs.
The subsections that follow define the syntax for in-lining and interprocedural analysis command-line options. The short forms of their names appear in square brackets ([ ]).
The inline_from and ipa_from options take the following form:
```
-inline_from_files=list      [-inff=list]
-inline_from_libraries=list  [-infl=list]
-ipa_from_files=list         [-ipaff=list]
-ipa_from_libraries=list     [-ipafl=list]
```
where list is one or more of the following:
source file name
library file name
directory
Separate the items in the list with commas, and do not use shell wildcard characters in the list of files and directories. The default is the current source file. PCA distinguishes the different types of files by their extensions. For example:
```
-inline_from_files=xj.c,yy.c,../mrtn
```
looks for routines in the C source files xj.c and yy.c, and in the C source files in the directory ../mrtn. (Specifying the directory ../mrtn is equivalent to the UNIX® notation ../mrtn/*.c.) All source files that contain C preprocessor directives must be preprocessed (for example, by the cc compiler or by /usr/lib/cpp, as in the examples later in this chapter) before they are in-lined or analyzed.
The from_libraries versions of these options take as their arguments lists of function libraries and directories containing such libraries.
PCA recognizes the type of file by its extension, or lack of one (see Table 7-3 for the file types).
Table 7-3. File Types

File Extension | Type of File |
|---|---|
.c | C source file |
.klib | Library created by inline_create or ipa_create |
Any other extension, or none | Directory |
Two special abbreviations are:
Abbreviation | Meaning |
|---|---|
`-` (dash) | The current source file (as listed on the command line, or specified in an -input=file command-line option) |
`.` (period) | The current working directory |
Specifying a nonexistent file or directory is a command-line error.
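The extension rules in Table 7-3 and the two special abbreviations can be summarized as a short classification routine. This is a sketch of the documented rules under the assumption that classification depends only on the name as written; `classify_item` and `enum item_kind` are hypothetical names, and PCA's own parsing (including its check for nonexistent files) is not modeled.

```c
#include <string.h>

/* Kinds of items in a from_files or from_libraries list (illustrative). */
enum item_kind { CURRENT_SOURCE, C_SOURCE, KLIB_LIBRARY, DIRECTORY };

/* Return nonzero if s ends with suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

enum item_kind classify_item(const char *item)
{
    if (strcmp(item, "-") == 0)
        return CURRENT_SOURCE;   /* dash: the current source file */
    if (ends_with(item, ".c"))
        return C_SOURCE;         /* .c: C source file */
    if (ends_with(item, ".klib"))
        return KLIB_LIBRARY;     /* .klib: library from inline/ipa_create */
    return DIRECTORY;            /* anything else, including ".", is a directory */
}
```

Applied to the earlier example, xj.c and yy.c classify as C source files and ../mrtn as a directory.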
If you specify multiple from_files and from_libraries options, their lists are concatenated to form a larger universe.
Routine name references are resolved by searching the files in the order in which they appear in the from_files and from_libraries options on the command line. Within a library, routines are searched in their original lexical order. Multiple from_files and from_libraries lists are searched in the order in which they appear on the command line.
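The following sketch models that first-match search. It assumes the universe has already been flattened into a single array in command-line order (lists in option order, files in list order, routines in source order); `struct routine_def` and `resolve` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One routine definition found while scanning the from_* lists. */
struct routine_def {
    const char *name;    /* routine name */
    const char *origin;  /* file or library it came from */
};

/* Return the first definition of name, or NULL if it is not in the
   universe. Because entries are in command-line order, an earlier
   definition always hides a later one with the same name. */
const struct routine_def *resolve(const struct routine_def *universe,
                                  size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(universe[i].name, name) == 0)
            return &universe[i];
    return NULL;
}
```

With this ordering, a routine defined both in xj.c and in a library listed later resolves to the xj.c copy.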
To create a preprocessed library, use the following syntax:
```
-inline_create=library_name.klib  [-incr=lib_name.klib]
-ipa_create=library_name.klib     [-ipacr=lib_name.klib]
```
To specify a library file to in-line from, use:
```
-inline_from_libraries=list  [-infl=list]
-ipa_from_libraries=list     [-ipafl=list]
```
The default source for routines to put into the library is the current source file. If you specify inline_from (ipa_from), the routines in the listed files are the ones put into the library. This provides a way to combine or extend libraries: include the existing libraries and any new source files in an inline_from (ipa_from) option.
Routines are included in libraries in the order in which they appear in the input file(s). This order guarantees that if a source file contains multiple routines with the same name, the one chosen for in-lining is the one you would expect from the search order described previously for the from_files and from_libraries options.
A library created with inline_create works for in-lining or IPA, since it is just partially reduced source code. However, a library made with ipa_create may not appear in an -inline_from_libraries=list. Such use is flagged with a warning message.
If no library name is given, the name used is file.klib, where file is the input file name with any trailing .c stripped off.
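A minimal sketch of this default naming rule, assuming it is applied to the input file name exactly as given; the helper `default_klib_name` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build the default library name: the input file name with any trailing
   ".c" removed, followed by ".klib". Writes at most size bytes to out. */
char *default_klib_name(const char *input, char *out, size_t size)
{
    size_t len = strlen(input);
    if (len >= 2 && strcmp(input + len - 2, ".c") == 0)
        len -= 2;  /* strip the trailing ".c" */
    snprintf(out, size, "%.*s.klib", (int)len, input);
    return out;
}
```

For example, an input file named subfil.c yields subfil.klib.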
Only one create option may be given per PCA run; that is, a run can create only one library. If the library file already exists, PCA overwrites it. When you specify a create option on the command line, no transformed code file is generated. See the earlier description of the from_libraries options for information on using libraries created with these options.
If you don't specify an inline (ipa) option, the default is to include all the functions in the source file in the library, if possible. See “Conditions That Inhibit In-lining” later in this chapter for a list of conditions that can prevent a function from being in-lined.
An example of creating a library and in-lining from it is included in the section of examples later in this chapter.
To specify the names of particular routines to in-line, use:
```
-inline[=name[,name...]]  [-inl=name,...]
-ipa[=name[,name...]]     [-ipa=name,...]
```
If no names are given, the default is all routines in the function universe, that is, the routines made available by any inline_from (ipa_from) option, subject to the looplevel setting.
In-lining and IPA are off by default; that is, if no in-lining (IPA) options are specified and no in-lining (IPA) directives are found in the source code, no in-lining (IPA) is performed.
If you omit inline (ipa) from the command line, automatic selection of routines to in-line (analyze) is disabled. You can still select functions to in-line (analyze) manually with the -inline_manual (-ipa_manual) options and the inline and ipa pragmas.
If you specify inline (ipa) on the command line without a list of routine names, then all routines in the in-lining (IPA) universe are eligible, subject to the looplevel value.
If you specify inline (ipa) on the command line with a list of routine names, then only the listed routines are eligible, subject to the looplevel value.
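The three rules above can be expressed as a small decision function. This is a hypothetical model for illustration: `struct selection` and `eligible` are invented names, the looplevel test is passed in as a precomputed flag rather than modeled, and manual selection through pragmas is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Command-line selection state (illustrative, not PCA internals). */
struct selection {
    bool inline_given;         /* was -inline (-ipa) on the command line? */
    const char *const *names;  /* names from -inline=..., or NULL if none */
    size_t name_count;
};

/* Is a routine that is already in the universe eligible for automatic
   in-lining (analysis)? */
bool eligible(const struct selection *sel, const char *routine,
              bool passes_looplevel)
{
    if (!sel->inline_given)
        return false;          /* automatic selection is disabled */
    if (!passes_looplevel)
        return false;          /* fails the looplevel test */
    if (sel->names == NULL)
        return true;           /* no list: every routine in the universe */
    for (size_t i = 0; i < sel->name_count; i++)
        if (strcmp(sel->names[i], routine) == 0)
            return true;       /* routine is in the list */
    return false;
}
```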
To set a minimum for-loop nest level for function call expansion, use:
```
-inline_looplevel[=n]  [-inll[=n]]
-ipa_looplevel[=n]     [-ipall[=n]]
```
Use the looplevel option to limit in-lining and interprocedural analysis to functions that are referenced in nested loops, where the savings from reduced call overhead or improved optimization are multiplied.
The argument is defined from the most deeply nested leaf of the call tree.
The default value, 2, restricts in-lining (interprocedural analysis) to the most promising candidate routines.
For example:
```
main
{
    ...
    a();  ------------------->  a() {...}
    ...
    for (..) {
        for (..) {
            b();  ----------->  b() {
        }                           for (..) {
    }                                   for (..) {
}                                           c();  ------>  c() {...}
                                        }
                                    }
                                }
```
The call to b is inside a doubly nested loop and is more profitable to expand than the call to a. The call to c is quadruply nested, so in-lining c yields the biggest gain of the three.
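The nesting counts in the example can be checked with a short computation. The sketch below only computes how deeply each call is nested (0 for a, 2 for b, 4 for c), adding the loops around a call to the loops around every call on the path back to main. It is a hypothetical model; it does not reproduce how PCA applies the looplevel threshold.

```c
/* One call in the example's call tree. The names are hypothetical. */
struct call_site {
    const char *callee;  /* routine being called */
    int parent;          /* index of the call that reached the enclosing
                            routine, or -1 if the call is in main */
    int loop_depth;      /* for loops around this call in its caller */
};

/* Effective loop-nest depth of call i: its own loop depth plus the loop
   depth of every call on the path back to main. */
int effective_depth(const struct call_site *sites, int i)
{
    int depth = 0;
    for (; i >= 0; i = sites[i].parent)
        depth += sites[i].loop_depth;
    return depth;
}
```

The call to c is reached through the call to b, so the two loops in main and the two loops in b add up to four.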
A function that passes the looplevel test is in-lined everywhere it is used, even in places that are not in deeply nested loops. If only some invocations of a function should be expanded, use the inline and ipa pragmas just before each function call that is to be expanded (see the next section).
Because in-lining increases the size of the code, the extra paging and cache contention can actually slow down a program. Restricting in-lining to functions used in for loops multiplies the benefits of eliminating function call overhead for a given amount of code space expansion. (If in-lining appears to slow an application, investigate the problem using IPA, which has little effect on code space and the number of temporary variables.)
To instruct PCA to recognize the #pragma [no]inline and #pragma [no]ipa directives, use these options:
```
-inline_manual  [-inm]
-ipa_manual     [-ipam]
```
This allows manual control over which functions are in-lined/analyzed at which call sites (see the following section, “In-lining Pragmas”).
The default is to ignore these pragmas. To enable them, include -inline_manual (-ipa_manual) on the command line.
Since #pragma [no]inline and #pragma [no]ipa are not affected by the looplevel command-line options, you can use them either with or without the command-line control.
The inline/ipa pragmas tell PCA to in-line (or perform interprocedural analysis on) the named functions. The syntax is:
```
#pragma [no]inline [here][routine][global] [(name[,name...])]
#pragma [no]ipa [here][routine][global] [(name[,name...])]
```
These pragmas support next-line, entire-routine, and global (entire program) scope. If you omit the optional keyword and name list, every function referenced on the next line of code that is in the in-lining (IPA) universe is in-lined (analyzed) on that one line.
These pragmas are disabled by default. Enable them with the -inline_manual and -ipa_manual command-line options. They are independent of the other in-lining and IPA command-line options, and you can use them instead of, or in addition to, command-line controlled in-lining.
The keywords select the scope: here applies to the next line of code, routine to the entire routine, and global to the entire program.
These keywords must appear in lowercase. The optional names are function names, which are case sensitive. If any functions are named in the directive, it applies only to them; if no function names are given, it applies to all functions, and the parentheses can be omitted.
If a #pragma inline or #pragma ipa names a routine not in the universe, a warning message is issued, and the pragma is ignored.
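The fragment below sketches how the pragmas might be used to expand only one of two calls to the same function, following the syntax given above. `dist2`, `total_dist2`, and `first_dist2` are hypothetical names. The pragmas take effect only when PCA is run with -inline_manual; an ordinary C compiler ignores unrecognized pragmas (possibly with a warning), so the code compiles and behaves the same either way.

```c
/* Hypothetical helper used at two call sites. */
static double dist2(double x, double y)
{
    return x * x + y * y;
}

double total_dist2(const double *xs, const double *ys, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            /* Expand dist2 on the next line only. */
#pragma inline here (dist2)
            sum += dist2(xs[i] - xs[j], ys[i] - ys[j]);
        }
    }
    return sum;
}

double first_dist2(const double *xs, const double *ys)
{
    /* Called once, outside any loop: keep this call even if dist2
       would otherwise be in-lined everywhere it is used. */
#pragma noinline here (dist2)
    return dist2(xs[0], ys[0]);
}
```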
The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because PCA undergoes constant enhancement, the code that your version of PCA produces may not be identical to the code in these examples. The temporary variable names, in particular, can change without substantially altering the transformed code.
Unless otherwise noted, the following examples were run with the -optimize and -scalaropt options set to:

```
-o=0 -so=0
```

to show the in-lining more clearly. If you specify nonzero values, the functions are first in-lined or analyzed, and then the concurrentization/dusty-deck transformations (see Chapter 3, “PCA Command-Line Options”) are applied. In some cases, C preprocessor additions or code modifications were removed to make the examples simpler.
The following example demonstrates in-lining both with -inline=matm (only the function matm is in-lined) and with -inline (both functions are in-lined). The PCA output includes optimized versions of both functions, in addition to the expanded main program. The example source file follows:
```c
void example_8_4_1 ()
{
    int i, n;
    double a[200][200], b[200][200], c[200][200];
    double cksum, matm();
    setup (b, 200);
    setup (c, 200);
    for (n=25; n<200; n=n+25) {
        cksum = matm (n, a, b, c);
        printf ("For N= %d checksum= %g \n", n, cksum);
    }
}
void setup (double e[200][200], int n)
{
    int i, j;
    for (i=0; i<n; i++)
        for (j=0; j<n; j++)
            e[i][j] = ((i + 7*j) % 10) / 10.0;
    return;
}
double matm (int n, double a[200][200], double b[200][200], double c[200][200])
{
    int i, j, k;
    for (i=0; i<n; i++)
        for (j=0; j<n; j++) {
            a[i][j] = 0.0;
            for (k=0; k<n; k++)
                a[i][j] = a[i][j] + b[i][k]*c[k][j];
        }
    return (a[3][5]);
}
```
This is the output generated by -inline=matm:
```c
void example_8_4_1( )
{
    int i, n;
    double a[200][200];
    double b[200][200];
    double c[200][200];
    double cksum;
    double matm( );
    setup( b, 200 );
    setup( c, 200 );
    for ( n = 25; n<=199; n+=25 ) {
        cksum = matm( n, a, b, c );
        printf( "For N= %d checksum= %g \n", n, cksum );
    }
}
void setup( double e[][200], int n )
{
    int i;
    int j;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            e[i][j] = ((i + j * 7) % 10) / 10.0;
        }
    }
    return ;
}
double matm( int n, double a[][200], double b[][200], double c[][200] )
{
    int i;
    int j;
    int k;
    double _Kdd1;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            a[i][j] = 0.0;
            _Kdd1 = a[i][j];
            for ( k = 0; k<n; k++ ) {
                _Kdd1 += b[i][k] * c[k][j];
            }
            a[i][j] = _Kdd1;
        }
    }
    return a[3][5];
}
```
This is the output generated by -inline:
```c
void example_8_4_1( )
{
    int i;
    int n;
    double a[200][200];
    double b[200][200];
    double c[200][200];
    double cksum;
    double matm( );
    setup( b, 200 );
    setup( c, 200 );
    for ( n = 25; n<=199; n+=25 ) {
        cksum = matm( n, a, b, c );
        printf( "For N= %d checksum= %g \n", n, cksum );
    }
}
void setup( double e[][200], int n )
{
    int i;
    int j;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            e[i][j] = ((i + j * 7) % 10) / 10.0;
        }
    }
    return ;
}
double matm( int n, double a[][200], double b[][200], double c[][200] )
{
    int i;
    int j;
    int k;
    double _Kdd1;
    for ( i = 0; i<n; i++ ) {
        for ( j = 0; j<n; j++ ) {
            a[i][j] = 0.0;
            _Kdd1 = a[i][j];
            for ( k = 0; k<n; k++ ) {
                _Kdd1 += b[i][k] * c[k][j];
            }
            a[i][j] = _Kdd1;
        }
    }
    return a[3][5];
}
```
The next example demonstrates the creation of a library and in-lining functions from it, a two-step process.
The file subfil.c contains these two functions:
```c
extern double sin (double);
#pragma no side effects (sin)
void mkcoef (double coef[], int n)
{
    int i;
    for (i=0; i<n; i++)
        coef[i] = 1.0 / (i + 1);
}
double yval (double x, double coef[], int n)
{
    double sum;
    int i;
    sum = 0.0;
    for (i=0; i<n; i++)
        sum = sum + coef[i] * sin ((i + 1) * x);
    return (sum);
}
```
Run the file through the C preprocessor to create the file subfil.cpp:
```
/usr/lib/cpp subfil.c > subfil.cpp
```
Then execute the PCA command:
```
/usr/lib/pca -inline_create=subfil.klib -list=subfil.L subfil.cpp
```
This creates a library file with the two functions, and a listing file subfil.L, which contains only a list of routines and whether or not each was saved in the library:
```
function mkcoef -- saved
function yval -- saved
```
The file sqwv.c contains the main program:
```c
void example_8_4_2 ()
{
    double coef[15], y[2000], yval();
    int i;
    mkcoef (coef, 15);
    for (i=0; i<2000; i++)
        y[i] = yval ((i + 1) * 0.001 * 3.14159, coef, 15);
    for (i=0; i<2000; i=i+10)
        printf ("%f %f %f %f %f %f %f %f %f %f \n",y[i],y[i+1],
                y[i+2], y[i+3], y[i+4], y[i+5], y[i+6], y[i+7], y[i+8],
                y[i+9]);
}
```
Run the commands:
```
/usr/lib/cpp sqwv.c > sqwv.cpp
/usr/lib/pca -infl=subfil.klib -o=0 -d=0 sqwv.cpp \
    -cmp=sqwv.cmp
```
This puts the following into the file sqwv.cmp:
```c
void example_8_4_2( )
{
    double coef[15];
    double y[2000];
    double yval( );
    int i;
    double _Kdd1[129];
#pragma padding(_Kdd1)
#pragma storage order(y, _Kdd1, coef)
    mkcoef( coef, 15 );
    for ( i = 0; i<=1999; i++ ) {
        y[i] = yval( (i + 1) * 0.001 * 3.14159, coef, 15 );
    }
    for ( i = 0; i<=1999; i+=10 ) {
        printf( "%f %f %f %f %f %f %f %f %f %f \n", y[i], y[i+1], y[i+2], y[i+3], y[i+4], y[i+5], y[i+6], y[i+7], y[i+8], y[i+9] );
    }
}
```
In the previous example, all other optimizations were turned off to show the expansion more clearly. If you specify nonzero values for the -optimize, -scalaropt, and -roundoff options, PCA first in-lines the routines, then performs the optimizations in the usual manner.
In the following example, the variables n and np1 have a simple relationship: np1 is always n + 1. This relationship is hidden behind a function call, however, so PCA normally does not try to concurrentize the loop in the main program.
When you specify the -ipa=rxgfs command-line option, PCA inspects the named function for information on the relationship between its arguments, its returned value, and the surrounding code. The assumed dependence is lifted, and the loop can be safely concurrentized.
If a function cannot be in-lined (this simple one can be), or if you don't want to in-line it, it can often still be analyzed for its effects on the calling routine.
The next example looks like this:
```c
void example_8_4_3 ()
{
    int np1, i, m, n;
    int a[100][100];
    np1 = rxgfs(n);
    for (i=0; i<m; i++) {
        a[i][n] = a[i-1][np1];
    }
}
int rxgfs(int n)
{
    return (n+1);
}
```
When run with the default values for -optimize and -scalaropt, the example becomes:
```c
void example_8_4_3( )
{
    int np1;
    int i;
    int m;
    int n;
    int a[100][100];
    np1 = rxgfs( n );
    for ( i = 0; i<m; i++ ) {
        a[i][n] = a[i-1][np1];
    }
}
int rxgfs( int n )
{
    return (n + 1);
}
```
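Why the relationship matters can be shown by running the loop in two different orders. Once it is known that np1 is n + 1, each iteration writes column n and reads column n + 1, so no iteration reads a value another iteration writes, and the order of iterations cannot change the result. The sketch below is an illustration of that reasoning, not PCA's analysis; `run_loop`, `ROWS`, and `COLS` are hypothetical, and the loop starts at i = 1 because the example as printed reads a[-1] when i is 0.

```c
#define ROWS 100
#define COLS 100

/* Stand-in for rxgfs() from the example. What interprocedural analysis
   contributes is the fact that the result is always n + 1. */
static int rxgfs(int n)
{
    return n + 1;
}

/* The loop from the example, run forward (reverse == 0) or backward
   (reverse != 0). Callers must keep m <= ROWS and n + 1 < COLS. */
void run_loop(int a[ROWS][COLS], int m, int n, int reverse)
{
    int np1 = rxgfs(n);
    for (int k = 1; k < m; k++) {
        int i = reverse ? m - k : k;
        a[i][n] = a[i-1][np1];  /* writes column n, reads column n + 1 */
    }
}
```

If rxgfs returned n instead, iteration i would read the value written by iteration i - 1, and the forward and backward orders would generally give different results; that is the dependence PCA must assume when it cannot see into the function.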
You can perform either in-lining or interprocedural analysis in a PCA run, but not both. If you want to in-line some routines and use IPA for others, you must do so in two PCA runs.
Routines must pass all of the selection criteria (the -inline and -inline_looplevel settings) to be in-lined. (See the following paragraphs for the exception to this rule.)
The #pragma [no]inline and #pragma [no]ipa directives, when enabled, override the in-lining/IPA command-line options.
A #pragma inline global directive without a function name list instructs PCA to in-line every function it can regardless of the –inline and –inline_looplevel settings.
A #pragma noinline global directive instructs PCA not to in-line anything, regardless of the –inline and –inline_looplevel settings.
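A hypothetical model of this precedence is sketched below. It covers only the global forms and assumes at most one global pragma applies; the enum and function names are illustrative, the command-line decision is passed in as a flag, and the conditions that inhibit in-lining are reduced to a single `can_inline` flag.

```c
#include <stdbool.h>

/* Global pragma in effect for the program (illustrative encoding). */
enum global_pragma { GLOBAL_NONE, GLOBAL_INLINE_ALL, GLOBAL_NOINLINE_ALL };

/* Decide whether a call is in-lined. manual_enabled reflects
   -inline_manual; cmdline_selects is what the -inline and
   -inline_looplevel settings alone would decide. */
bool decide_inline(bool manual_enabled, enum global_pragma g,
                   bool cmdline_selects, bool in_universe, bool can_inline)
{
    if (!in_universe || !can_inline)
        return false;              /* nothing PCA can in-line */
    if (manual_enabled && g == GLOBAL_NOINLINE_ALL)
        return false;              /* overrides the command line */
    if (manual_enabled && g == GLOBAL_INLINE_ALL)
        return true;               /* every function it can */
    return cmdline_selects;        /* pragmas absent or ignored */
}
```

Note that without -inline_manual the pragmas are ignored, so the command-line settings decide.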
No in-lining or interprocedural analysis will be performed if the primary source file is stdin. (See the description of the -input command-line option in Chapter 3, “PCA Command-Line Options” for more information on specifying the primary source file.)
When you specify a library with -inline_from_libraries, routines may be taken from that library for in-lining into the source code. No attempt is made to in-line routines from the source file into routines from the library.
For example, if the main program calls function bb, which is in the library, and bb calls function dd, which is in the source file, then bb can be in-lined into the main program, but PCA will not attempt to in-line dd into the text from library routine bb.
A library created with -inline_create works for in-lining or IPA, since it is just partially reduced source code, but a library made with -ipa_create may not appear in an -inline_from_libs=list. Such use is flagged with a warning message.
In-lining and interprocedural analysis are slow, memory-intensive activities. Using -inline_looplevel (in-line all available functions everywhere they are used) for a large set of in-linable routines in a large source file can consume significant system resources. For most programs, specifying a small value for -inline_looplevel, a small number of routines with -inline=, or both, provides most of the benefits of in-lining. The same advice applies to the corresponding IPA options.
This section lists conditions that inhibit the in-lining of functions, whether from a library or source file. (See the preceding section for notes on the use of the in-lining command-line options and pragmas.) Conditions that inhibit in-lining include: