Chapter 4. Power C Analyzer Directives

You can use directives to provide additional information about a program that PCA cannot derive from its analysis of the program. Although you can use PCA without directives, they improve the optimization results. Directives provide information only. However, PCA notes the information in a directive and takes that information into consideration when trying to identify data dependencies. Table 4-1 lists PCA directives and their durations.

Table 4-1. PCA Directives

PCA Directive                                              Duration
---------------------------------------------------------  ------------
#pragma serial                                             next loop
#pragma concurrent                                         next loop
#pragma concurrent call                                    next loop
#pragma set chunksize (n)                                  next loop
#pragma set numthreads (n)                                 next loop
#pragma set schedtype (type)                               next loop
#pragma no side effects (name[,name...])                   program unit
#pragma distinct (name,name[,name...])                     program unit
#pragma arl(n)                                             selectable
#pragma inline [here][routine][global] [(name[,name...])]  selectable
#pragma ipa [here][routine][global] [(name[,name...])]     selectable
#pragma padding (variable list)                            program unit
#pragma storage order (variable list)                      program unit

To understand how PCA interprets directives, first consider “assumed” data dependences. For example, consider the loop:

for (i=0; i<n; i++) X[i] = X[i-1] + X[m];

In this loop, X is an array, n and m are scalars, and nothing is known about the relationship between n and m. Two types of data dependencies occur. Between X[i] and X[i-1] there is a forward dependence, and the distance is known to be 1. Between X[i] and X[m], PCA tries to find a relation but cannot, because it does not know the value of m in relation to n. The second dependence is called an assumed dependence, because it is assumed to exist but cannot be proven to exist.

If you know that an assumed data dependence is incorrect, you can tell PCA so by using a directive. If no definite data dependences exist, PCA can then convert the loop to run in parallel.

Use caution when using a directive because PCA cannot check the truth of an assertion implied by the directive. If you make an untrue assertion, PCA may run a data-dependent loop in parallel. This situation is very dangerous, because such code can intermittently produce the wrong answer.

The following sections describe each of these directives.

#pragma serial

This directive forces the loop immediately following it to be serial, and restricts optimization by forcing all enclosing loops to also be serial. The syntax for this directive is:

#pragma serial

PCA can still optimize loops nested inside the serial loop, but not loops that enclose it. Consider the code:

  for (i=0; i<N; i++)
    for (j=0; j<N; j++) {
#pragma serial
      for (k=0; k<N; k++)
         x[i][j][k] = x[i][j][k] * y[i][j];
      for (k=0; k<N; k++)
         x[i][j][k] = x[i][j][k] + z[i][k];
     }

The directive forces the i and j loops, and the first k loop to be serial. PCA can still optimize the second k loop, but it does not distribute (interchange) the i or j loops to try to get an optimizable loop. PCA always honors the #pragma serial directive. This directive is in effect only for the next loop.

#pragma concurrent

Use the #pragma concurrent directive to tell PCA to ignore assumed dependences in the following loop. The syntax for this directive is:

#pragma concurrent

If the loop contains definite dependences in addition to the assumed dependences, PCA does not convert the loop to run in parallel. In this case, the loop in the previous example would be left serial, because it has a known dependence.


Note: PCA does not generate code that executes in parallel (concurrently) if you use the –noconcurrentize command line option.

This directive is in effect only for the next loop.
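As an illustration, consider the hypothetical function below, which scatters additions through an index array. PCA would see an assumed dependence among the a[idx[i]] references; the directive asserts, on the programmer's authority (which PCA cannot check), that the subscripts are distinct:

```c
/* Add b[i] into a[idx[i]] for n elements.  The caller guarantees that
 * idx holds n distinct subscripts, so the iterations are independent. */
void scatter_add(double *a, const double *b, const int *idx, int n)
{
    int i;

/* Ignore the assumed dependence among the a[idx[i]] references. */
#pragma concurrent
    for (i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + b[i];
}
```

If idx ever repeated a subscript, the directive would be an untrue assertion, and the parallel loop could intermittently produce wrong answers.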

#pragma concurrent call

Use the #pragma concurrent call directive to tell PCA that the function calls in the following loop are safe to execute in parallel.

The syntax for this directive is:

#pragma concurrent call

PCA ignores all potential data dependences due to the function argument(s). This directive applies only to the immediately following loop and not to any nested or surrounding loops. Put a #pragma concurrent call directive before each concurrentizable loop with function references. Be sure that the functions called do not introduce data dependencies.

A better way to concurrentize a loop with function calls is to use either #pragma no side effects or interprocedural analysis (IPA). With IPA, PCA determines the true data dependences itself instead of relying on your assessment. IPA is explained in Chapter 7, “In-lining and Interprocedural Analysis.”

#pragma set chunksize, #pragma set numthreads, and #pragma set schedtype

These pragmas tell PCA which values to use for chunksize, numthreads, and schedtype.

The syntax for each of these directives is:

#pragma set chunksize (n)
#pragma set numthreads (n)
#pragma set schedtype (type)

For chunksize, the range of values for n is 1 to 1,000,000. For numthreads, the range of values for n is 1 to 255. If PCA sees a value larger than the maximum, it assumes the maximum and generates a warning message. If PCA sees a value smaller than 1, it generates a warning message and ignores the pragma.

The schedtype types are:

  • simple

  • dynamic

  • interleave

  • gss

  • runtime

Refer to #pragma parallel for a complete description of numthreads, and to #pragma pfor for descriptions of chunksize and schedtype.
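Because each of these directives applies to the next loop, they can be stacked before a single loop. The sketch below assumes the three may be combined this way; the routine name and tuning values are arbitrary:

```c
/* Multiply the first n elements of a by s. */
void scale(double *a, int n, double s)
{
    int i;

/* Hypothetical tuning: 4 threads, chunks of 100 iterations,
 * dynamic scheduling, all applied to the next loop. */
#pragma set numthreads (4)
#pragma set chunksize (100)
#pragma set schedtype (dynamic)
    for (i = 0; i < n; i++)
        a[i] = a[i] * s;
}
```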

#pragma no side effects

C functions frequently produce more information than just the returned value. Changing values of arguments via pointers or arrays, changing global data, and I/O can make a function unsafe to run concurrently.

The #pragma no side effects directive tells PCA to assume that all of the named functions are safe to execute concurrently. This means that the functions perform no I/O and that they modify only local variables.

The syntax for this directive is:

#pragma no side effects ( name [,name...] )

If you pass pointers or array names to the function and use this directive, PCA assumes that the memory locations they represent are not modified. The functions named must be declared before the directive.
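For example, a hypothetical function poly that computes only from its argument can be declared side-effect free; note that poly is declared before the directive names it:

```c
double poly(double x);          /* declared before the directive */

/* poly performs no I/O and modifies only local variables,
 * so loops that call it are safe to concurrentize. */
#pragma no side effects (poly)

/* Tabulate poly at the integers 0..n-1. */
void tabulate(double *t, int n)
{
    int i;
    for (i = 0; i < n; i++)
        t[i] = poly((double) i);
}

double poly(double x)
{
    return 1.0 + x * (2.0 + x);
}
```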

#pragma arl

Use #pragma arl (address resolution level) to control the assumptions PCA makes about memory aliases. The syntax for this directive is:

#pragma arl(n)

where n is the level of control. Table 4-2 describes the levels of control.

Table 4-2. Address Resolution Levels, #pragma arl

Value  Description
-----  ------------------------------------------------------------------
0      Make no assumptions about memory aliases.
1      Assume there are no pointer self-references (the default).
2      Assume function arguments are distinct from each other.
3      Assume local pointers/arrays are distinct from global pointers/arrays.
4      Assume all pointers/arrays are distinct from each other.

The directive has the same meaning as the –arl command-line option. See Chapter 3, “PCA Command-Line Options” for more information on this option and the levels of control.

When this directive appears inside a function (between the outer { and } of a function definition), it applies only to that function. If the directive appears outside a function, it sets the default value to be used for all functions that follow.

The command-line option is equivalent to a pragma at the beginning of the file and is thus overridden by other #pragma arl directives in the file.
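The sketch below shows both placements, using a hypothetical saxpy routine. The file-scope directive sets the default for every function that follows it; the directive inside saxpy overrides that default within the function:

```c
/* File scope: by default, assume function arguments are distinct
 * from each other (level 2) in all functions that follow. */
#pragma arl(2)

/* y[i] += a * x[i] for n elements. */
void saxpy(double *y, const double *x, double a, int n)
{
    int i;

/* Within saxpy only: assume all pointers/arrays are distinct
 * (level 4).  This is an untrue assertion if a caller aliases
 * x and y. */
#pragma arl(4)
    for (i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}
```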

#pragma distinct

Use #pragma distinct to indicate that two objects do not overlap.

The syntax for this directive is:

#pragma distinct (expr1,expr2[,expr3,expr4...])

where

expr1, expr2... represent objects.

The form of the expressions allowed is:

id 

a variable

*id 

what a pointer variable points to

id [] 

the array whose name is id

All variables involved must be previously declared. For example, for pointer p and array a, you can assert:

#pragma distinct (*p, a[])

if *p never overlaps with a[i] for any i used in the program.

The directive applies to the function in which it appears and to all functions that follow it in the file. An assertion about local variables or parameters has no effect beyond the immediate function, because those variables cannot be used outside it.
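Continuing the example above as a sketch with hypothetical names: declaring a global pointer p and array a non-overlapping lets PCA treat *p as loop-invariant in the loop below.

```c
double a[100];
double *p;

/* Assert that *p never overlaps any element of a. */
#pragma distinct (*p, a[])

/* Add *p to the first n elements of a. */
void accumulate(int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = a[i] + *p;
}
```

Without the directive, PCA would have to assume a store to a[i] might change *p and reload it on every iteration.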

#pragma inline and #pragma ipa

Use the inline and ipa directives to select manually which function(s) to in-line or perform interprocedural analysis on and at which call sites. The syntax is:

#pragma [no]inline [here][routine][global] [( name[,name...])]
#pragma [no]ipa [here][routine][global] [( name[,name...])]

If either of these directives appears with a name list, all occurrences of the named functions will be in-lined/analyzed, if possible, in all references within the scope of the directive. If the directive appears without a list of functions, all function references are eligible. (See Chapter 7, “In-lining and Interprocedural Analysis” for more information about these pragmas.)

The no forms turn off in-lining and IPA of the named function(s). The scope keywords are interpreted as:

here 

applies only to the next statement

routine 

applies to the rest of the program unit

global 

applies to the rest of the input file

You can terminate the routine and global scopes by the corresponding no directives. (Or terminate a noinline directive with an appropriate inline directive.)

These pragmas can override the –inline, –ipa, –inline_looplevel, and –ipa_looplevel command-line options. You can use #pragma inline and #pragma ipa in addition to, or in place of, command-line controlled in-lining/interprocedural analysis.


Note: The inline_man or ipa_man command-line option must be specified for the corresponding directive to be enabled (see Chapter 7, “In-lining and Interprocedural Analysis” for more information).
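As a sketch with hypothetical routines (and assuming PCA was invoked with the –inline_man option), the here scope in-lines only the single reference that follows the directive:

```c
/* Dot product of x and y. */
double dot(const double *x, const double *y, int n)
{
    int i;
    double s = 0.0;
    for (i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Divide each element of x by dot(x, x). */
void normalize(double *x, int n)
{
    int i;
    double d;

/* In-line only the next reference to dot. */
#pragma inline here (dot)
    d = dot(x, x, n);
    for (i = 0; i < n; i++)
        x[i] = x[i] / d;
}
```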


Memory Management Pragmas

PCA supports two memory management directives, #pragma padding and #pragma storage order. PCA uses these output directives to pass information on data layout to the compiler or to itself (if you are using PCA to process a program interactively). If PCA processes a program more than once, it will use the information in the directives inserted in previous runs to direct its cache usage optimizations.

#pragma padding

Use the padding directive to identify the listed arrays and scalar variables as objects which PCA created for the purpose of data alignment. PCA uses this directive when it reprocesses a program; the compiler will ignore this directive. The syntax of #pragma padding is:

#pragma padding (variable1 [, variable2 …])

The following rules govern the use of the padding directive.

  • You can use more than one padding directive within a single program unit.

  • The padding directive will be placed immediately after the declarations section of the program unit (the main function or called function).

  • A padding object can be routine-local or external.

  • A padding object cannot be a dummy argument to the procedure or function.

#pragma storage order

The storage order directive specifies the relative order in which storage should be allocated for the listed routine-local variables and arrays. PCA can reduce cache collisions by positioning the arrays correctly. The C compiler currently ignores the storage order directive.

The syntax of #pragma storage order is:

#pragma storage order (variable1 [, variable2 …])

The rules governing the use of #pragma storage order are:

  • You can use more than one storage order directive per program unit. Each directive can be interpreted separately.

  • The storage order directives will be placed directly after the declaration section of the program unit.

  • An object listed in a storage order must be local to the program unit.

  • An object listed in a storage order must not be:

    • mentioned in another storage order directive

    • an external variable or array

    • a dummy argument to the procedure or function

PCA can generate as many #pragma storage order directives as it considers useful.

To interpret a storage order directive, the compiler must place the named objects in memory in the order listed. For example:

   float a1[100], a2[3], a3[200];
#pragma storage order (a1,a2,a3)

On a machine with 4 bytes per float variable, the compiler would place the variables as follows:

  • a1 would be placed at some address X.

  • a2 would be placed at X + 100*4.

  • a3 would be placed at X + 100*4 + 3*4.

Note that both static and automatic storage schemes are allowed, as long as all of the objects in a single storage order directive are placed in the same scheme.

The padding and storage order directives often appear together, as in the following example.

   double _Kdd13[770];
   double _Kdd14[770];
#pragma padding(_Kdd14, _Kdd13)
#pragma storage order(c, _Kdd13, b, _Kdd14, a)

Parallelizing Loops that Deal with Linked Lists

When dealing with elements of linked lists, PCA allows you to parallelize:

  • loops in which each iteration processes a different member of the list and the computations for each element are independent of each other, that is, they can be computed in any order.

  • loops in which each iteration processes a different member of the list and the computations for each element are to a large extent independent of each other, but have a small portion of the code which has to be processed in the order in which the elements appear in the list.

Two pragmas support these two cases: #pragma plist, and #pragma ordered.

#pragma plist

Syntax:

#pragma plist unordered (list vars.; initialize shared; initialize local; condition; increment)
for (initialize shared, initialize local; condition; increment)
{
   ... /* loop body */
}

#pragma ordered

Syntax:

#pragma plist ordered (list vars.; initialize shared; initialize local; condition; increment)
for (initialize shared, initialize local; condition; increment)
{
   ... /* unordered loop body 1 */
#pragma ordered
   {
      ... /* ordered loop body */
   }
   ... /* unordered loop body 2 */
}


Note: In both of the above cases, the increment operation must be safe to perform at any point; it can have no side effect other than modifying the loop variable.

The following example shows an unordered loop, which uses #pragma plist unordered, and an ordered loop, which uses #pragma ordered.

#include <stdio.h>
#include <stdlib.h>

#define N 100
#define LOOP 10
#define ERROR 5
typedef struct st_1 *sptr;
struct st_1 {
   sptr next;
   int data;
};
sptr head;
main ()
{
   sptr list = 0;
   double sum2;
   int cnt;
   int i, j, k, t;
   double sum1;
   double psum;
   int pcnt;
   int error = 0;
   head = (sptr) malloc (sizeof (struct st_1));
   head->data = N;
   for (list = head, i = 1; i < N; i++) {
      list->next = (sptr) malloc (sizeof (struct st_1));
      list = list->next;
      list->data = (N - i);
   }
   list->next = 0;
   sum1 = 0;
   for (list = head, i = 0; list; i++, list = list->next) {
      if (list->data != N - i) {
         printf ("Mismatch: i = %d, data = %d\n", i, list->data);
         break;
      }
      sum1 += list->data;
   }
   printf ("SUM1 = %le\n", sum1);
   for (i = 0; (i < LOOP) && (error < ERROR); i++) {
      sum2 = 0;
      cnt = 0;
#pragma parallel shared (list, head, sum2, cnt) local (psum, pcnt, t)
      {
#pragma plist ordered (list; list=head, sum2 = 0, cnt = 0; psum = 0, pcnt = 0; list; list = list->next;)
         for (list= head, sum2 = 0, cnt = 0, psum = 0, pcnt = 0;
             list;
             list = list->next)
         {
#pragma ordered
            {
               pcnt++;
               psum += list->data;
            }
         }
#pragma critical
         {
            sum2 += psum;
            cnt += pcnt;
            printf ("sum2 = %le, psum = %le, pcnt = %d\n",
                     sum2, psum, pcnt);
         }
      }
      if (sum2 != sum1) {
         error++;
         printf ("ERROR: i = %d, count = %d, SUM2 = %le\n",i,
                 cnt, sum2);
      }
      printf ("\n");
   }
}