Appendix A. Sample Programs

The programs in this appendix illustrate the use of some of the features discussed in the book. The following programs are included:

Mapping and Reading the Cycle Counter

This section contains two example programs. The first simply reports the precision of the hardware cycle counter. The second demonstrates mapping and reading the cycle counter.

Testing Cycle Counter Precision

The program in Example A-1 is a simple utility that gets the cycle counter precision using syssgi() and displays it. The timer precision (in bits, either 32 or 64) is written to standard output. The program also returns the precision as its exit status, so a shell script can test it in the $status shell variable.

Example A-1. Program to Return Cycle Counter Precision


/*****************************************************************************
||
|| This program makes the value returned by syssgi(SGI_CYCLECNTR_SIZE)
|| accessible at the command line.  The output display can be read, or
|| tested in a shell script.  The value is also returned, so it can
|| be tested in the $status variable.
||
*****************************************************************************/
#include <sys/syssgi.h>     /* for syssgi(), SGI_CYCLECNTR_SIZE */
#include <stdio.h>          /* for printf() */
#include <string.h>         /* for strcmp() */
int main(int argc, char *argv[])
{
    unsigned int tbc = syssgi(SGI_CYCLECNTR_SIZE);
    int arg, quiet = 0;
    for (arg=1; arg<argc; ++arg)
    {
        if (0==strcmp(argv[arg],"-q"))
        {
            quiet = 1;
        }
        else /* includes case of -h */
        {
            printf("%s [-h | -q]\n",argv[0]);
            printf("\tReport the precision of the hardware cycle counter.\n");
            printf("\tPrecision in bits displayed to stdout unless -q.\n");
            printf("\tPrecision in bits returned as status.\n");
            return tbc;
        }
    }
    if (!quiet)
        printf("%u bits in the cycle counter\n",tbc);
    return tbc;
}

Reading the Cycle Counter

The program in Example A-2 shows how to map the high-precision cycle counter into memory and sample it. The file compiles to a library of the following functions:

mapTheTimer()

Uses mmap() to map the cycle counter into the address space. Returns the unit value of the timer in picoseconds; for example, it returns 21000 on a Challenge, where the timer unit value is 21 nanoseconds.

timerBitCount()

Returns the number of bits of precision in the timer, either 32 or 64 depending on the CPU board type.

readTimer32()

Returns the least-significant (or only) word of the timer value.

readTimer64()

Returns the timer value as a 64-bit unsigned integer (extended with 0-bits when necessary).

main()

Compiled only when the macro UNIT_TEST is defined. Contains code to exercise the preceding functions.
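
The unit value returned by mapTheTimer() is what turns raw counter samples into elapsed time. The following is a minimal, portable sketch of that arithmetic, not part of Example A-2. The helper names elapsedTicks32() and ticksToSeconds() are hypothetical, and the types come from <stdint.h> rather than <sgidefs.h>.

```c
#include <stdint.h>

/*
|| Hypothetical helper: difference between two 32-bit samples.
|| Unsigned subtraction is modulo 2^32, so the result is still
|| correct when the counter wraps once between the two samples.
*/
uint32_t elapsedTicks32(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/*
|| Hypothetical helper: convert a tick count to seconds, given the
|| unit value in picoseconds (the kind of value mapTheTimer() returns).
*/
double ticksToSeconds(uint64_t ticks, unsigned int unitPicoSecs)
{
    return (double)ticks * (double)unitPicoSecs * 1e-12;
}
```

With a 21000-picosecond unit, one million ticks correspond to about 0.021 seconds.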


Example A-2. Functions to Map and Read the Cycle Counter


/*****************************************************************************
||
|| The functions in this module provide access to the free-running timer
|| on the CPU board of certain SGI systems.
||
|| timerBitCount()
||
|| Returns the number of bits of data in the timer, as reported
|| by syssgi(SGI_CYCLECNTR_SIZE):
||    0 error reported by syssgi -- probably no timer in this machine
||   32 in an Indy or Crimson
||   64 in a Challenge, Onyx, and other big machines.
||
|| mapTheTimer()
||
|| This function tests the hardware environment. If the current system has
|| a timer, the function tries to map it into memory.  Errors can include:
||   * 0 returned by timerBitCount()
||   * error returned by syssgi(SGI_QUERY_CYCLECNTR)
||   * error returned by mmap(2)
|| When there is no error, the function returns a positive integer which is
|| the number of picoseconds represented by one unit increment of the timer.
|| In the event of an error, the function returns 0, and errno is set to
|| some error code.
||   mapTheTimer() can be called multiple times without harm.  To convert
|| its returned value to a fraction of a second, convert to double and
|| multiply by 1e-12.
||
|| readTimer32()
||   
|| This function calls mapTheTimer(), if it has not been called already.
|| Thus the first attempt to read the clock will map it if necessary.
|| If the timer has been mapped, its least-significant bits are returned
|| as an unsigned 32-bit integer.
||   * if mapTheTimer() failed, the returned value is always 0
||   * if the timer has 32-bit precision, the returned value is
||     the whole timer value
||   * if the timer has 64-bit precision (e.g. Challenge), the returned
||     value is the low-order word.
||
|| readTimer64()
||
|| This function is like readTimer32(), except that it returns an unsigned
|| 64-bit integer.
||   * if mapTheTimer() failed, the returned value is always 0
||   * if the timer has 32-bit precision, the returned value is
||     the whole timer value, extended with high-order 0-bits
||   * if the timer has 64-bit precision, the returned value is the whole
||     timer value.  The 64-bit timer is sampled in such a way as to 
||     compensate for rollover while minimizing bus traffic.
||
|| main()
||
|| Compiled only when UNIT_TEST is defined, provides a functional test
|| platform for the above functions.
||
|| NOTE: in two of these routines we assume that this machine is operating
|| in big-endian mode, such that the least-significant 32 bits of a 
|| long-long are at the higher word address.
||
*****************************************************************************/
#include <stddef.h>         /* for NULL */
#include <fcntl.h>          /* for O_RDONLY and open() */
#include <unistd.h>         /* for getpagesize() */
#include <sys/mman.h>       /* for constants used with mmap() */
#include <sgidefs.h>        /* for __psint_t, __uint*_t, and ABI defs */
#include <sys/syssgi.h>     /* for syssgi(), SGI_QUERY_CYCLECNTR */
#include <errno.h>          /* for errno global */
/*****************************************************************************
|| The following globals are set up by mapTheTimer() the first time called.
||   timerMapAddress == NULL means mapTheTimer() has never been called
||                   == -1 means mapTheTimer() called and failed
||                   else it points to the timer in memory
||   The data type (void *) is coerced to __uint32_t or __uint64_t in use.
||
|| The "volatile" declaration keeps the compiler from optimizing away 
|| successive references to it.
||
||   timerPicoSecs   == 0 means the timer has not been mapped successfully
||                   else is the value returned by syssgi(SGI_QUERY_CYCLECNTR)
||
||   timerPrecision  == value returned by syssgi(SGI_CYCLECNTR_SIZE), 
||                   but as this value is needed in the timer-reading
||                   functions, it is cached, so as to avoid a system call
||                   every time we read the clock.
||
|| If this code were redone in C++ (not a bad idea, feel free) these would
|| be class variables.
*****************************************************************************/
#define TIMER_IS_MAPPED (0 != timerPicoSecs)
#define TIMER_MAP_ATTEMPTED (NULL != timerMapAddress)
static volatile void * timerMapAddress = NULL;
static unsigned int timerPicoSecs = 0;
static unsigned int timerPrecision = 0;
unsigned int
mapTheTimer()
{
__uint32_t  timerUnits = 0; /* receives timer picosecond unit value */
__psint_t   timerPhysAddr;  /* receives timer absolute address */
__psint_t   timerPhysVPN;   /* timerPhysAddr masked to a page boundary */
__psint_t   addrMask;       /* page offset bit mask */
int         fdMem;          /* file descriptor for /dev/mmem */
    if ( ! TIMER_MAP_ATTEMPTED) /* first time through this code */
    {
        /*
        || Get the physical address of the clock in full. If there
        || is no cycle counter on this machine, syssgi returns -1.
        */
        timerPhysAddr = syssgi(SGI_QUERY_CYCLECNTR, &timerUnits);
        if ((__psint_t)-1 != timerPhysAddr) /* we have a timer */
        {
            /*
            || Trim out the offset from the address leaving the
            || page number part of the address. (VPN == virtual page number)
            */
            addrMask = getpagesize() - 1;
            timerPhysVPN = timerPhysAddr & ~addrMask;
            /*
            || Map the page containing the clock's address into the virtual
            || address space of this process.
            */
            fdMem = open("/dev/mmem", O_RDONLY);
            timerMapAddress = (void *) mmap(
                NULL,               /* addr = 0, don't care where it goes */
                addrMask,           /* len = pagesize - 1 */
                PROT_READ,          /* prot = read-only */
                MAP_PRIVATE,        /* changes are unshared (n.a.) */
                fdMem,              /* map base is physical memory */
                (off_t)timerPhysVPN /* source address to map */
                );
            if ((__psint_t)-1 != (__psint_t)timerMapAddress)
            {
                /*
                || mmap() succeeded, cache info in global variables.
                */
                timerPicoSecs = timerUnits;
                timerPrecision = syssgi(SGI_CYCLECNTR_SIZE);
                /*
                || Restore any nonzero offset bits to mapped page address.
                */
                timerMapAddress = (void*) (
                    ((__psint_t)timerMapAddress) /* addr as int */
                    | (timerPhysAddr & addrMask)  /* plus offset bits */
                    );
            }
            else
                ; /* mmap() failed, timerMapAddress == -1, errno set */
        } /* end syssgi() successful */
        else
        {
            timerMapAddress = (void *)-1; /* syssgi error, no timer (?) */
        }
    } /* end attempting to initialize */
    return timerPicoSecs;
}
unsigned int
timerBitCount()
{
    if (TIMER_IS_MAPPED)
        return timerPrecision;
    if ( ! TIMER_MAP_ATTEMPTED)
    {
        mapTheTimer();
        return timerPrecision;
    }
    else return 0;
}
/*****************************************************************************
||
|| In both of the following routines, one goal is to minimize the number of
|| references to the mapped timer.  Reason: each such reference is an 
|| uncached memory reference plus a bus access, taking at least 1 usec and
|| possibly more depending on the machine.  Unnecessary references to the
|| timer should be avoided when possible.
||
|| If the timer has 64 bits, return its least-significant word. Which word
|| is that?  This code assumes the big-endian model.  An alternative
|| would be to load the long-long value and force C to convert it.  That
|| would be portable but would hit the bus twice instead of once, nullifying the
|| speed advantage that this routine has over the one following.
||
*****************************************************************************/
__uint32_t
readTimer32()
{
__uint32_t ret = 0;

    if ( ! TIMER_IS_MAPPED ) mapTheTimer();
    if ( TIMER_IS_MAPPED ) /* timer mapped ok */
    {
        if (64 == timerPrecision)
            ret = ((__uint32_t *)timerMapAddress)[1]; /* low word of 2 */
        else /* in IRIX 6.2, 32 bits is the only alternative */
            ret = *((__uint32_t *)timerMapAddress);
    }
    return ret;
}
/*****************************************************************************
||
|| When the timer has 32 bits, just fake up a long-long and return it.
|| For long timers we must ask: was this code compiled to an ABI that does
|| atomic loads of long-longs (-64 or -n32), or not (-32)?
|| In the newer ABIs, we just fetch the 64-bit timer in one move.
||
|| When compiled under a 32-bit system, the generated code loads the timer
|| value in two "lw" instructions.  The low word of the timer overflows into
|| the high word about every 90 seconds, and if that happens between the
|| lw's, the result will be wrong.  Worse, we cannot be certain which of the
|| two words the compiler will choose to load first, the low or the high.
||
|| In order to minimize the number of uncached accesses, we test for
|| overflow only when it has recently happened; that is, when
|| the most significant 9 bits of the low word are all-0.  This
|| condition defines a window of 0.17 seconds following the overflow
|| (21e-9 * 2^23 == .176160768).
|| If this were kernel code, the window could be much smaller.  In enabled
|| code we have to allow for a series of interrupts between the load of the
|| upper and lower words.  As it is, if we load the upper word just before
|| overflow, and an interrupt delays the next fetch 0.17+ seconds, we will
|| return an incorrect value.
||
*****************************************************************************/
__uint64_t
readTimer64()
{
union {
    struct { __uint32_t msw,lsw; }w; 
    __uint64_t ll;
    } ret;

    ret.ll = 0;
    if ( ! TIMER_IS_MAPPED ) mapTheTimer();
    if ( TIMER_IS_MAPPED ) /* it mapped ok */
    {
        if (timerPrecision == 32)
        {
            ret.w.msw = 0;
            ret.w.lsw = *((__uint32_t *)timerMapAddress);
        }
        else
        {
#if (_MIPS_SIM == _MIPS_SIM_NABI32 || _MIPS_SIM == _MIPS_SIM_ABI64)
            /* 64-bit loads are atomic */
            ret.ll = *(__uint64_t *)timerMapAddress;
#else /* 64-bit loads are not atomic */
            ret.w.msw = ((__uint32_t *)timerMapAddress)[0];
            ret.w.lsw = ((__uint32_t *)timerMapAddress)[1];
            if ( (ret.w.lsw & 0xff800000) == 0)
            {
            /*
            || The high word incremented not more than .17 sec ago.
            || Provided there is not a delay here exceeding 89.8 sec,
            || the following single load ensures we have the high word
            || value that is correctly associated with the low word
            || we already picked up.
            */
                ret.w.msw = ((__uint32_t *)timerMapAddress)[0];
            }
#endif
        }
    }
    return ret.ll;
}

#ifdef UNIT_TEST
#include <stdio.h>          /* for printf(), perror() */
#include <stdlib.h>         /* for atoi() */

int main(int argc, char*argv[])
{
    int     j;
    int     numTix = 10;
    unsigned int picosecs;
    unsigned short tbits;
    double  dmicsecs;
        
    if (argc>1) numTix = atoi(argv[1]);

    if ( picosecs = mapTheTimer() )
    {
        tbits = timerBitCount();
        dmicsecs = ((double)picosecs)/1e6;
        printf("The timer has %d bits of precision\n",tbits);
        printf("One timer unit == %d picoseconds or %g us\n",
                                    picosecs, dmicsecs);
    }
    else
    {
        perror("mapTheTimer");
        return errno;
    }

    {
        __uint32_t st1, st2, stx;
        st1 = readTimer32();
        printf("\nreading timer as 32 bits\n\n");
        for(j=0; j<numTix; ++j)
        {
            st2 = readTimer32();
            stx = st2 - st1;
            printf("0x%0x - 0x%0x = 0x%0x (%g usecs)\n",
                    st2,     st1,    stx, (stx*dmicsecs) );
            st1 = st2;
        }
    }

    {
        __uint64_t lt1, lt2, ltx;
        lt1 = readTimer64();
        printf("\nreading timer as 64 bits\n\n");
        for(j=0; j<numTix; ++j)
        {
            lt2 = readTimer64();
            ltx = lt2 - lt1;
            printf("0x%0llx - 0x%0llx = 0x%0llx (%g usecs)\n",
                    lt2,        lt1,      ltx, (ltx * dmicsecs));
            lt1 = lt2;
        }
    }
    return 0;
}
#endif
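
The 32-bit-ABI branch of readTimer64() handles the case where the low word overflows between the two loads by re-reading the high word only shortly after an overflow. A different and more common technique for the same problem is to read high, low, then high again, and retry until both high-word reads agree. The sketch below shows that alternative; it is not the method used in Example A-2. The type splitCounter_t and the function readSplitCounter() are hypothetical, and an ordinary structure stands in for the mapped hardware counter.

```c
#include <stdint.h>

/*
|| Hypothetical stand-in for a 64-bit counter that can only be read
|| as two 32-bit words, most-significant word at the lower address.
*/
typedef struct {
    volatile uint32_t msw;
    volatile uint32_t lsw;
} splitCounter_t;

/*
|| Read high, low, high.  If the two high-word reads differ, the low
|| word overflowed during the sequence, so repeat with the newer value.
*/
uint64_t readSplitCounter(const splitCounter_t *c)
{
    uint32_t hi, lo, hi2;

    hi2 = c->msw;
    do
    {
        hi = hi2;
        lo = c->lsw;
        hi2 = c->msw;
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
```

This version needs no assumption about how long an interrupt can delay the second load, but it always makes at least three references to the counter, where the windowed test in readTimer64() usually makes two.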

Getting the Time of Day Stamp

The program in Example A-3 tests the precision of the time of day stamp returned by gettimeofday(). The function getTODdiff() contains an example call to gettimeofday().

Example A-3. Program to Exercise gettimeofday()


#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>     /* for atoi() */
#define LOOPS 1000
/*
 * This function loops on gettimeofday() until the returned time
 * changes by more than 1 microsecond, then reports the difference.
 * 
 * We look for a change of >1 usec because in some older systems
 * (apparently not in 6.2) IRIX, in order to ensure that gettimeofday()
 * never returns the same value twice while not actually updating the
 * timer, just adds 1 usec on each call until a normal dispatching
 * tick occurs. In 6.2 systems, IRIX actually recalculates the timer
 * on each call.
 * 
 * The function also updates a maximum loop-count value.
 */
long getTODdiff(int * pMaxLoops)
{
    long first, second;
    int nloops = 0;
    struct timeval tod;
    struct timezone tz;
    gettimeofday(&tod, &tz);
    first = tod.tv_usec;
    do
    {
        gettimeofday(&tod, &tz);
        second = tod.tv_usec;
        ++nloops;
    } while (first == (second-nloops));
    if (first > second)
        second += 1000000;
    if (pMaxLoops)
        if (nloops > *pMaxLoops)
            *pMaxLoops = nloops;
    return second - first;
}
int main(int argc, char *argv[])
{
    int j, limit;
    int maxLoops = 0;
    long sample, sum, min, max;
    double mean;
    
    limit = LOOPS;
    if (argc > 1)
        limit = atoi(argv[1]);
    /* get past the first call, which is likely to be short */
    sum = getTODdiff(NULL);
    
    /* exercise gettimeofday a few times */
    for (j=0, sum=0, min=999999, max=0; j< limit; j++)
    {
        sample = getTODdiff(&maxLoops);
        sum += sample;
        if (sample > max) max = sample;
        if (sample < min) min = sample;
    }
    mean = (double)sum / limit; /* average over the samples actually taken */
    printf("gettimeofday() increments: min=%ld, mean=%g, max=%ld\n",
                                           min,      mean,    max);
    if (maxLoops > 1)
        printf("Max number of loops %d\n", maxLoops);
    else
        printf("Honest timer update in this system.\n");
    return 0;
}
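
getTODdiff() compares only the tv_usec fields and corrects for a single wrap at a second boundary, which is sufficient for the short intervals it measures. For longer intervals, both fields of the timeval must be used. The following minimal sketch shows one way to do that. The helper name timevalDiffUsec() is hypothetical and is not part of Example A-3.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: return (later - earlier) in microseconds,
 * using both tv_sec and tv_usec, so that intervals longer than one
 * second or spanning a second boundary are handled.
 */
long timevalDiffUsec(const struct timeval *earlier,
                     const struct timeval *later)
{
    long sec  = (long)(later->tv_sec - earlier->tv_sec);
    long usec = (long)(later->tv_usec - earlier->tv_usec);
    return sec * 1000000L + usec;
}
```

The result is negative when the second argument is the earlier time. On a system with a 32-bit long, the result overflows for intervals longer than about 35 minutes.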

Interprocess Communication

The program in Example A-6 illustrates the use of some of the interprocess communication (IPC) features of IRIX, in particular a shared arena, semaphores, a lock, and processes created with sproc().

The program models a real-time data-collection program. The main process establishes an arena. Within the arena it creates a data structure that defines and manages a ring buffer. Then the main process uses sproc() to create three processes:

  • inputProcess() generates random-integer “input data” and stores it in the ring buffer. To simulate an unpredictable and varying input rate, the process “receives” bursts of 1 to 16 data items. The average input rate is calculable (see the commentary in the code).

    The number of items to generate can be specified on the command line as the -c option followed by the count. The default is 2000 items. After generating that many items, inputProcess() waits until all data has been consumed, then terminates.

  • outputProcess()—of which two instances are created—takes data from the ring buffer. To simulate a steady average output rate, each process sets a repeating itimer and takes one data item each time the timer expires. The itimer interval represents the simulated “processing time” of a data item. This interval can be specified on the command line as the -t option followed by the interval in microseconds. The default is 10,000 (10 milliseconds per item per process, an output rate of 200 items/second).
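
The nominal rates described above can be checked with a few lines of arithmetic. The sketch below is illustrative only. The function and macro names are hypothetical, except MAX_BURST, which mirrors the constant in the program. The 25 Hz iteration rate and the MAX_BURST/2 average are taken from the commentary in the program source.

```c
/* Hypothetical constants mirroring the program's defaults. */
#define INPUT_HZ        25  /* inputProcess() iterations per second */
#define MAX_BURST       16  /* maximum items per iteration */
#define N_OUTPUT_PROCS  2   /* instances of outputProcess() */

/*
|| Average input rate in items/second, using the same MAX_BURST/2
|| approximation as the program's commentary.
*/
int avgInputRate(void)
{
    return INPUT_HZ * MAX_BURST / 2;
}

/*
|| Combined nominal output rate in items/second for a given itimer
|| interval in microseconds.  The interval must be positive; 0 is
|| returned otherwise, because the rate is then not set by the timer.
*/
int nominalOutputRate(int intervalUsec)
{
    if (intervalUsec <= 0)
        return 0;
    return N_OUTPUT_PROCS * (1000000 / intervalUsec);
}
```

With the default interval of 10000 microseconds the nominal output rate equals the average input rate of 200 items/second. An interval of 20000 gives only 100 items/second, and an interval of 5000 gives 400. These are nominal figures; actual timer delivery depends on the system and its load.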

After starting the three processes, the main process waits for one to terminate. When there are no errors, inputProcess() is the first and only process to terminate—the two outputProcess() instances end up blocked on a semaphore, waiting for more data.

The main process then kills the remaining processes, displays the metering information from the lock and semaphores, and terminates the program.

The three simulated real-time processes communicate through two semaphores and a lock.

  • Semaphore semRBdata represents the number of data items now in the ring buffer. inputProcess() does the V operation, increasing the semaphore count with each input datum; outputProcess() does the P operation, decreasing the count with each output.

  • Semaphore semRBspace represents the number of empty slots in the ring buffer. inputProcess() does the P operation to acquire an empty slot, and outputProcess() does the V operation when it releases a slot.

  • Lock lockRBupdate represents the right to alter the ring buffer index values. All processes set this lock before modifying the ring buffer, and clear it afterward.
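
The bookkeeping that the two semaphores and the index values perform can be modeled in a single process, without any IRIX calls. The following sketch is such a model, not the program's implementation. Plain integers stand in for the semaphore counts, no lock is needed because there is only one thread, and the "try" functions return 0 at the point where the real P operation would block. All names except RB_MAXELS, rbGet, and rbPut are hypothetical.

```c
#define RB_MAXELS 160

typedef struct {
    long buf[RB_MAXELS];
    int  nData;     /* models the count of semRBdata: items buffered */
    int  nSpace;    /* models the count of semRBspace: empty slots */
    int  rbGet;     /* index of the next item to take */
    int  rbPut;     /* index of the next empty slot */
} ringModel_t;

void ringInit(ringModel_t *rb)
{
    rb->nData = 0;
    rb->nSpace = RB_MAXELS;
    rb->rbGet = 0;
    rb->rbPut = 0;
}

/* Returns 1 on success, 0 where P on semRBspace would block. */
int ringTryPut(ringModel_t *rb, long value)
{
    if (rb->nSpace == 0)
        return 0;
    --rb->nSpace;                               /* P on space */
    rb->buf[rb->rbPut] = value;
    rb->rbPut = (rb->rbPut + 1) % RB_MAXELS;    /* wrap the put index */
    ++rb->nData;                                /* V on data */
    return 1;
}

/* Returns 1 on success, 0 where P on semRBdata would block. */
int ringTryGet(ringModel_t *rb, long *value)
{
    if (rb->nData == 0)
        return 0;
    --rb->nData;                                /* P on data */
    *value = rb->buf[rb->rbGet];
    rb->rbGet = (rb->rbGet + 1) % RB_MAXELS;    /* wrap the get index */
    ++rb->nSpace;                               /* V on space */
    return 1;
}
```

In this model nData + nSpace always equals RB_MAXELS. Each operation advances and wraps only its own index: a put changes rbPut and a get changes rbGet.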

The displayed metering data at the end of the program shows whether the output processes could keep up with the input process. It is necessary to run the program with a nondegrading real-time priority to get consistent results. The output in Example A-4 shows a case in which output did not keep up.

Example A-4. Producer/Consumer Program Test 1


# npri -h 39 ./ringBuffer -t 20000
Lock lockRBupdate acquired 4004 times, 4004 without waiting (100%)
Metering info on sema semRBdata
  P: 2004, 2000 with no wait (99%)
  V: 2002, 2 with P waiting (0%)
Metering info on sema semRBspace
  P: 2002, 1423 with no wait (71%)
  V: 2002, 579 with P waiting (28%)

In Example A-4, look first at the P operations for semRBspace. 71% of the time, when inputProcess() applied uspsema() to this semaphore to acquire a slot in the ring buffer, it did not wait. However, about 29% of the time (579 of 2002 operations, which the integer arithmetic of the display truncates to 28%) it did wait, meaning that the ring buffer was full and no free slots were available until an outputProcess() released one. Clearly, the output processes were not keeping up with the input data rate.
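
The percentages in the metering display are computed with integer division of the form (100 * n) / total, so they are truncated rather than rounded. The following minimal sketch reproduces that arithmetic. The helper name truncPercent() is hypothetical, and the guard against a zero total is an addition for safety.

```c
/*
|| Hypothetical helper: integer percentage, truncated toward zero,
|| matching the (100 * n) / total arithmetic used for the display.
|| Returns 0 when total is not positive.
*/
int truncPercent(int n, int total)
{
    return (total > 0) ? (100 * n) / total : 0;
}
```

For the semRBspace figures in Example A-4, truncPercent(1423, 2002) is 71 and truncPercent(579, 2002) is 28, although the exact values are 71.1% and 28.9%.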

Example A-5. Producer/Consumer Program Test 2


# npri -h 39 ./ringBuffer -t 5000
Lock lockRBupdate acquired 4004 times, 4004 without waiting (100%)
Metering info on sema semRBdata
  P: 2004, 1565 with no wait (78%)
  V: 2002, 437 with P waiting (21%)
Metering info on sema semRBspace
  P: 2002, 2002 with no wait (100%)
  V: 2002, 0 with P waiting (0%)

Example A-5 shows a test run in which the output processes did keep up with the input rate. In every case, inputProcess() was able to acquire a slot from semRBspace without waiting. Roughly 22% of the time (the display truncates this to 21%), when an outputProcess() tried to acquire a data item from semRBdata, it had to wait, meaning the ring buffer was empty. (This percentage would be higher if inputProcess() did not frequently dump blocks of 2-16 items into the buffer.)

Example A-6. Producer/Consumer Program Demonstrating IPC Functions


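#include <stdio.h>  /* for printf(), fprintf(), perror() */
#include <stdlib.h> /* for getopt() */
#include <unistd.h> /* for getpid(), sginap() */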
#include <signal.h>
#include <sys/time.h>
#include <ulocks.h>
#include <math.h> /* for random() and srandom() */
#include <sys/types.h> /* for pid_t */
#include <sys/wait.h> /* for wait() */
/*
|| The following declarations define a structure that controls a ring buffer.
|| 
||  rbElem_t    the type of thing that is stored in the buffer
||  RB_MAXELS   the size of the ring buffer
||  rbStruct_t  control and serialization items for the buffer
||
||  The buffer and structure are built together in an arena, and the 
||  address of the structure is the arena info (usgetinfo()).
*/
typedef long rbElem_t;  /* can be any scalar, but assumed long below */
#define RB_MAXELS 160   /* specify enough to buffer the peak data rate */
typedef struct rbStruct {
    rbElem_t * theBuffer;   /* -> [RB_MAXELS] of rbElem_t */
    usema_t * semRBdata;    /* -> semaphore for buffered data */
    usema_t * semRBspace;   /* -> semaphore for open buffer slots */
    ulock_t * lockRBupdate; /* -> lock on the following words */
    int rbGet;              /* theBuffer[rbGet] is next live data */
    int rbPut;              /* theBuffer[rbPut] is next empty slot */
} rbStruct_t;
/*
|| The following constants are default values for the global parameters.
|| See the prologs of the inputProcess and outputProcess functions.
*/
#define MAX_BURST 16 /* data rate average 25*16/2 == 200/sec */
#define FINAL_COUNT (((MAX_BURST/2)*25)*10) /* run for ~10 seconds */
#define OUTPUT_TIME 10000 /* 10 ms delay or 100 Hz */
/*
|| The following global variables are input parameters to the
|| child processes. They are set by main() from command line arguments.
*/
static int outputTimer = OUTPUT_TIME;   /* -t argument */
static int inputCount = FINAL_COUNT;    /* -c argument */
/*
|| Allocate the arena and initialize it with the ring buffer structure.
|| If any error occurs, report it and return NULL.  The following filename
|| is used. It must address a writeable directory. If the file already
|| exists, it must be writeable to this process.
*/
#define ARENA_FILE "/var/tmp/ring.buffer.arena"
usptr_t *
allocRBStuff()
{
    usptr_t * arena;
    rbStruct_t * rbs;
    int okSoFar = 1;
    /*
    || Announce that we want metering info stored with our locks.
    */
    if (-1 == usconfig(CONF_LOCKTYPE, US_DEBUGPLUS) )
    {
        perror("usconfig(CONF_LOCKTYPE)");
        return NULL;
    }
    /*
    || Create the arena.
    */
    if ( NULL == (arena = usinit(ARENA_FILE) ) )
    {
        perror("usinit");
        return NULL;
    }
    /*
    || From here on, a failure means we must call usdetach().
    */
    rbs = (rbStruct_t *)uscalloc(1, sizeof(rbStruct_t), arena);
    if (!(okSoFar=(0 != rbs)))
    {
        fprintf(stderr, "Unable to allocate anything in arena\n");
    }
    else
    {
        rbs->theBuffer = 
            (rbElem_t *)uscalloc(RB_MAXELS, sizeof(rbElem_t), arena);
        if (!(okSoFar=(0 != rbs->theBuffer)))
            fprintf(stderr, "Unable to allocate ring buffer in arena\n");
    }
    if (okSoFar)
    { /* value of semRBdata is 0 because no data in buffer yet */       
        rbs->semRBdata = usnewsema(arena, 0);
        if (!(okSoFar=(0!=rbs->semRBdata)))
            perror("usnewsema #1");
    }
    if (okSoFar)
    { /* value of semRBspace is number of empty slots in ring buffer */
        okSoFar = 0 != (rbs->semRBspace = usnewsema(arena, RB_MAXELS) );
        if (!okSoFar)
            perror("usnewsema #2");
    }
    if (okSoFar)
    {
        okSoFar = 0 != (rbs->lockRBupdate = usnewlock(arena));
        if (!okSoFar)
            perror("usnewlock");
    }
    /*
    || Set the semaphores to collect metering information.
    */
    if (okSoFar)
    {
        okSoFar =   (0==( usctlsema(rbs->semRBdata, CS_METERON) ) )
            &&  (0==( usctlsema(rbs->semRBspace, CS_METERON) ) );
        if (!okSoFar)
            perror("usctlsema(METERON)");
    }
    if (okSoFar)
    { /* stow the ring buffer structure as the arena info word */
        usputinfo(arena, (void *)rbs);
    }
    else
    { /* something went wrong, return null */
        usdetach(arena);
        arena = NULL;
    }
    return arena; 
}
/*
|| Put an item into the ring buffer.  This dePletes the count of open
|| slots, and reViVes the count of waiting data.  If the ring buffer is
|| full it blocks until getElement() has been called.  It can also
|| block briefly on the lock if another process is updating the ring.
*/
void 
putElement(rbElem_t value, rbStruct_t * rbs)
{
    uspsema(rbs->semRBspace);       /* dePlete the open slots */
    ussetlock(rbs->lockRBupdate);   /* get exclusive use of rbPut */
    rbs->theBuffer[rbs->rbPut++] = value;
    if (rbs->rbPut >= RB_MAXELS)
        rbs->rbPut = 0;
    usunsetlock(rbs->lockRBupdate); /* release use of lock */
    usvsema(rbs->semRBdata);        /* reViVe the count of active data */
}
/*
|| Fetch an item from the ring buffer. This dePletes the count of
|| waiting data items and reVives the count of open slots.  If the
|| ring buffer is empty it blocks until putElement() is called.
*/
rbElem_t getElement(rbStruct_t * rbs)
{
    rbElem_t ret;
    uspsema(rbs->semRBdata);        /* dePlete the available data */
    ussetlock(rbs->lockRBupdate);   /* get exclusive use of rbGet */
    ret = rbs->theBuffer[rbs->rbGet++];
    if (rbs->rbGet >= RB_MAXELS)
        rbs->rbGet = 0;
    usunsetlock(rbs->lockRBupdate); /* release use of lock */
    usvsema(rbs->semRBspace);       /* reViVe the count of open slots */
    return ret;
}
/*
|| This is the body of the simulated data collection process.
|| The process actually runs at a constant rate of 25 Hz, invoking
|| sginap(4) to pace itself: 100 ticks per second / 4 ticks = 25Hz.
|| However, to simulate "data" received in bursts, it "receives" from
|| 1 to MAX_BURST items per iteration, an average of MAX_BURST/2,
|| for an average data rate of (25*MAX_BURST/2) items/second.
||
|| With MAX_BURST at 16, that gives 200 items/second.
|| This is the average rate the data writers must achieve, and the ring
|| buffer has to take up the slack during long bursts.
|| 
|| At a rough approximation, the probability of a burst of length
|| n*MAX_BURST should be (1/MAX_BURST)^n.  (This means that there is
|| a nonzero probability of a burst of any length whatever, and you
|| cannot make a buffer big enough to completely preclude blockages.)
||
|| However, with MAX_BURST==16 and RB_MAXELS==160, this buffer should
|| overflow with a probability of ~1e-12, provided the data writers keep
|| to the rate.
|| 
|| The process executes until it has buffered FINAL_COUNT elements, 
|| then terminates.  main() waits for this, and shuts down the program.
*/
void
inputProcess(void *arena)
{
    rbElem_t datum;
    rbStruct_t * rbs = usgetinfo((usptr_t *)arena);
    int myPid = getpid();
    int counter = inputCount;
    int burst;
    srandom(myPid); /* seed random() */
    do
    {
        sginap(4);
        datum = (rbElem_t) random(); /* ASSUMES rbElem_t is long */
        burst = 1+(datum % MAX_BURST);
        for ( ; burst; --burst)
        {
            putElement(datum, rbs);
            --counter;
        }
    } while (counter > 0);
    /*
    || Kill time until all data has been consumed by the output procs.
    || The semaphore count is positive until all data is consumed, then
    || it becomes negative, -2, when the two output procs are waiting.
    */
    while(ustestsema(rbs->semRBdata) > -2)
    {
        sginap(10);
    }
    /* exit, ending the process and satisfying wait() in main() */
}
/*
|| This is the body of both simulated data-output processes.
|| Two instances of this code are started. The purpose of starting
|| two is merely to complicate the use of the semaphores -- it is
|| not intended to be realistic.
||
|| Each process sets a repeating itimer with an interval of OUTPUT_TIME
|| microseconds.  That constant determines the "output data rate" that
|| can be achieved.  However, due to integer truncation effects in the
|| precision timer routines, you should not expect fine-grained
|| adjustments of this value to be effective. (Not to mention the
|| interference of other processes in the system, even when this
|| program runs with a real-time priority level.)
||
|| The signal handler is empty. The POSIX sigsuspend() call is used
|| to block until the SIGALRM comes. When it comes, the empty handler
|| is called and then control returns from the sigsuspend().
|| Then one data item is fetched from the ring buffer.
||
|| When the input rate averages 200/sec, each output process needs to
|| get signals at a rate of 100/sec, or an interval of 10000 usec.
|| (Tested on an Indy, the interval had to be 2500 usec to work)
|| 
*/
void
uponSigalrm()
{
    return; /* empty handler for SIGALRM */
}
void
outputProcess(void *arena)
{
    rbStruct_t * rbs = usgetinfo((usptr_t *)arena);
    sigset_t alarmSet, emptySet;
    struct sigaction alarmAct = {SA_RESTART, uponSigalrm, 0};
    struct itimerval timer = {{0, 0}, {0, 0}};
    rbElem_t datum;
    /*
    ||  Prepare an empty set of signals to use with sigsuspend().
    */    
    sigemptyset(&emptySet);
    /*
    || Prepare a mask to block SIGALRM, and apply it.
    */
    alarmSet = emptySet;
    sigaddset(&alarmSet, SIGALRM);
    sigprocmask(SIG_BLOCK, &alarmSet, NULL);
    /*
    || Set the action for SIGALRM to the empty handler.
    */
    if (sigaction(SIGALRM, &alarmAct, NULL))
    {
        perror("sigaction");
        return;
    }
    /*
    || If a nonzero "processing time" is specified, set a repeating
    || itimer to deliver SIGALRMs regularly.
    */
    if (outputTimer)
    {
        timer.it_interval.tv_usec = outputTimer;
        timer.it_value.tv_usec = outputTimer;
        if (setitimer(ITIMER_REAL, &timer, NULL))
        {
            perror("setitimer");
            return;
        }
    }
    /*
    || Loop getting successive data items. If a nonzero processing
    || time is specified, wait for a timer pop after each one.
     */
    for (;;)
    {
        datum = getElement(rbs);
        if (outputTimer)
            sigsuspend(&emptySet);
    }
}
/*
|| Subroutine to display metering info about a lock in a more
|| compact form than usdumplock(3)
*/
void
showLockInfo(char *lockName, ulock_t *lock)
{
    lockmeter_t linfo;
    if (0==usctllock(lock,CL_METERFETCH,&linfo))
    {
        int nowaits = linfo.lm_hits - linfo.lm_spins;
        /* guard against division by zero when the lock was never acquired */
        int nwpct = linfo.lm_hits ? (100 * nowaits) / linfo.lm_hits : 0;
        printf("Lock %s acquired %d times, %d without waiting (%d%%)\n",
               lockName,   linfo.lm_hits,   nowaits,         nwpct );
    }
    else
        printf("No metering info for lock %s\n",lockName);
}
/*
|| Subroutine to display metering info about a semaphore.
*/
void
showSemaInfo(char *semaName, usema_t *sema)
{
    semameter_t sinfo;
    if (0==usctlsema(sema,CS_METERFETCH,&sinfo))
    {
        int pct, nwait;
        printf("Metering info on sema %s\n",semaName);
        /* guard both percentages against a zero operation count */
        pct = sinfo.sm_psemas ? (100 * sinfo.sm_phits) / sinfo.sm_psemas : 0;
        printf("  P: %d, %d with no wait (%d%%)\n",
                 sinfo.sm_psemas, sinfo.sm_phits, pct);
        nwait = sinfo.sm_vsemas - sinfo.sm_vnowait;
        pct = sinfo.sm_vsemas ? (100 * nwait)/sinfo.sm_vsemas : 0;
        printf("  V: %d, %d with P waiting (%d%%)\n",
            sinfo.sm_vsemas, nwait,        pct);
    }
    else
        printf("No metering info for sema %s\n",semaName);
}
/*
|| The main() function:
||      * Gets the arguments, if any.
||      * Sets up the arena.
||      * Starts the 3 processes.
||      * Waits for a child process to terminate.
||      * Dumps the lock and semaphore info.
||      * Detaches the arena and unlinks its file.
*/
int main(int argc, char**argv)
{
    pid_t kids[3];
    usptr_t * arena = allocRBStuff();
    rbStruct_t *rbs;
    int c;
    /*
    || Check that the arena and structures allocated OK.
    */
    if (!arena)
        return -1; /* allocation failed, message issued */
    rbs = usgetinfo(arena);
    /*
    || get command line arguments for input count and output delay
    */
    while (EOF != (c = getopt(argc, argv, "c:t:")))
    {
        switch (c)
        {
        case 'c':
            inputCount = atoi(optarg);
            break;
        case 't':
            outputTimer = atoi(optarg);
            break;
        case '?':
            printf("usage: [-c input data count] [-t output time usec]\n");
            return -2;
            break;
        }
    }
    /*
    || Create the inputProcess (simulated data collection).
    */
    kids[0] = sproc(inputProcess, PR_SALL, (void *)arena);
    if (-1 == kids[0])
    {
        perror("sproc(inputProcess)");
        return -1;
    }
    /*
    || Create the 2 outputProcesses (simulated data reduction).
     */
    kids[1] = sproc(outputProcess,  PR_SALL,  (void *)arena);
    if (-1 == kids[1])
    {
        perror("sproc(outputProcess 1)");
        return -1;
    }
    kids[2] = sproc(outputProcess,  PR_SALL,  (void *)arena);
    if (-1 == kids[2])
    {
        perror("sproc(outputProcess 2)");
        return -1;
    }
    /*
    || Wait until a child process (don't care which) ends.
    */
    wait(0);
    /*
    || Display the metering information from the lock and semaphores.
    */
    showLockInfo("lockRBupdate",rbs->lockRBupdate);
    showSemaInfo("semRBdata",rbs->semRBdata);
    showSemaInfo("semRBspace",rbs->semRBspace);
    /*
    || Clean up: terminate the 2 output procs (which are probably
    || blocked on semRBdata at this time).  Then detach the arena
    || and unlink its file.
    */
    kill(kids[1],SIGTERM);
    kill(kids[2],SIGTERM);
    printf("\ndetaching arena file\n");
    usdetach(arena);
    unlink(ARENA_FILE);
    return 0;
}
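
The timer wait used by outputProcess() above can be isolated into a small, portable sketch. The version below assumes only POSIX interfaces and sets the sigaction fields by name, because a positional initializer depends on the field order of one system's struct sigaction. The names waitForTicks() and countTick() are hypothetical and are not part of Example A-6.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t tickCount = 0;

static void countTick(int sig)
{
    (void)sig;
    ++tickCount;
}

/*
|| Wait for nticks expirations of a repeating timer of usec microseconds.
|| Returns the number of ticks seen, or -1 on error.
|| (waitForTicks is a hypothetical helper, not part of Example A-6.)
*/
int waitForTicks(long usec, int nticks)
{
    sigset_t alarmSet, emptySet, oldSet;
    struct sigaction act, ignore, oldAct;
    struct itimerval timer, stop;

    if (usec <= 0)
        return -1;      /* a zero interval would disarm the timer */
    /*
    || Block SIGALRM so it can be delivered only inside sigsuspend().
    */
    sigemptyset(&emptySet);
    sigemptyset(&alarmSet);
    sigaddset(&alarmSet, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &alarmSet, &oldSet))
        return -1;
    memset(&act, 0, sizeof act);
    act.sa_handler = countTick;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGALRM, &act, &oldAct))
        return -1;
    /*
    || Start the repeating timer.
    */
    tickCount = 0;
    timer.it_interval.tv_sec = usec / 1000000;
    timer.it_interval.tv_usec = usec % 1000000;
    timer.it_value = timer.it_interval;
    if (setitimer(ITIMER_REAL, &timer, NULL))
        return -1;
    /*
    || Each sigsuspend() returns after one signal handler has run.
    */
    while (tickCount < nticks)
        sigsuspend(&emptySet);
    /*
    || Stop the timer, discard any SIGALRM still pending, and restore
    || the caller's handler and signal mask.
    */
    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    sigaction(SIGALRM, &ignore, NULL);
    sigaction(SIGALRM, &oldAct, NULL);
    sigprocmask(SIG_SETMASK, &oldSet, NULL);
    return tickCount;
}
```

A call such as waitForTicks(10000, 100) would wait for about one second. Unlike outputProcess(), this sketch splits the interval into seconds and microseconds, so intervals of one second or more are also accepted.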

Probing the Address Space

The sample program in Example A-7 uses some generally unsafe coding tricks to get the addresses of segments for text, stack, library DSO and mapped data. It demonstrates the use of mmap() with /dev/zero, for default and absolute segment addresses.

Example A-7. Program That Explores the Address Space


#include <stdlib.h>     /* for standard malloc(3C) */
#include <unistd.h>     /* for sbrk(2) */
#include <stdio.h>      /* for printf */
#include <sys/types.h>  /* for __psint_t */
/* include <sys/stat.h> */
#include <sys/fcntl.h>  /* for O_RDWR */
#include <sys/mman.h>   /* for mmap(2) */
#define DISPLAY(v,t) {printf("%s:\t%0lx\n",t,(__psint_t)v);}
int main()
{
    /*
    || Get a mask that truncates an address to a page boundary.
    */
    __psint_t psize = getpagesize();
    __psint_t pmask = ~(psize-1);
    /*
    || Get a file descriptor for the nothing device.
    || Use that FD to map two segments of memory containing 00.
    */
    int zero = open("/dev/zero",O_RDWR);
    void * zmap1 = mmap(0,16384,PROT_WRITE,MAP_SHARED,zero,0);
    void * zmap2 = mmap(0,16384,PROT_WRITE,MAP_SHARED,zero,0);
    /*
    || Map one segment at a designated address reserved for
    || user maps by the MIPS ABI.
    */
    void * abi_map = (char *)mmap((void *)0x30040000L,16384,
                PROT_WRITE,MAP_SHARED+MAP_FIXED,zero, 0);

    /*
    || Get the address of this program.
    */
    char * poke = (char *)((__psint_t)main);
    /*
    || Get some program addresses supplied by ld(1), but note
    || the warnings in end(3C) -- these addresses "have no standard
    || definition" when multiple text/data segments exist.
    */
    extern int _ftext[];
    void * ld_ftext = (void *)_ftext;
    extern int _etext[];
    void * ld_etext = (void *)_etext;
    extern int _fdata[];
    void * ld_fdata = (void *)_fdata;
    extern int _edata[];
    void * ld_edata = (void *)_edata;
    extern int _fbss[];
    void * ld_fbss  = (void *)_fbss;
    extern int _end[];
    void * ld_end   = (void *)_end;
    /*
    || Get the address of some code in the libc DSO.
    */
    void * libc_adr = (void *)fprintf;
    /*
    || Get the current start and end of the heap.
    */
    void * malloc_adr = (void *)malloc((size_t)256);
    void * brk_adr = sbrk(0);
    /*
    || Get the address of an item in our stack space.
    */
    void * stack_adr = (void *)&psize;
    /*
    || Display all the above.
    */
    DISPLAY(psize,"Page size")
    DISPLAY(zmap1,"Mapped segment 1")
    DISPLAY(zmap2,"Mapped segment 2")
    DISPLAY(abi_map,"ABI mapped segment")
    DISPLAY(ld_ftext,"Text starts")
    DISPLAY(ld_etext,"Text ends")
    DISPLAY(ld_fdata,"Initialized data starts")
    DISPLAY(ld_edata,"Initialized data ends")
    DISPLAY(ld_fbss,"Uninitialized starts")
    DISPLAY(ld_end,"Uninitialized ends")
    DISPLAY(malloc_adr,"Heap data starts")
    DISPLAY(brk_adr,"Heap data ends")
    DISPLAY(stack_adr,"Stack data")
    DISPLAY(libc_adr,"Spot in one DSO")
    /*
    || See if we can get away with patching our own text.
    */
    if (!mprotect((void *)(pmask&(__psint_t)poke),psize,PROT_WRITE+PROT_EXEC))
    {
        poke[0] = poke[0];
        printf("I wrote into program text\n");
    }
    else
    {
        perror("mprotect(text)");
    }
    return 0;
}
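
Example A-7 omits error checks for brevity. The following sketch shows how the default-address mapping of /dev/zero and the page-boundary mask might be written with checks, assuming only POSIX interfaces. The helper names mapZero() and pageBase() are hypothetical. The sketch uses MAP_PRIVATE, where Example A-7 uses MAP_SHARED, and it does not attempt the MAP_FIXED mapping.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
|| Map len bytes of zero-filled memory from /dev/zero at an address
|| chosen by the system. Returns the segment address, or NULL on failure.
|| (mapZero is a hypothetical helper, not part of Example A-7.)
*/
void *mapZero(size_t len)
{
    void *seg;
    int zero = open("/dev/zero", O_RDWR);
    if (-1 == zero)
        return NULL;
    seg = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, zero, 0);
    close(zero);    /* the mapping remains valid after the close */
    return (MAP_FAILED == seg) ? NULL : seg;
}

/*
|| Truncate an address to the start of its page, the same masking that
|| Example A-7 applies before it calls mprotect().
*/
void *pageBase(const void *addr)
{
    uintptr_t psize = (uintptr_t)sysconf(_SC_PAGESIZE);
    return (void *)((uintptr_t)addr & ~(psize - 1));
}
```

Because mmap() returns page-aligned addresses, pageBase() applied to any address within the first page of a segment from mapZero() yields the segment address itself.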

Deadline Scheduling Subroutines

The following example contains two subroutines that simplify the interface to the schedctl() function for deadline scheduling. If the code is compiled with variable UNIT_TEST defined, it compiles a main() procedure that runs a test. Otherwise it compiles only the functions. A test run of the program resembles the following:

% setDeadline 20 100
schedule pid 0 for 20% of 100ms --> 0
policy DL_ONLY-->0
policy DL_ANY-->0
policy DL_RELEASE-->0

On a uniprocessor, a request for much more than 20% of the CPU is rejected. On a multiprocessor, a request for 98% or 99% is generally successful.
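
The arithmetic behind the helper functions can be tried without schedctl(). The sketch below converts a period in milliseconds and a percentage into two intervals of seconds and nanoseconds. It assumes that the POSIX struct timespec can stand in for the timestruc_t used by schedctl(), and the names msToTimespec() and dutyCycle() are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations */
#include <time.h>

/*
|| Convert a count of milliseconds to seconds plus nanoseconds.
*/
void msToTimespec(struct timespec *ts, long milliseconds)
{
    ts->tv_sec  = milliseconds / 1000;
    ts->tv_nsec = (milliseconds % 1000) * 1000000L;
}

/*
|| Compute the period and the allocation (pct percent of the period).
|| Returns 0 on success, -1 if an argument is out of range.
|| (dutyCycle is a hypothetical helper, not part of Example A-8.)
*/
int dutyCycle(long periodMs, int pct,
              struct timespec *period, struct timespec *alloc)
{
    if (periodMs <= 0 || pct <= 0 || pct > 100)
        return -1;
    msToTimespec(period, periodMs);
    /* integer division: 25% of 10 ms becomes 2 ms, not 2.5 ms */
    msToTimespec(alloc, (periodMs * pct) / 100);
    return 0;
}
```

For the test run shown above (20% of 100 ms), the period is 100,000,000 nanoseconds and the allocation is 20,000,000 nanoseconds.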

Example A-8. Helper Functions for Using schedctl()


/* 
|| Issue the schedctl(2) calls to set up deadline scheduling, using
|| the simpler interface of npri(1). That is, where schedctl() requires
|| you to set up a structure containing two intervals in nanoseconds,
|| setDeadlinePct() lets you specify an interval in milliseconds and a
|| percentage duty cycle.
||
|| As a bonus, setDeadlinePolicy() is a short way to call for any of
|| the four policies, DL_ONLY, DL_ANY, DL_RELEASE (the rest of the period)
|| and DL_BLOCK (for the rest of the period).
*/
#include <errno.h>
#include <sys/schedctl.h>
#include <stdio.h> /* for stderr, perror */
#include <stdlib.h> /* for atoi, exit (used by the unit test) */
/*
|| This local function does the arithmetic to convert a count of
|| milliseconds into the fields of a timestruc_t.
*/
static void putMSinTimestruc(timestruc_t *ts, const int milliseconds)
{
    int ms = milliseconds;  
    if (1000 > ms)
        ts->tv_sec = 0;
    else
    { /* set the seconds as well as the nanoseconds */
        ts->tv_sec = ms/1000;
        ms %= 1000;
    }
    /* set the nanoseconds: 1e3*1e6 == 1e9 */
    ts->tv_nsec = ms*1000000;
}
/*
|| Request deadline scheduling for the specified PID (0 for "self"),
|| in terms of a period in milliseconds and a percentage.
*/  
int setDeadlinePct(int pid, int period, int pct)
{
    struct sched_deadline dd = {{0,0},{0,0}};
    int retval;
    putMSinTimestruc(&dd.dl_period, period);
    putMSinTimestruc(&dd.dl_alloc, (period * pct)/100);
    if (-1 == (retval = schedctl(DEADLINE, pid, &dd)) )
    {
        if (ENOSPC == errno)
        { /* system cannot guarantee that duty cycle */
            fprintf(stderr,"schedctl: cannot promise %d%% of %dms\n",
                                                     pct, period);
        }
        else perror("schedctl");
    }
    return retval;
}
/*
|| Request one of the constants defined in schedctl.h as a new
|| scheduling policy for the specified PID.
||
|| Note: the constants DL_ONLY, etc., are declared in schedctl.h
|| as type-casts to (struct sched_deadline *).  That is why this
|| function specifies that type for its second argument -- when
|| it logically should be simply "int."
*/
int setDeadlinePolicy(int pid, struct sched_deadline * policy)
{
    int retval = schedctl(DEADLINE,pid,policy);
    if (-1 == retval)
    {
        char msg[64];
        sprintf(msg,"schedctl(DEADLINE,%d,%ld)",pid,(long)policy);
        perror(msg);
    }
    return retval;
}
#ifdef UNIT_TEST
int main(int argc, char **argv)
{
    int pct = 25;
    int per = 100;
    int pid = 0; /* which means "self" to schedctl() */
    if (1 < argc)
    {
        pct = atoi(argv[1]);
    }
    if (2 < argc)
    {
        per = atoi(argv[2]);
    }
    if (3 < argc)
    {
        pid = atoi(argv[3]);
    }
    if ( (4 < argc) || (0==pct) || (0==per) )
    {
        fprintf(stderr,
    "usage: setDeadline [ pct_duty_cycle [ period_ms [ pid ] ] ]\n");
        exit(1);
    }
    printf("schedule pid %d for %d%% of %dms --> %d\n",
                        pid, pct, per, setDeadlinePct( pid, per, pct));
    printf("policy DL_ONLY-->%d\n", setDeadlinePolicy(pid,DL_ONLY));        
    printf("policy DL_ANY-->%d\n", setDeadlinePolicy(pid,DL_ANY));      
    printf("policy DL_RELEASE-->%d\n", setDeadlinePolicy(pid,DL_RELEASE));
    return 0;
}
#endif 

Asynchronous I/O Example

The program in Example A-9 demonstrates the use of some asynchronous I/O functions. The basic purpose of the program is to read a list of input files and write their concatenated contents as its output, work that does not normally require asynchronous I/O. However, this test program reads the input files using aio_read(), and writes the output file using aio_write() and aio_fsync(). In addition, it can be compiled in either of two ways:

  • to copy the input files one at a time, using subroutine calls

  • to copy the input files concurrently, using a separate process for each input file

There is no functional advantage to using multiple processes. Doing so merely makes the example more interesting. It also demonstrates that, even though multiple processes ask for output at different points in the same file at the same time, the output is written to the requested offsets.
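
The offset scheme can be illustrated without asynchronous I/O. In the sketch below, each piece of output is assigned a starting offset equal to the prior offset plus the prior size, and the pieces are then written in reverse order with pwrite(). The file still ends up holding them in their original order. The names computeBases() and writePiecesReversed() and the limit MAX_PIECES are hypothetical, and pwrite() stands in for the implied seek of aio_write().

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PIECES 10

/*
|| Given the sizes of n pieces, compute where each one starts in the
|| output: the first at 0, each later one at prior base + prior size.
*/
void computeBases(const off_t *fsize, off_t *outbase, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        outbase[i] = i ? outbase[i-1] + fsize[i-1] : 0;
}

/*
|| Write n strings into the file at path, last string first, each at
|| its computed offset. Returns 0 on success, -1 on error.
|| (writePiecesReversed is a hypothetical helper, not part of aiocat.)
*/
int writePiecesReversed(const char *path, const char **piece, int n)
{
    off_t fsize[MAX_PIECES], outbase[MAX_PIECES];
    int i, fd, ret = 0;

    if (n < 1 || n > MAX_PIECES)
        return -1;
    for (i = 0; i < n; ++i)
        fsize[i] = (off_t)strlen(piece[i]);
    computeBases(fsize, outbase, n);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (-1 == fd)
        return -1;
    for (i = n - 1; i >= 0 && !ret; --i)
    {
        size_t len = strlen(piece[i]);
        /* a short write is treated as an error in this sketch */
        if (len && pwrite(fd, piece[i], len, outbase[i]) != (ssize_t)len)
            ret = -1;
    }
    if (close(fd))
        ret = -1;
    return ret;
}
```

As in aiocat, the output file is not opened with O_APPEND, so each write lands at the offset requested rather than at the current end of the file.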

The reading and writing is done in one of four functions. The functions all perform the following sequence of actions:

  1. Initialize the aiocb for the type of notification desired. The type of notification is the principal difference between the functions: some use signals, some callback functions, some no notification.

  2. Until the input file is exhausted,

    • Call aio_read() for up to one BLOCKSIZE amount from the next offset in the input file

    • Wait for the read to complete

    • Call aio_write() to write the data read to the next offset in the output file

    • Wait for the write to complete

  3. Use aio_fsync() to ensure that output is complete and wait for it to complete.

The four functions, inProc0() through inProc3(), differ only in the method they use to wait for completion.

  • inProc0() alternates calling aio_error() with sginap() until the status is other than EINPROGRESS.

  • inProc1() calls aio_suspend() to wait for the current operation.

  • inProc2() sets the aiocb to request a signal on completion. Then it waits on a semaphore that is posted from the signal handler function.

  • inProc3() waits on a semaphore which is posted from a callback function.

You select which of the four functions to use with the -a argument to the program. If you compile the program with the variable DO_SPROCS defined as 0, the chosen function is called as a subroutine once for each input file. If you compile with DO_SPROCS defined as 1, the chosen function is launched by sprocsp() once for each input file.
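
The read, wait, write, wait sequence can be sketched in a single process using only the POSIX asynchronous I/O calls. The sketch below follows the aio_suspend() method of inProc1() and assumes the standard struct aiocb in place of aiocb_t. The names waitOne() and aioCopy() are hypothetical. Some systems require linking with the realtime library (for example -lrt) to resolve the aio functions.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 2048

/*
|| Wait for one queued operation with aio_suspend(), then fetch its
|| completion status. Returns 0 if the operation succeeded, else an
|| errno value.
*/
static int waitOne(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    while (aio_suspend(list, 1, NULL))
    {
        if (EINTR != errno)
            return errno;   /* the wait itself failed */
    }
    return aio_error(cb);
}

/*
|| Copy all of inFD to outFD, placing the data at offset outbase in the
|| output. Returns the number of bytes copied, or -1 on error.
|| (aioCopy is a hypothetical helper, not part of aiocat.)
*/
long aioCopy(int inFD, int outFD, off_t outbase)
{
    char buffer[BLOCKSIZE];
    struct aiocb cb;
    off_t inbase = 0;
    ssize_t bytes;

    memset(&cb, 0, sizeof cb);
    cb.aio_buf = buffer;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    do
    {
        /* queue a read of the next block and wait for it */
        cb.aio_fildes = inFD;
        cb.aio_offset = inbase;
        cb.aio_nbytes = BLOCKSIZE;
        if (aio_read(&cb) || waitOne(&cb))
            return -1;
        bytes = aio_return(&cb);
        if (bytes <= 0)
            break;          /* end of file, nothing to write */
        /* queue a write of the same data and wait for it */
        cb.aio_fildes = outFD;
        cb.aio_offset = outbase + inbase;
        cb.aio_nbytes = (size_t)bytes;
        if (aio_write(&cb) || waitOne(&cb))
            return -1;
        if (aio_return(&cb) != bytes)
            return -1;      /* short write treated as an error */
        inbase += bytes;
    } while (bytes == BLOCKSIZE);
    /* make sure the output is complete; the aiocb must name outFD */
    cb.aio_fildes = outFD;
    if (aio_fsync(O_SYNC, &cb) || waitOne(&cb))
        return -1;
    aio_return(&cb);
    return (long)inbase;
}
```

The sketch sets aio_fildes to the output descriptor again before aio_fsync(), because the loop can end immediately after a read, when the aiocb still names the input file.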

Example A-9. Asynchronous I/O Example Program


/* ============================================================================ 
||  aiocat.c : This highly artificial example demonstrates asynchronous I/O. 
||
|| The command syntax is:
||  aiocat [ -o outfile ] [-a {0|1|2|3} ] infilename...
||
|| The output file is given by -o, with $TMPDIR/aiocat.out by default.
|| The aio method of waiting for completion is given by -a as follows:
||  -a 0 poll for completion with aio_error() (default)
||  -a 1 wait for completion with aio_suspend()
||  -a 2 wait on a semaphore posted from a signal handler
||  -a 3 wait on a semaphore posted from a callback routine
||
|| Up to MAX_INFILES input files may be specified. Each input file is
|| read in BLOCKSIZE units. The output file contains the data from
|| the input files in the order they were specified. Thus the
|| output should be the same as "cat infilename... >outfile".
||
|| When DO_SPROCS is compiled true, all I/O is done asynchronously
|| and concurrently using one sproc'd process per file.  Thus in a
|| multiprocessor concurrent input can be done.
============================================================================ */

#define _SGI_MP_SOURCE  /* see the "Caveats" section of sproc(2) */
#include <time.h>       /* for clock() */
#include <errno.h>      /* for errno, used with perror() */
#include <stdio.h>      /* for printf(), perror() */
#include <stdlib.h>     /* for getenv(), malloc(3c) */
#include <string.h>     /* for strcpy(), strcat(), strcmp() */
#include <stddef.h>     /* for ptrdiff_t */
#include <fcntl.h>      /* for open(), O_RDONLY & friends */
#include <ulocks.h>     /* usinit() & friends */
#include <bstring.h>    /* for bzero() */
#include <sys/resource.h> /* for get/setrlimit() */
#include <sys/types.h>  /* required by lseek(), prctl(), sproc() */
#include <unistd.h>     /* ditto */
#include <sys/prctl.h>  /* for prctl(), sproc() */
#include <signal.h>     /* for signals - gets sys/signal and sys/siginfo */
#include <aio.h>        /* async I/O */

#define BLOCKSIZE 2048  /* input units -- play with this number */
#define MAX_INFILES 10  /* max sprocs: anything from 4 to 20 or so */
#define DO_SPROCS 1     /* set 0 to do all I/O in a single process */

#define QUITIFNULL(PTR,MSG) if (NULL==PTR) {perror(MSG);return(errno);}
#define QUITIFMONE(INT,MSG) if (-1==INT) {perror(MSG);return(errno);}
/*****************************************************************************
|| The following structure contains the info needed by one child proc.
|| The main program builds an array of MAX_INFILES of these.
|| The reason for storing the actual filename here (not a pointer) is
|| to force the struct to >128 bytes.  Then, when the procs run in 
|| different CPUs on a CHALLENGE, the info structs will be in different
|| cache lines, and a store by one proc will not invalidate a cache line
|| for its neighbor proc.
*/
typedef struct child
{
        /* read-only to child */
    char fname[100];        /* input filename from argv[n] */
    int         fd;         /* FD for this file */
    void*       buffer;     /* buffer for this file */
    int         procid;     /* process ID of child process */
    off_t       fsize;      /* size of this input file */
        /* read-write to child */
    usema_t*    sema;       /* semaphore used by methods 2 & 3 */
    off_t       outbase;    /* starting offset in output file */
    off_t       inbase;     /* current offset in input file */
    clock_t     etime;      /* sum of utime/stime to read file */
    aiocb_t     acb;        /* aiocb used for reading and writing */
} child_t;

/******************************************************************************
|| Globals, accessible to all processes
*/
char*       ofName = NULL;  /* output file name string */
int         outFD;          /* output file descriptor */
usptr_t*    arena;          /* arena where everything is built */
barrier_t*  convene;        /* barrier used to sync up */
int         nprocs = 1;     /* 1 + number of child procs */
child_t*    array;          /* array of child_t structs in arena */
int         errors = 0;     /* always incremented on an error */

/******************************************************************************
|| forward declaration of the child process functions
*/
void inProc0(void *arg, size_t stk);    /* polls with aio_error() */
void inProc1(void *arg, size_t stk);    /* uses aio_suspend() */
void inProc2(void *arg, size_t stk);    /* uses a signal and semaphore */
void inProc3(void *arg, size_t stk);    /* uses a callback and semaphore */

/******************************************************************************
|| The main()
*/
int main(int argc, char **argv)
{
    char*       tmpdir;         /* ->name string of temp dir */
    int         nfiles;         /* how many input files on cmd line */
    int         argno;          /* loop counter */
    child_t*    pc;             /* ->child_t of current file */
    void (*method)(void *,size_t) = inProc0; /* ->chosen input method */
    char        arenaPath[128]; /* build area for arena pathname */
    char        outPath[128];   /* build area for output pathname */    
    /*
    || Ensure the name of a temporary directory.
    */
    tmpdir = getenv("TMPDIR");
    if (!tmpdir) tmpdir = "/var/tmp";
    /*
    || Build a name for the arena file.
    */
    strcpy(arenaPath,tmpdir);
    strcat(arenaPath,"/aiocat.wrk");
    /*
    || Create the arena. First, call usconfig() to establish the
    || minimum size (twice the buffer size per file, to allow for misc usage)
    || and the (maximum) number of processes that may later use
    || this arena.  For this program that is MAX_INFILES+10, allowing
    || for our sprocs plus those done by aio_sgi_init().
    || These values apply to any arenas made subsequently, until changed.
    */
    {
        ptrdiff_t ret;
        ret = usconfig(CONF_INITSIZE,2*BLOCKSIZE*MAX_INFILES);
        QUITIFMONE(ret,"usconfig size")
        ret = usconfig(CONF_INITUSERS,MAX_INFILES+10);
        QUITIFMONE(ret,"usconfig users")
        arena = usinit(arenaPath);
        QUITIFNULL(arena,"usinit")
    }
    /*
    || Allocate the barrier.
    */
    convene = new_barrier(arena);
    QUITIFNULL(convene,"new_barrier")
    /*
    || Allocate the array of child info structs and zero it.
    */
    array = (child_t*)usmalloc(MAX_INFILES*sizeof(child_t),arena);
    QUITIFNULL(array,"usmalloc")
    bzero((void *)array,MAX_INFILES*sizeof(child_t));
    /*
    || Loop over the arguments, setting up child structs and
    || counting input files.  Quit if a file won't open or seek,
    || or if we can't get a buffer or semaphore.
    */
    for (nfiles=0, argno=1; argno < argc; ++argno )
    {
        if (0 == strcmp(argv[argno],"-o"))
        { /* is the -o argument */
            ++argno;
            if (argno < argc)
                ofName = argv[argno];
            else
            {
                fprintf(stderr,"-o must have a filename after\n");
                return -1;
            }
        }
        else if (0 == strcmp(argv[argno],"-a"))
        { /* is the -a argument */
            char c;
            if (++argno >= argc)
            {
                fprintf(stderr,"-a must have a method number after\n");
                return -1;
            }
            c = argv[argno][0];
            switch(c)
            {
            case '0' : method = inProc0; break;
            case '1' : method = inProc1; break;
            case '2' : method = inProc2; break;
            case '3' : method = inProc3; break;
            default:
                {
                    fprintf(stderr,"unknown method -a %c\n",c);
                    return -1;
                }
            }
        }
        else if ('-' == argv[argno][0])
        { /* is unknown -option */
            fprintf(stderr,"aiocat [-o outfile] [-a 0|1|2|3] infiles...\n");
            return -1;
        }
        else    
        { /* neither -o nor -a, assume input file */
            if (nfiles < MAX_INFILES)
            {
                /*
                || save the filename
                */
                pc = &array[nfiles];
                /* the array was zeroed, so the name stays terminated */
                strncpy(pc->fname,argv[argno],sizeof(pc->fname)-1);
                /*
                || allocate a buffer and a semaphore.  Not all
                || child procs use the semaphore but so what?
                */
                pc->buffer = usmalloc(BLOCKSIZE,arena);
                QUITIFNULL(pc->buffer,"usmalloc(buffer)")
                pc->sema = usnewsema(arena,0);
                QUITIFNULL(pc->sema,"usnewsema")
                /*
                || open the file
                */
                pc->fd = open(pc->fname,O_RDONLY);
                QUITIFMONE(pc->fd,"open")
                /*
                || get the size of the file. This leaves the file
                || positioned at-end, but there is no need to reposition 
                || because all aio_read calls have an implied lseek.
                || NOTE: there is no check for zero-length file; that
                || is a valid (and interesting) test case.
                */
                pc->fsize = lseek(pc->fd,0,SEEK_END);
                QUITIFMONE(pc->fsize,"lseek")
                /*
                || set the starting base address of this input file
                || in the output file.  The first file starts at 0.
                || Each one after starts at prior base + prior size.
                */
                if (nfiles) /* not first */
                    pc->outbase =
                        array[nfiles-1].fsize + array[nfiles-1].outbase;
                ++nfiles;
            }
            else
            {
                printf("Too many files, %s ignored\n",argv[argno]);
            }
        }
    } /* end for(argc) */
    /*
    || If there was no -o argument, construct an output file name.
    */
    if (!ofName)
    {
        strcpy(outPath,tmpdir);
        strcat(outPath,"/aiocat.out");
        ofName = outPath;
    }
    /*
    || Open, creating or truncating, the output file.
    || Do not use O_APPEND, which would constrain aio to doing
    || operations in sequence.
    */
    outFD = open(ofName, O_WRONLY+O_CREAT+O_TRUNC,0666);
    QUITIFMONE(outFD,"open(output)")
    /*
    || If there were no input files, just quit, leaving empty output
    */
    if (!nfiles)
    {
        return 0;
    }
    /*
    || Note the number of processes-to-be, for use in initializing
    || aio and for use by each child in a barrier() call.
    */
    nprocs = 1+nfiles;
    /*
    || Initialize async I/O using aio_sgi_init(), in order to specify
    || a number of locks at least equal to the number of child procs
    || and in order to specify extra sproc users.
    */
    { 
        aioinit_t ainit = {0}; /* all fields initially zero */
        /*
        || Go with the default 5 for the number of aio-created procs,
        || as we have no way of knowing the number of unique devices.
        */
#define AIO_PROCS 5
        ainit.aio_threads = AIO_PROCS;
        /*
        || Set the number of locks aio needs to the number of procs
        || we will start, minimum 3.
        */
        ainit.aio_locks = (nprocs > 2)?nprocs:3;
        /*
        || Warn aio of the number of user procs that will be
        || using its arena.
        */
        ainit.aio_numusers = nprocs;
        aio_sgi_init(&ainit);
    }
    /*
    || Process each input file, either in a child process or in
    || a subroutine call, as specified by the DO_SPROCS variable.
    */
    for (argno = 0; argno < nfiles; ++argno)
    {
        pc = &array[argno];
#if DO_SPROCS
#define CHILD_STACK 64*1024
        /*
        || For each input file, start a child process as an instance
        || of the selected method (-a argument).
        || If an error occurs, quit. That will send a SIGHUP to any
        || already-started child, which will kill it, too.
        */ 
        pc->procid = sprocsp(method     /* function to start */
                            ,PR_SALL    /* share all, keep FDs sync'd */
                            ,(void *)pc /* argument to child func */
                            ,NULL       /* absolute stack seg */
                            ,CHILD_STACK);  /* max stack seg growth */
        QUITIFMONE(pc->procid,"sproc")
#else
        /*
        || For each input file, call the selected (-a) method as a
        || subroutine to copy its file.
        */
        fprintf(stderr,"file %s...",pc->fname);
        method((void*)pc,0);
        if (errors) break;
        fprintf(stderr,"done\n");
#endif
    }
#if DO_SPROCS
    /*
    || Wait for all the kiddies to get themselves initialized.
    || When all have started and reached barrier(), all continue.
    || If any errors occurred in initialization, quit.
    */
    barrier(convene,nprocs); 
    /*
    || Child processes are executing now. Reunite the family round the
    || old hearth one last time, when their processing is complete.
    || Each child ensures that all its output is complete before it
    || invokes barrier().
    */
    barrier(convene,nprocs);
#endif
    /*
    || Close the output file and print some statistics.
    */
    close(outFD);
    {
        clock_t timesum;
        long    bytesum;
        double  bperus;
        printf("    procid   time     fsize     filename\n");
        for(argno = 0, timesum = bytesum = 0 ; argno < nfiles ; ++argno)
        {
            pc = &array[argno];
            timesum += pc->etime;
            bytesum += pc->fsize;
            printf("%2d: %-8d %-8ld %-8ld  %s\n"
                    ,argno,pc->procid,(long)pc->etime,(long)pc->fsize,pc->fname);
        }
        bperus = timesum ? ((double)bytesum)/((double)timesum) : 0.0;
        printf("total time %ld usec, total bytes %ld, %g bytes/usec\n"
                     ,(long)timesum      , bytesum , bperus);
    }
    /*
    || Unlink the arena file, so it won't exist when this program runs
    || again. If it did exist, it would be used as the initial state of
    || the arena, which might or might not have any effect.
    */
    unlink(arenaPath);
    return 0;
}
/******************************************************************************
|| inProc0() alternates calls to aio_error() with calls to sginap(). Under
|| the Frame Scheduler, it would use frs_yield() instead of sginap().
|| The general pattern of this function is repeated in the other three;
|| only the wait method varies from function to function.
*/
int inWait0(child_t *pch)
{
    int ret;
    aiocb_t* pab = &pch->acb;
    while (EINPROGRESS == (ret = aio_error(pab))) 
    {
        sginap(0); 
    }
    return ret;
}
void inProc0(void *arg, size_t stk)
{
    child_t *pch = arg;         /* starting arg is ->child_t for my file */
    aiocb_t *pab = &pch->acb;   /* base address of the aiocb_t in child_t */
    int ret;                    /* as long as this is 0, all is ok */
    int bytes;                  /* #bytes read on each input */
    /*
    || Initialize -- no signals or callbacks needed.
    */
    pab->aio_sigevent.sigev_notify = SIGEV_NONE;
    pab->aio_buf = pch->buffer; /* always the same */
#if DO_SPROCS
    /*
    || Wait for the starting gun...
    */
    barrier(convene,nprocs);
#endif
    pch->etime = clock();
    do /* read and write, read and write... */
    {
        /*
        || Set up the aiocb for a read, queue it, and wait for it.
        */
        pab->aio_fildes = pch->fd;
        pab->aio_offset = pch->inbase;
        pab->aio_nbytes = BLOCKSIZE;
        if (ret = aio_read(pab)) 
            break;  /* unable to schedule a read */
        ret = inWait0(pch);
        if (ret)
            break;  /* nonzero read completion status */
        /*
        || get the result of the read() call, the count of bytes read.
        || Since aio_error returned 0, the count is nonnegative.
        || It could be 0, or less than BLOCKSIZE, indicating EOF.
        */ 
        bytes = aio_return(pab); /* actual read result */
        if (!bytes)
            break;  /* no need to write a last block of 0 */
        pch->inbase += bytes;   /* where to read next time */
        /*
        || Set up the aiocb for a write, queue it, and wait for it.
        */
        pab->aio_fildes = outFD;
        pab->aio_nbytes = bytes;
        pab->aio_offset = pch->outbase;
        if (ret = aio_write(pab)) 
            break;
        ret = inWait0(pch);
        if (ret)
            break;
        pch->outbase += bytes;  /* where to write next time */
    } while ((!ret) && (bytes == BLOCKSIZE));
    /*
    || The loop is complete.  If no errors so far, use aio_fsync()
    || to ensure that output is complete.  This requires waiting
    || yet again.
    */
    if (!ret)
    { 
        pab->aio_fildes = outFD; /* the loop may have ended after a read */
        if (!(ret = aio_fsync(O_SYNC,pab)))
            ret = inWait0(pch);
    }
    /*
    || Flag any errors for the parent proc. If none, count elapsed time.
    */
    if (ret) ++errors;
    else pch->etime = (clock() - pch->etime);
#if DO_SPROCS
    /*
    || Rendezvous with the rest of the family, then quit.
    */
    barrier(convene,nprocs);
#endif
    return;
} /* end inProc0 */
/******************************************************************************
|| inProc1 uses aio_suspend() to await the completion of each operation.
|| Otherwise it is the same as inProc0, above.
*/

int inWait1(child_t *pch)
{
    int ret;
    aiocb_t* susplist[1]; /* list of 1 aiocb for aio_suspend() */
    susplist[0] = &pch->acb;
    /*
    || Note: aio.h declares the 1st argument of aio_suspend() as "const."
    || The C compiler requires the actual-parameter to match in type,
    || so the list we pass must either be declared "const aiocb_t*" or
    || must be cast to that -- else cc gives a warning.  The cast
    || in the following statement is only to avoid this warning.
    */ 
    ret = aio_suspend( (const aiocb_t **) susplist,1,NULL);
    return ret;
}
void inProc1(void *arg, size_t stk)
{
    child_t *pch = arg;         /* starting arg is ->child_t for my file */
    aiocb_t *pab = &pch->acb;   /* base address of the aiocb_t in child_t */
    int ret;                    /* as long as this is 0, all is ok */
    int bytes;                  /* #bytes read on each input */
    /*
    || Initialize -- no signals or callbacks needed.
    */
    pab->aio_sigevent.sigev_notify = SIGEV_NONE;
    pab->aio_buf = pch->buffer; /* always the same */
#if DO_SPROCS
    /*
    || Wait for the starting gun...
    */
    barrier(convene,nprocs);
#endif
    pch->etime = clock();
    do /* read and write, read and write... */
    {
        /*
        || Set up the aiocb for a read, queue it, and wait for it.
        */
        pab->aio_fildes = pch->fd;
        pab->aio_offset = pch->inbase;
        pab->aio_nbytes = BLOCKSIZE;
        if (ret = aio_read(pab))
            break;
        ret = inWait1(pch);
        /*
        || If the aio_suspend() return is nonzero, it means that the wait
        || did not end for i/o completion but because of a signal. Since we
        || expect no signals here, we take that as an error.
        */
        if (!ret) /* op is complete */
            ret = aio_error(pab);  /* read() status, should be 0 */
        if (ret)
            break;  /* signal, or nonzero read completion */
        /*
        || get the result of the read() call, the count of bytes read.
        || Since aio_error returned 0, the count is nonnegative.
        || It could be 0, or less than BLOCKSIZE, indicating EOF.
        */
        bytes = aio_return(pab); /* actual read result */
        if (!bytes)
            break;  /* no need to write a last block of 0 */
        pch->inbase += bytes;   /* where to read next time */
        /*
        || Set up the aiocb for a write, queue it, and wait for it.
        */
        pab->aio_fildes = outFD;
        pab->aio_nbytes = bytes;
        pab->aio_offset = pch->outbase;
        if (ret = aio_write(pab))
            break;
        ret = inWait1(pch);
        if (!ret) /* op is complete */
            ret = aio_error(pab);  /* should be 0 */
        if (ret)
            break;
        pch->outbase += bytes;  /* where to write next time */
    } while ((!ret) && (bytes == BLOCKSIZE));
    /*
    || The loop is complete.  If no errors so far, use aio_fsync()
    || to ensure that output is complete.  This requires waiting
    || yet again.
    */
    if (!ret)
    {
        if (!(ret = aio_fsync(O_SYNC,pab)))
            ret = inWait1(pch);
    }
    /*
    || Flag any errors for the parent proc. If none, count elapsed time.
    */
    if (ret) ++errors;
    else pch->etime = (clock() - pch->etime);
#if DO_SPROCS
    /*
    || Rendezvous with the rest of the family, then quit.
    */
    barrier(convene,nprocs);
#endif
} /* end inProc1 */
/******************************************************************************
|| inProc2 requests a signal upon completion of an I/O. After starting
|| an operation, it P's a semaphore which is V'd from the signal handler.
*/
#define AIO_SIGNUM (SIGRTMIN+1) /* arbitrary choice of signal number */
void sigHandler2(const int signo, const struct siginfo *sif )
{
    /*
    || In this minimal signal handler we pick up the address of the
    || child_t info structure -- which was put in aio_sigevent.sigev_value
    || field during initialization -- and use it to find the semaphore.
    */
    child_t *pch = sif->si_value.sival_ptr ;
    usvsema(pch->sema); 
    return; /* stop here with dbx to print the above address */
}
int inWait2(child_t *pch)
{
    /*
    || Wait for any signal handler to post the semaphore.  The signal
    || handler could have been entered before this function is called,
    || or it could be entered afterward.
    */
    uspsema(pch->sema); 
    /*
    || Since this process executes only one aio operation at a time,
    || we can return the status of that operation.  In a more complicated
    || design, if a signal could arrive from more than one pending
    || operation, this function could not return status.
    */
    return aio_error(&pch->acb);
}
void inProc2(void *arg, size_t stk)
{
    child_t *pch = arg;         /* starting arg is ->child_t for my file */
    aiocb_t *pab = &pch->acb;   /* base address of the aiocb_t in child_t */
    int ret;                    /* as long as this is 0, all is ok */
    int bytes;                  /* #bytes read on each input */
    /*
    || Initialize -- request a signal in aio_sigevent. The address of
    || the child_t struct is passed as the siginfo value, for use
    || in the signal handler.
    */
    pab->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    pab->aio_sigevent.sigev_signo = AIO_SIGNUM;
    pab->aio_sigevent.sigev_value.sival_ptr = (void *)pch;
    pab->aio_buf = pch->buffer; /* always the same */
    /*
    || Initialize -- set up a signal handler for AIO_SIGNUM.
    */
    { 
        struct sigaction sa = {SA_SIGINFO,sigHandler2};
        ret = sigaction(AIO_SIGNUM,&sa,NULL);
        if (ret) ++errors; /* parent will shut down ASAP */
    }   
#if DO_SPROCS
    /*
    || Wait for the starting gun...
    */
    barrier(convene,nprocs);
#else
    if (ret) return;
#endif
    pch->etime = clock();
    do /* read and write, read and write... */
    {
        /*
        || Set up the aiocb for a read, queue it, and wait for it.
        */
        pab->aio_fildes = pch->fd;
        pab->aio_offset = pch->inbase;
        pab->aio_nbytes = BLOCKSIZE;
        if (!(ret = aio_read(pab)))
            ret = inWait2(pch);
        if (ret)
            break;  /* could not start read, or it ended badly */
        /*
        || get the result of the read() call, the count of bytes read.
        || Since aio_error returned 0, the count is nonnegative.
        || It could be 0, or less than BLOCKSIZE, indicating EOF.
        */
        bytes = aio_return(pab); /* actual read result */
        if (!bytes)
            break;  /* no need to write a last block of 0 */
        pch->inbase += bytes;   /* where to read next time */
        /*
        || Set up the aiocb for a write, queue it, and wait for it.
        */
        pab->aio_fildes = outFD;
        pab->aio_nbytes = bytes;
        pab->aio_offset = pch->outbase;
        if (!(ret = aio_write(pab)))
             ret = inWait2(pch);
        if (ret)
            break;
        pch->outbase += bytes;  /* where to write next time */
    } while ((!ret) && (bytes == BLOCKSIZE));
    /*
    || The loop is complete.  If no errors so far, use aio_fsync()
    || to ensure that output is complete.  This requires waiting
    || yet again.
    */
    if (!ret)
    {
        if (!(ret = aio_fsync(O_SYNC,pab)))
            ret = inWait2(pch);
    }
    /*
    || Flag any errors for the parent proc. If none, count elapsed time.
    */
    if (ret) ++errors;
    else pch->etime = (clock() - pch->etime);
#if DO_SPROCS
    /*
    || Rendezvous with the rest of the family, then quit.
    */
    barrier(convene,nprocs);
#endif
} /* end inProc2 */

/******************************************************************************
|| inProc3 uses a callback and a semaphore. It waits with a P operation.
|| The callback function executes a V operation.  This may come before or
|| after the P operation.
*/
void callBack3(union sigval usv)
{
    /*
    || The callback function receives the pointer to the child_t struct,
    || as prepared in aio_sigevent.sigev_value.sival_ptr.  Use this to 
    || post the semaphore in the child_t struct.
    */
    child_t *pch = usv.sival_ptr;
    usvsema(pch->sema);
    return;
}
int inWait3(child_t *pch)
{
    /*
    || Suspend, if necessary, by waiting on the semaphore.  The callback
    || function might be entered before we reach this point, or after.
    */
    uspsema(pch->sema);
    /*
    || Return the status of the aio operation associated with the sema.
    */
    return aio_error(&pch->acb);    
}
void inProc3(void *arg, size_t stk)
{
    child_t *pch = arg;         /* starting arg is ->child_t for my file */
    aiocb_t *pab = &pch->acb;   /* base address of the aiocb_t in child_t */
    int ret;                    /* as long as this is 0, all is ok */
    int bytes;                  /* #bytes read on each input */
    /*
    || Initialize -- request a callback in aio_sigevent. The address of
    || the child_t struct is passed as the siginfo value to be passed
    || into the callback. 
    */
    pab->aio_sigevent.sigev_notify = SIGEV_CALLBACK;
    pab->aio_sigevent.sigev_func = callBack3;
    pab->aio_sigevent.sigev_value.sival_ptr = (void *)pch;
    pab->aio_buf = pch->buffer; /* always the same */
#if DO_SPROCS
    /*
    || Wait for the starting gun...
    */
    barrier(convene,nprocs);
#endif
    pch->etime = clock();
    do /* read and write, read and write... */
    {
        /*
        || Set up the aiocb for a read, queue it, and wait for it.
        */
        pab->aio_fildes = pch->fd;
        pab->aio_offset = pch->inbase;
        pab->aio_nbytes = BLOCKSIZE;
        if (!(ret = aio_read(pab)))
            ret = inWait3(pch);
        if (ret)
            break;  /* read error */
        /*
        || get the result of the read() call, the count of bytes read.
        || Since aio_error returned 0, the count is nonnegative.
        || It could be 0, or less than BLOCKSIZE, indicating EOF.
        */
        bytes = aio_return(pab); /* actual read result */
        if (!bytes)
            break;  /* no need to write a last block of 0 */
        pch->inbase += bytes;   /* where to read next time */
        /*
        || Set up the aiocb for a write, queue it, and wait for it.
        */
        pab->aio_fildes = outFD;
        pab->aio_nbytes = bytes;
        pab->aio_offset = pch->outbase;
        if (!(ret = aio_write(pab)))
             ret = inWait3(pch);
        if (ret)
            break;
        pch->outbase += bytes;  /* where to write next time */
    } while ((!ret) && (bytes == BLOCKSIZE));
    /*
    || The loop is complete.  If no errors so far, use aio_fsync()
    || to ensure that output is complete.  This requires waiting
    || yet again.
    */
    if (!ret)
    {
        if (!(ret = aio_fsync(O_SYNC,pab)))
            ret = inWait3(pch);
    }
    /*
    || Flag any errors for the parent proc. If none, count elapsed time.
    */
    if (ret) ++errors;
    else pch->etime = (clock() - pch->etime);
#if DO_SPROCS
    /*
    || Rendezvous with the rest of the family, then quit.
    */
    barrier(convene,nprocs);
#endif
} /* end inProc3 */  

Guaranteed-Rate Request

The following subroutine simplifies the task of requesting a guaranteed rate of I/O transfer. The file descriptor passed to function requestRate() must describe a file located in the real-time subvolume of a volume managed by XLV and XFS.

/*
 * Simple function to request a guaranteed rate reservation.
 * Input:
 *      fd      file descriptor to be guaranteed
 *      dur     duration of guarantee in seconds
 *      bps     bytes per second required
 *      flag    one of SOFT_ or HARD_GUARANTEE [+VOD_LAYOUT]
 *              (extra entry points included for those who do not
 *              want to include sys/grio.h)
 *
 * Assumed:
 *      reservation start time of "1 second from now"
 *      guarantee unit time of 1 second
 *
 * Returns:
 *       0    success,  guarantee granted
 *      -1    error; a diagnostic is displayed on stderr
 *      +n    n is the best bytes/second that XFS can offer
 *
 * Usage:
 *      #define BEST_RATE zillions
 *      #define MINIMAL_RATE somewhat_less
 *      if ( (ret = requestRate(fd, dur, BEST_RATE, SOFT_GUARANTEE)) )
 *      { // not a success
 *        if (ret >= MINIMAL_RATE) // acceptable lower rate offered
 *          ret = requestRate(fd, dur, ret, SOFT_GUARANTEE);
 *      }
 *      if (ret) // failed for some reason
 *      {
 *        if (0<ret) // not an error as such
 *           fprintf(stderr, "Cannot get rate\n");
 *        exit(1);
 *      }
 *      // guaranteed rate obtained, continue    
 */
#include <sys/types.h>  /* for time_t */
#include <time.h>       /* for time() */
#include <errno.h>      /* for error codes */
#include <stdio.h>      /* [fs]printf() */
#include "grio.h"       /* for grio_* */

/*
 * This subroutine displays a diagnostic message to stderr when
 * grio_request() returns an error. perror() cannot be used in
 * this case because the generic messages are not descriptive.
 * 
 */
void printGRIOerror( grio_resv_t *g )
{
    char work[80];
    char *msg = work;
    
    switch (g->gr_error)
    {
    case EINVAL:
    {
        msg = "unable to contact grio daemon";
        break;
    }
    case EBADF:
    {
        msg = "cannot stat file, or file already guaranteed";
        break;
    }
    case ESRCH:
    {
        msg = "invalid procid";
        break;
    }
    case ENOENT:
    {
        msg = "file has no (real-time?) extents";
        break;
    }
    case EIO:
    {
        msg = "incorrect start time";
        break;
    }
    case EACCES:
    {
        msg = (g->gr_flags & VOD_LAYOUT)
              ? "unable to provide VOD guarantee"
              : (
                (g->gr_flags & HARD_GUARANTEE)
                ? "unable to provide HARD guarantee"
                : "unable to provide SOFT guarantee"
            );
        break;
    }
    case ENOSPC:
    {
        sprintf(work, "out of bandwidth on device %s",
                    g->gr_errordev);
        break;      
    }
    default: /* in case they think of something else */
    {
        sprintf(work, "error %d", g->gr_error);
    }
    }
    fprintf(stderr, "grio_request: %s.\n", msg);
}

/*
 * This function actually places the request.
 */
int requestRate( int fd, int dur, int bps, int flag)
{
    int ret;
    grio_resv_t grio;
    
    grio.gr_duration= dur;
    grio.gr_start   = 1+time(NULL);
    grio.gr_optime  = 1; /* unit time is 1 second */
    grio.gr_opsize  = bps;
    grio.gr_flags   = flag;
    ret = grio_request(fd, &grio); 
    if (ret) /* not a success */
    {
        if ( (ENOSPC == grio.gr_error) /* insufficient bandwidth.. */
        &&   (grio.gr_opsize) ) /* ..but nonzero rate remaining */
            ret = grio.gr_opsize; /* return available rate */
        else /* some other problem or 0 bandwidth available */
            printGRIOerror(&grio);
    }
    return ret;
}
/*
 * When you don't want to #include sys/grio.h to define one constant...
 */
int requestHardRate( int fd, int dur, int bps )
{ return requestRate(fd, dur, bps, HARD_GUARANTEE); }

int requestSoftRate( int fd, int dur, int bps )
{ return requestRate(fd, dur, bps, SOFT_GUARANTEE); }

#ifdef UNIT_TEST
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>     /* for atoi() */
/* requestRate pathname [rate [duration [flags ] ] ] */
int main(int argc, char **argv)
{
    int fd;
    int bps = 1000000; /* 1 MB per second */
    int dur = 60; /* a new york minute */
    int flg = SOFT_GUARANTEE;
    int rc;
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s pathname [rate [duration [flags]]]\n",
                        argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror(argv[1]);
        return 1;
    }
    if (argc > 2) bps = atoi(argv[2]);
    if (argc > 3) dur = atoi(argv[3]);
    if (argc > 4) flg = atoi(argv[4]);
    printf("Requesting guarantee on fd=%d of %d bps for %d sec\n",
                                       fd,   bps,       dur);
    rc = requestRate(fd, dur, bps, flg);
    printf("requestRate() returns %d\n", rc);
    return rc ? 1 : 0;
}
#endif /*UNIT_TEST*/
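
The retry logic described in the Usage comment at the top of the listing can be packaged as a helper. The following is a minimal sketch, not part of the distributed code: negotiateRate() and the rate_fn type are hypothetical names, and fakeRequestRate() is a stand-in that only imitates the return convention of requestRate() so that the logic can be tried on a system without GRIO.

```c
#include <assert.h>

/* Same signature as requestRate() in the listing above. */
typedef int (*rate_fn)(int fd, int dur, int bps, int flag);

/*
 * Hypothetical helper: ask for `best` bytes/second.  If the request is
 * refused with a counter-offer of at least `minimal`, retry once at the
 * offered rate.  Returns the rate granted (> 0), or -1 when no
 * acceptable rate was obtained.
 */
int negotiateRate(rate_fn request, int fd, int dur,
                  int best, int minimal, int flag)
{
    int ret;
    assert(request != 0 && 0 < minimal && minimal <= best);
    ret = request(fd, dur, best, flag);
    if (ret == 0)
        return best;    /* granted in full */
    if (ret < minimal)
        return -1;      /* error (-1), or the offer is too low */
    /* ret is a counter-offer of at least `minimal`: retry at that rate */
    if (request(fd, dur, ret, flag) == 0)
        return ret;
    return -1;
}

/*
 * Stand-in for requestRate(), for experimentation only: it pretends
 * that the device can supply at most 500000 bytes/second.
 */
int fakeRequestRate(int fd, int dur, int bps, int flag)
{
    (void)fd; (void)dur; (void)flag;
    return (bps <= 500000) ? 0 : 500000;
}
```

In a real program, requestRate would be passed as the first argument in place of fakeRequestRate. The helper retries only once, on the assumption that the rate offered by the first call is still available for the second.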

Frame Scheduler Examples

A number of example programs are distributed with the REACT/Pro Frame Scheduler. This section describes them. Only one is reproduced here; the others are found on disk.

The example programs distributed with the Frame Scheduler are found in the directory /usr/react/src/examples. They are summarized in Table A-1 and are discussed in more detail in the topics that follow.

Table A-1. Summary of Frame Scheduler Example Programs

Directory      Features of Example
-------------  ------------------------------------------------------------
simple         Two processes scheduled on a single CPU at a frame rate slow
r4k_intr       enough to permit use of printf() for debugging. The examples
               differ in the time base used; and the r4k_intr code uses a
               barrier for synchronization.

mprogs         Like simple, but the scheduled processes are independent
               programs.

multi          Three synchronous Frame Schedulers running lightweight
ext_intr       processes on three processors. These examples are much
user_intr      alike, differing mainly in the source of the time base
vsync_intr     interrupt.

complete       Like multi in starting three Frame Schedulers. Information
stop_resume    about the activity processes is stored in arrays for
               convenient maintenance. The stop_resume code demonstrates
               frs_stop() and frs_resume() calls.

driver         driver contains a pseudo-device driver that demonstrates the
dintr          Frame Scheduler device driver interface. dintr contains a
               program based on simple that uses the example driver as a
               time base.

sixtyhz        One process scheduled at a 60 Hz frame rate. The activity
memlock        process in the memlock example locks its address space into
               memory before it joins the scheduler.

upreuse        Complex example that demonstrates the creation of a pool of
               reusable processes, and how they can be dispatched as
               activity processes on a Frame Scheduler.


Basic Example

The example in /usr/react/src/examples/simple shows how to create a simple application using the Frame Scheduler API. The code in /usr/react/src/examples/r4k_intr is similar.

Real-Time Application Specification

The application consists of two processes that have to periodically execute a specific sequence of code. The period for the first process, process A, is 600 milliseconds. The period for the other process, process B, is 2400 ms.


Note: Such long periods are unrealistic for real-time applications. However, they allow the use of printf() calls within the “real-time” loops in this sample program.


Frame Scheduler Design

The two periods and their ratio determine the selection of the minor frame period—600 ms—and the number of minor frames per major frame—4, for a total of 2400 ms.
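
The arithmetic behind this choice can be written out. The sketch below assumes one common rule of thumb that the text does not state explicitly: the minor frame period is the greatest common divisor of the process periods, and the major frame is their least common multiple. The names frame_plan_t, gcdPeriod(), and planFrames() are hypothetical.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned gcdPeriod(unsigned a, unsigned b)
{
    while (b != 0)
    {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

typedef struct {
    unsigned minor_ms;  /* minor frame period in milliseconds */
    unsigned n_minor;   /* minor frames per major frame */
} frame_plan_t;

/*
 * Hypothetical helper: derive a frame plan from two process periods.
 * The minor frame is gcd(A,B); the major frame is lcm(A,B), so the
 * count of minor frames is lcm/gcd = (A/gcd) * (B/gcd).
 */
frame_plan_t planFrames(unsigned periodA_ms, unsigned periodB_ms)
{
    frame_plan_t plan;
    unsigned g;
    assert(periodA_ms > 0 && periodB_ms > 0);
    g = gcdPeriod(periodA_ms, periodB_ms);
    plan.minor_ms = g;
    plan.n_minor = (periodA_ms / g) * (periodB_ms / g);
    return plan;
}
```

For the periods in this example, planFrames(600, 2400) yields a minor frame of 600 ms and 4 minor frames per major frame, matching the design above.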

The discipline for process A is strict real-time (FRS_DISC_RT). Underrun and overrun errors should cause signals.

Process B should run only once in 2400 ms, so it operates as Continuable over as many as 4 minor frames. For the first 3 frames, its discipline is Overrunnable and Continuable. For the last frame it is strict real-time. The Overrunnable discipline allows process B to run without yielding past the end of each minor frame. The Continuable discipline ensures that once process B does yield, it is not resumed until the fourth minor frame has passed. The combination allows process B to extend its execution to the allowable period of 2400 ms, and the strict real-time discipline at the end makes certain that it yields by the end of the major frame.

There is a single Frame Scheduler, so a single processor is used by both processes. Process A runs within a minor frame until yielding or until the expiration of the minor frame period. In the latter case the Frame Scheduler generates an overrun error, signaling that process A is misbehaving.

When process A yields, the Frame Scheduler immediately activates process B. It runs until yielding, or until the end of the minor frame, at which point it is preempted. This is not an error since process B is Overrunnable.

Starting the next minor frame, the Frame Scheduler allows process A to execute again. After it yields, process B is allowed to resume running, if it has not yet yielded. Again in the third and fourth minor frames, A is started, followed by B if it has not yet yielded. At the interrupt that signals the end of the fourth frame (and the end of the major frame), process B must have yielded, or an overrun error is signalled.
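
This sequence can be checked with a small timeline model. The sketch below is not Frame Scheduler code. simulateMajorFrame() is a hypothetical function, and it assumes that process A uses the same amount of CPU time in every minor frame, that process B needs a fixed total amount of CPU time per major frame, and that dispatching overhead is zero.

```c
#include <assert.h>

/*
 * Model one major frame of n_minor minor frames, each minor_ms long.
 * Process A runs first in every minor frame for a_ms; process B then
 * uses whatever time remains until it has accumulated b_ms in total.
 *
 * Returns  n > 0  B yields during minor frame n (counting from 1)
 *          0      B has not yielded at the end of the major frame
 *                 (the overrun case for the last, strict frame)
 *         -1      A alone overruns a minor frame
 */
int simulateMajorFrame(int minor_ms, int n_minor, int a_ms, int b_ms)
{
    int frame;
    int b_left = b_ms;
    assert(minor_ms > 0 && n_minor > 0 && a_ms >= 0 && b_ms >= 0);
    if (a_ms > minor_ms)
        return -1;  /* strict real-time process A overruns */
    for (frame = 1; frame <= n_minor; ++frame)
    {
        int slack = minor_ms - a_ms;    /* time left after A yields */
        if (b_left <= slack)
            return frame;               /* B yields within this frame */
        b_left -= slack;                /* B preempted; resumes next frame */
    }
    return 0;
}
```

With the frame parameters of this example and an A that needs 100 ms per frame, simulateMajorFrame(600, 4, 100, 1500) returns 3: process B is preempted at the end of the first two minor frames and yields during the third. Raising the work of B to 2100 ms returns 0, the overrun case.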

Example of Scheduling Separate Programs

The code in directory /usr/react/src/examples/mprogs does the same work as example simple (see “Basic Example”). However, the activity processes A and B are physically loaded as separate commands. The main program establishes the single Frame Scheduler. The activity processes are started as separate programs. They communicate with the main program using SVR4-compatible interprocess communication messages (see the intro(2) and msgget(2) reference pages).

There are three separate executables in the mprogs example. The master program, in master.c, is a command that has the following syntax:

master [-p cpu-number] [-s slave-count]

The cpu-number specifies which processor to use for the one Frame Scheduler this program creates. The default is processor 1. The slave-count tells the master how many subordinate programs will be enqueued to the Frame Scheduler. The default is two programs.

The problems that need to be solved in this example are as follows:

  • The frs-master program must enqueue the activity processes. However, since they are started as separate programs, the master has no direct way of knowing their process IDs, which are needed for frs_enqueue().

  • The activity processes need to specify upon which minor frames they should be enqueued, and with what discipline.

  • The master needs to enqueue the activities in the proper order on their minor frames, so they will be dispatched in the proper sequence. Therefore the master has to distinguish the subordinates in some way; it cannot treat them as interchangeable.

  • The activity processes must join the Frame Scheduler, so they need the handle of the Frame Scheduler to use as an argument to frs_join(). However, this information is in the master's address space.

  • If an error occurs when enqueueing, the master needs to tell the activity processes so they can terminate in an orderly way.

There are many ways in which these objectives could be met (for example, the three programs could share a shared-memory arena). In this example, the master and subordinates communicate using a simple protocol of messages exchanged using msgget(), msgsnd(), and msgrcv() (see the msgget(2) and msgop(2) reference pages). The sequence of operations is as follows:

  1. The master program creates a Frame Scheduler.

  2. The master sends a message inviting the most important subordinate to reply. (All the message queue handling is in module ipc.c, which is linked by all three programs.)

  3. The subordinate compiled from the file processA.c replies to this message, sending its process ID and requesting the FRS handle.

  4. The subordinate process A sends a series of messages, one for each minor queue on which it should enqueue. The master enqueues it as requested.

  5. The subordinate process A sends a “ready” message.

  6. The master sends a message inviting the next most important process to reply.

  7. The program compiled from processB.c replies to this request, and steps 3-6 are repeated for as many slaves as the slave-count parameter to the master program specifies. (Only two slaves are provided. However, you can easily create more using processB.c as a pattern.)

  8. The master issues frs_start(), and waits for the termination signal.

  9. The subordinates independently issue frs_join() and the real-time dispatching begins.
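
The master's side of steps 3 through 5 amounts to a small state machine. The following sketch shows one way to express it, independent of the message transport. Everything in it is an assumption made for illustration: the message codes, the sub_msg_t layout, and the function names are hypothetical and are not taken from ipc.c or master.c.

```c
#include <assert.h>

/* Hypothetical message codes sent by a subordinate. */
enum { MSG_HELLO = 1, MSG_ENQUEUE, MSG_READY };

typedef struct {
    int type;   /* one of the MSG_ codes above */
    int pid;    /* sender's process ID (used by MSG_HELLO) */
    int minor;  /* minor frame index (used by MSG_ENQUEUE) */
} sub_msg_t;

typedef enum {
    AWAIT_HELLO,     /* invitation sent, no reply yet (step 2) */
    AWAIT_REQUESTS,  /* pid known; enqueue requests or "ready" expected */
    SUB_DONE,        /* "ready" received; invite the next subordinate */
    PROTO_ERROR      /* unexpected message; tell subordinates to quit */
} mstate_t;

typedef struct {
    mstate_t state;
    int pid;         /* learned from MSG_HELLO */
    int n_enqueued;  /* enqueue requests honored so far */
} master_t;

void masterInit(master_t *m)
{
    m->state = AWAIT_HELLO;
    m->pid = 0;
    m->n_enqueued = 0;
}

/* Advance the protocol by one received message; returns the new state. */
mstate_t masterHandle(master_t *m, const sub_msg_t *msg, int n_minor)
{
    switch (m->state)
    {
    case AWAIT_HELLO:
        if (msg->type == MSG_HELLO && msg->pid > 0)
        {
            m->pid = msg->pid;          /* step 3 */
            m->state = AWAIT_REQUESTS;
        }
        else
            m->state = PROTO_ERROR;
        break;
    case AWAIT_REQUESTS:
        if (msg->type == MSG_ENQUEUE
        &&  0 <= msg->minor && msg->minor < n_minor)
            ++m->n_enqueued;            /* step 4: master would enqueue here */
        else if (msg->type == MSG_READY)
            m->state = SUB_DONE;        /* step 5 */
        else
            m->state = PROTO_ERROR;
        break;
    default:
        m->state = PROTO_ERROR;         /* nothing more is expected */
    }
    return m->state;
}
```

Keeping the protocol logic separate from the message queue calls in this way makes it possible to exercise the sequence without starting the subordinate programs.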

Examples of Multiple Synchronized Schedulers

The example in /usr/react/src/examples/multi demonstrates the creation of three synchronized Frame Schedulers. The three use the cycle counter to establish a minor frame interval of 50 ms. All three Frame Schedulers use 20 minor frames per major frame, for a major frame rate of 1 Hz.

The following processes are scheduled in this example:

  • Processes A and D require a frequency of 20 Hz

  • Process B requires a frequency of 10 Hz and can consume up to 100 ms of execution time each time

  • Process C requires a frequency of 5 Hz and can consume up to 200 ms of execution time each time

  • Process E requires a frequency of 4 Hz and can consume up to 250 ms of execution time each time

  • Process F requires a frequency of 2 Hz and can consume up to 500 ms of execution time each time

  • Processes K1, K2 and K3 are background processes that should run as often as possible, when time is available.

The processes are assigned to processors as follows:

  • Scheduler 1 runs processes A (20 Hz) and K1 (background).

  • Scheduler 2 runs processes B (10 Hz), C (5 Hz), and K2 (background).

  • Scheduler 3 runs processes D (20 Hz), E (4 Hz), F (2 Hz), and K3 (background).
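
Each frequency above corresponds to a whole number of 50 ms minor frames. The following sketch shows the conversion; framesPerActivation() is a hypothetical helper, not a function of the example code.

```c
#include <assert.h>

/*
 * Number of minor frames between successive activations of a process
 * that must run rate_hz times per second, given a minor frame of
 * minor_ms milliseconds.  Returns 0 if the rate does not correspond
 * to a whole number of minor frames.
 */
int framesPerActivation(int rate_hz, int minor_ms)
{
    int period_ms;
    assert(rate_hz > 0 && minor_ms > 0);
    if (1000 % rate_hz != 0)
        return 0;               /* period is not a whole millisecond count */
    period_ms = 1000 / rate_hz;
    if (period_ms % minor_ms != 0)
        return 0;               /* period does not align to minor frames */
    return period_ms / minor_ms;
}
```

With a 50 ms minor frame the results are 1 for processes A and D, 2 for B, 4 for C, 5 for E, and 10 for F. Each of these divides the 20 minor frames of the major frame evenly. Because the maximum execution time listed for each of B, C, E, and F equals its period, the same number is also the count of minor frames that one activation may span.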

In order to simplify the coding of the example, all real-time processes use the same function body, process_skeleton(), which is parameterized with the process name, the address of the Frame Scheduler it is to join, and the address of the “real-time” action it is to execute. In the sample code, all real-time actions are empty function bodies (feel free to load them down with code).

The examples in /usr/react/src/examples/ext_intr, user_intr, and vsync_intr are all similar to multi, differing mainly in the time base used. The examples in complete and stop_resume are similar in operation, but more evolved and complex in the way they manage subprocesses.


Tip: It is helpful to use the xdiff program when comparing these similar programs—see the xdiff(1) reference page.


Example of Device Driver

The code in /usr/react/src/examples/driver contains a skeletal test-bed for a kernel-level device driver that interacts with the Frame Scheduler. Most of the driver functions consist of minimal or empty stubs. However, the ioctl() entry point to the driver (see the ioctl(2) reference page) simulates a hardware interrupt and calls the Frame Scheduler entry point, frs_handle_driverintr() (see “Generating Interrupts”). This allows you to test the driver. Calling its ioctl() entry is equivalent to using frs_usrintr() (see “The Frame Scheduler API”).

The code in /usr/react/src/examples/dintr contains a variant of the simple example that uses a device driver as the time base. The program dintr/sendintr.c opens the driver, calls ioctl() to send one time-base interrupt, and closes the driver. (It could easily be extended to send a specified number of interrupts, or to send an interrupt each time the return key is pressed.)

Examples of a 60 Hz Frame Rate

The example in directory /usr/react/src/examples/sixtyhz demonstrates the ability to schedule a process at a frame rate of 60 Hz, a common rate in visual simulators. A single Frame Scheduler is created. It uses the cycle counter with an interval of 16,666 microseconds (16.66 ms, approximately 60 Hz). There is one minor frame per major frame.
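
The interval is the frame period truncated to whole microseconds, which is why the resulting rate is only approximately 60 Hz. The sketch below shows the arithmetic; both function names are hypothetical.

```c
#include <assert.h>

/* Whole microseconds per frame for a given frame rate (truncating). */
long frameIntervalUsec(long rate_hz)
{
    assert(rate_hz > 0);
    return 1000000L / rate_hz;
}

/*
 * Microseconds by which rate_hz frames of the truncated interval fall
 * short of one full second.
 */
long truncationShortfallUsec(long rate_hz)
{
    return 1000000L - rate_hz * frameIntervalUsec(rate_hz);
}
```

For 60 Hz the interval is 16666 microseconds, and sixty such frames take 999960 microseconds, 40 microseconds less than one second.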

One real-time process is enqueued to the Frame Scheduler. By changing the compiler constant LOGLOOPS you can change the amount of work it attempts to do in each frame.

This example also contains the code to query and to change the signal numbers used by the Frame Scheduler.

The example in /usr/react/src/examples/memlock is similar to the sixtyhz example, but the activity process uses plock() to lock its address space. Also, it executes one major frame's worth of frs_yield() calls immediately after return from frs_join(). The purpose of this is to “warm up” the processor cache with copies of the process code and data. (An actual application process could access its major data structures prior to this yield in order to speed up the caching process.)
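
Accessing the major data structures ahead of time can be as simple as reading one byte from every page. The following is a minimal sketch of that idea; touchPages() is a hypothetical helper that is not part of the memlock example, and it assumes the buffer starts on a page boundary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read one byte from each page of the region [base, base+len).
 * The volatile qualifier keeps the compiler from discarding the reads.
 * Returns the number of pages touched.
 */
size_t touchPages(const volatile char *base, size_t len, size_t pagesize)
{
    size_t off;
    size_t touched = 0;
    assert(base != NULL && pagesize > 0);
    for (off = 0; off < len; off += pagesize)
    {
        (void)base[off];    /* volatile read forces the memory access */
        ++touched;
    }
    return touched;
}
```

The page size would normally come from sysconf(_SC_PAGESIZE) or getpagesize() rather than from a constant.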

Example of Managing Lightweight Processes

The code in /usr/react/src/examples/upreuse implements a simulated real-time application based on a pool of reusable processes. A reusable process is created with sproc() and described by a pdesc_t structure. Code in pqueue.c builds and maintains a pool of processes. Code in pdesc.c provides functions to get and release a process, and to dispatch one to execute a specific function.

The code in test_hello.c creates a pool of processes and dispatches each one in turn to display a message. The code in test_singlefrs.c creates a pool of processes and causes them to join a Frame Scheduler.
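
The get and release operations of such a pool can be illustrated with a free list. The sketch below does not reproduce the pdesc_t structure or the functions in pqueue.c and pdesc.c. Every name in it is hypothetical, each slot merely stands in for a process created with sproc(), and no locking is shown, although a pool shared among processes would need it.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4     /* arbitrary size for the sketch */

typedef struct slot {
    int          id;    /* stands in for a process ID */
    int          busy;  /* nonzero while dispatched */
    struct slot *next;  /* link in the free list */
} slot_t;

typedef struct {
    slot_t  slots[POOL_SIZE];
    slot_t *free_list;  /* idle slots, most recently released first */
} pool_t;

/* Build the pool with every slot idle, slot 0 at the head of the list. */
void poolInit(pool_t *p)
{
    int i;
    p->free_list = NULL;
    for (i = POOL_SIZE - 1; i >= 0; --i)
    {
        p->slots[i].id = i;
        p->slots[i].busy = 0;
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take an idle slot from the pool; returns NULL when all are busy. */
slot_t *poolGet(pool_t *p)
{
    slot_t *s = p->free_list;
    if (s != NULL)
    {
        p->free_list = s->next;
        s->next = NULL;
        s->busy = 1;
    }
    return s;
}

/* Return a slot to the pool so that it can be dispatched again. */
void poolRelease(pool_t *p, slot_t *s)
{
    assert(s != NULL && s->busy);
    s->busy = 0;
    s->next = p->free_list;
    p->free_list = s;
}
```

A caller takes a slot with poolGet(), dispatches work to the process it represents, and hands it back with poolRelease() when the work is done. The most recently released slot is the next one reused.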