The programs in this appendix illustrate the use of some of the features discussed in the book. The following programs are included:
“Mapping and Reading the Cycle Counter” illustrates the use of the cycle counter.
“Getting the Time of Day Stamp” illustrates the use of gettimeofday() and shows how to test its precision.
“Interprocess Communication” illustrates some uses of arenas, semaphores, and interval timers.
“Probing the Address Space” displays the addresses assigned to a process address space, and illustrates some uses of mmap().
“Deadline Scheduling Subroutines” illustrates the use of schedctl(2) to set a deadline scheduling policy.
“Asynchronous I/O Example” illustrates the use of asynchronous I/O including four different methods of testing for I/O completion, and also shows process creation with sproc() and the use of semaphores and barriers.
“Guaranteed-Rate Request” demonstrates how to request a guaranteed rate of I/O transfer.
“Frame Scheduler Examples” describes the sample programs distributed with the REACT/Pro Frame Scheduler.
This section contains two example programs. The first simply reports the precision of the hardware cycle counter. The second demonstrates mapping and reading the cycle counter.
The program in Example A-1 is a simple utility that gets the cycle counter precision using syssgi() and displays it. The timer precision (in bits, either 32 or 64) is written to standard output. The program also returns the precision as its exit status, so a shell script can test it through the $status variable.
/*****************************************************************************
||
|| This program makes the value returned by syssgi(SGI_CYCLECNTR_SIZE)
|| accessible at the command line. The output display can be read, or
|| tested in a shell script. The value is also returned, so it can
|| be tested in the $status variable.
||
*****************************************************************************/
#include <sys/syssgi.h> /* for syssgi(), SGI_CYCLECNTR_SIZE */
#include <stdio.h> /* for printf() */
#include <string.h> /* for strcmp() */
int main(int argc, char *argv[])
{
unsigned int tbc = syssgi(SGI_CYCLECNTR_SIZE);
int arg, quiet = 0;
for (arg=1; arg<argc; ++arg)
{
if (0==strcmp(argv[arg],"-q"))
{
quiet = 1;
}
else /* includes case of -h */
{
printf("%s [-h | -q]\n",argv[0]);
printf("\tReport the precision of the hardware cycle counter.\n");
printf("\tPrecision in bits displayed to stdout unless -q.\n");
printf("\tPrecision in bits returned as status.\n");
return tbc;
}
}
if (!quiet)
printf("%u bits in the cycle counter\n",tbc);
return tbc;
}
The program in Example A-2 shows how to map the high-precision cycle counter into memory and sample it. The file compiles to a library of the following functions:
mapTheTimer() | Uses mmap() to map the cycle counter into the address space. Returns the unit value of the timer in picoseconds; for example, it returns 21000 on a Challenge, where the timer unit is 21 nanoseconds. |
timerBitCount() | Returns the number of bits of precision in the timer, which varies with the CPU board type, either 32 or 64 bits. |
readTimer32() | Returns the least-significant (or only) word of the timer value. |
readTimer64() | Returns the timer value as a 64-bit unsigned integer (extended with 0-bits when necessary). |
main() | Compiled only when variable UNIT_TEST is set, contains code to exercise the preceding functions. |
/*****************************************************************************
||
|| The functions in this module provide access to the free-running timer
|| on the CPU board of certain SGI systems.
||
|| timerBitCount()
||
|| Returns the number of bits of data in the timer, as reported
|| by syssgi(SGI_CYCLECNTR_SIZE):
|| 0 error reported by syssgi -- probably no timer in this machine
|| 32 in an Indy or Crimson
|| 64 in a Challenge, Onyx, and other big machines.
||
|| mapTheTimer()
||
|| This function tests the hardware environment. If the current system has
|| a timer, the function tries to map it into memory. Errors can include:
|| * 0 returned by timerBitCount()
|| * error returned by syssgi(SGI_QUERY_CYCLECNTR)
|| * error returned by mmap(2)
|| When there is no error, the function returns a positive integer which is
|| the number of picoseconds represented by one unit increment of the timer.
|| In the event of an error, the function returns 0, and errno is set to
|| some error code.
|| mapTheTimer() can be called multiple times without harm. To convert
|| its returned value to a fraction of a second, convert to double and
|| multiply by 1e-12.
||
|| readTimer32()
||
|| This function calls mapTheTimer(), if it has not been called already.
|| Thus the first attempt to read the clock will map it if necessary.
|| If the timer has been mapped, its least-significant bits are returned
|| as an unsigned 32-bit integer.
|| * if mapTheTimer() failed, the returned value is always 0
|| * if the timer has 32-bit precision, the returned value is
|| the whole timer value
|| * if the timer has 64-bit precision (e.g. Challenge), the returned
|| value is the low-order word.
||
|| readTimer64()
||
|| This function is like readTimer32(), except that it returns an unsigned
|| 64-bit integer.
|| * if mapTheTimer() failed, the returned value is always 0
|| * if the timer has 32-bit precision, the returned value is
|| the whole timer value, extended with high-order 0-bits
|| * if the timer has 64-bit precision, the returned value is the whole
|| timer value. The 64-bit timer is sampled in such a way as to
|| compensate for rollover while minimizing bus traffic.
||
|| main()
||
|| Compiled only when UNIT_TEST is defined, provides a functional test
|| platform for the above functions.
||
|| NOTE: in two of these routines we assume that this machine is operating
|| in big-endian mode, such that the least-significant 32 bits of a
|| long-long are at the higher word address.
||
*****************************************************************************/
#include <stddef.h> /* for NULL */
#include <fcntl.h> /* for O_RDONLY and open() */
#include <unistd.h> /* for getpagesize() */
#include <sys/mman.h> /* for constants used with mmap() */
#include <sgidefs.h> /* for __psint_t, __uint*_t, and ABI defs */
#include <sys/syssgi.h> /* for syssgi(), SGI_QUERY_CYCLECNTR */
#include <errno.h> /* for errno global */
/*****************************************************************************
|| The following globals are set up by mapTheTimer() the first time called.
|| timerMapAddress == NULL means mapTheTimer() has never been called
|| == -1 means mapTheTimer() called and failed
|| else it points to the timer in memory
|| The data type (void *) is coerced to __uint32_t or __uint64_t in use.
||
|| The "volatile" declaration keeps the compiler from optimizing away
|| successive references to it.
||
|| timerPicoSecs == 0 means the timer has not been mapped successfully
|| else is the unit value stored by syssgi(SGI_QUERY_CYCLECNTR)
||
|| timerPrecision == value returned by syssgi(SGI_CYCLECNTR_SIZE),
|| but as this value is needed in the timer-reading
|| functions, it is cached, so as to avoid a system call
|| every time we read the clock.
||
|| If this code was redone in C++ (not a bad idea, feel free) these would
|| be class variables.
*****************************************************************************/
#define TIMER_IS_MAPPED (0 != timerPicoSecs)
#define TIMER_MAP_ATTEMPTED (NULL != timerMapAddress)
static volatile void * timerMapAddress = NULL;
static unsigned int timerPicoSecs = 0;
static unsigned int timerPrecision = 0;
unsigned int
mapTheTimer()
{
__uint32_t timerUnits = 0; /* receives timer picosecond unit value */
__psint_t timerPhysAddr; /* receives timer absolute address */
__psint_t timerPhysVPN; /* timerPhysAddr masked to a page boundary */
__psint_t addrMask; /* page offset bit mask */
int fdMem; /* file descriptor for /dev/mmem */
if ( ! TIMER_MAP_ATTEMPTED) /* first time through this code */
{
/*
|| Get the physical address of the clock in full. If there
|| is no cycle counter on this machine, syssgi returns -1.
*/
timerPhysAddr = syssgi(SGI_QUERY_CYCLECNTR, &timerUnits);
if ((__psint_t)-1 != timerPhysAddr) /* we have a timer */
{
/*
|| Trim out the offset from the address leaving the
|| page number part of the address. (VPN == virtual page number)
*/
addrMask = getpagesize() - 1;
timerPhysVPN = timerPhysAddr & ~addrMask;
/*
|| Map the page containing the clock's address into the virtual
|| address space of this process.
*/
fdMem = open("/dev/mmem", O_RDONLY);
timerMapAddress = (void *) mmap(
NULL, /* addr = 0, don't care where it goes */
addrMask, /* len = pagesize - 1 */
PROT_READ, /* prot = read-only */
MAP_PRIVATE, /* changes are unshared (n.a.) */
fdMem, /* map base is physical memory */
(off_t)timerPhysVPN /* source address to map */
);
if ((__psint_t)-1 != (__psint_t)timerMapAddress)
{
/*
|| mmap() succeeded, cache info in global variables.
*/
timerPicoSecs = timerUnits;
timerPrecision = syssgi(SGI_CYCLECNTR_SIZE);
/*
|| Restore any nonzero offset bits to mapped page address.
*/
timerMapAddress = (void*) (
((__psint_t)timerMapAddress) /* addr as int */
| (timerPhysAddr & addrMask) /* plus offset bits */
);
}
else
; /* mmap() failed, timerMapAddress == -1, errno set */
} /* end syssgi() successful */
else
{
timerMapAddress = (void *)-1; /* syssgi error, no timer (?) */
}
} /* end attempting to initialize */
return timerPicoSecs;
}
unsigned int
timerBitCount()
{
if (TIMER_IS_MAPPED)
return timerPrecision;
if ( ! TIMER_MAP_ATTEMPTED)
{
mapTheTimer();
return timerPrecision;
}
else return 0;
}
/*****************************************************************************
||
|| In both of the following routines, one goal is to minimize the number of
|| references to the mapped timer. Reason: each such reference is an
|| uncached memory reference plus a bus access, taking at least 1 usec and
|| possibly more depending on the machine. Unnecessary references to the
|| timer should be avoided when possible.
||
|| If the timer has 64 bits, return its least-significant word. Which word
|| is that? This code assumes the big-endian model. An alternative
|| would be to load the long-long value and force C to convert it. That
|| would be portable but would hit the bus twice instead of once, nullifying the
|| speed advantage that this routine has over the one following.
||
*****************************************************************************/
__uint32_t
readTimer32()
{
__uint32_t ret = 0;
if ( ! TIMER_IS_MAPPED ) mapTheTimer();
if ( TIMER_IS_MAPPED ) /* timer mapped ok */
{
if (64 == timerPrecision)
ret = ((__uint32_t *)timerMapAddress)[1]; /* low word of 2 */
else /* in IRIX 6.2, 32 bits is the only alternative */
ret = *((__uint32_t *)timerMapAddress);
}
return ret;
}
/*****************************************************************************
||
|| When the timer has 32 bits, just fake up a long-long and return it.
|| For long timers we must ask: was this code compiled to an ABI that does
|| atomic loads of long-longs (-64 or -n32), or not (-32)?
|| In the newer ABIs, we just fetch the 64-bit timer in one move.
||
|| When compiled under a 32-bit system, the generated code loads the timer
|| value in two "lw" instructions. The low word of the timer overflows into
|| the high word about every 90 seconds, and if that happens between the
|| lw's, the result will be wrong. Worse, we cannot be certain which of the
|| two words the compiler will choose to load first, the low or the high.
||
|| In order to minimize the number of uncached accesses, we test for
|| overflow only when it has recently happened; that is, when
|| the most significant 9 bits of the low word are all-0. This
|| condition defines a window of 0.17 seconds following the overflow
|| (21e-9 * 2^23 == .176160768).
|| If this were kernel code, the window could be much smaller. In enabled
|| code we have to allow for a series of interrupts between the load of the
|| upper and lower words. As it is, if we load the upper word just before
|| overflow, and an interrupt delays the next fetch 0.17+ seconds, we will
|| return an incorrect value.
||
*****************************************************************************/
__uint64_t
readTimer64()
{
union {
struct { __uint32_t msw,lsw; }w;
__uint64_t ll;
} ret;
ret.ll = 0;
if ( ! TIMER_IS_MAPPED ) mapTheTimer();
if ( TIMER_IS_MAPPED ) /* it mapped ok */
{
if (timerPrecision == 32)
{
ret.w.msw = 0;
ret.w.lsw = *((__uint32_t *)timerMapAddress);
}
else
{
#if (_MIPS_SIM == _MIPS_SIM_NABI32 || _MIPS_SIM == _MIPS_SIM_ABI64)
/* 64-bit loads are atomic */
ret.ll = *(__uint64_t *)timerMapAddress;
#else /* 64-bit loads are not atomic */
ret.w.msw = ((__uint32_t *)timerMapAddress)[0];
ret.w.lsw = ((__uint32_t *)timerMapAddress)[1];
if ( (ret.w.lsw & 0xff800000) == 0)
{
/*
|| The high word incremented not more than .17 sec ago.
|| Provided there is not a delay here exceeding 89.8 sec,
|| the following single load ensures we have the high word
|| value that is correctly associated with the low word
|| we already picked up.
*/
ret.w.msw = ((__uint32_t *)timerMapAddress)[0];
}
#endif
}
}
return ret.ll;
}
#ifdef UNIT_TEST
#include <stdio.h> /* for printf() and perror() */
#include <stdlib.h> /* for atoi() */
int main(int argc, char*argv[])
{
int j;
int numTix = 10;
unsigned int picosecs;
unsigned short tbits;
double dmicsecs;
if (argc>1) numTix = atoi(argv[1]);
if ( picosecs = mapTheTimer() )
{
tbits = timerBitCount();
dmicsecs = ((double)picosecs)/1e6;
printf("The timer has %d bits of precision\n",tbits);
printf("One timer unit == %d picoseconds or %g us\n",
picosecs, dmicsecs);
}
else
{
perror("mapTheTimer");
return errno;
}
{
__uint32_t st1, st2, stx;
st1 = readTimer32();
printf("\nreading timer as 32 bits\n\n");
for(j=0; j<numTix; ++j)
{
st2 = readTimer32();
stx = st2 - st1;
printf("0x%0x - 0x%0x = 0x%0x (%g usecs)\n",
st2, st1, stx, (stx*dmicsecs) );
st1 = st2;
}
}
{
__uint64_t lt1, lt2, ltx;
lt1 = readTimer64();
printf("\nreading timer as 64 bits\n\n");
for(j=0; j<numTix; ++j)
{
lt2 = readTimer64();
ltx = lt2 - lt1;
printf("0x%0llx - 0x%0llx = 0x%0llx (%g usecs)\n",
lt2, lt1, ltx, (ltx * dmicsecs));
lt1 = lt2;
}
}
return 0;
}
#endif
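The arithmetic quoted in the comments of Example A-2 can be checked with a few small helpers. The following is a minimal sketch and is not part of the original module: the names ticksToSeconds(), elapsedTicks32(), and wrapSeconds() are hypothetical, and the 21000-picosecond unit is simply the Challenge value quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a tick count to seconds, given the picoseconds per tick
 * (the kind of value that mapTheTimer() returns). */
double ticksToSeconds(uint64_t ticks, unsigned int picosPerTick)
{
    return (double)ticks * (double)picosPerTick * 1e-12;
}

/* Ticks elapsed between two 32-bit samples. Unsigned subtraction is
 * performed modulo 2^32, so a single wrap of the counter between the
 * two samples does not disturb the result. */
uint32_t elapsedTicks32(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

/* Seconds needed for a counter field of the given width to wrap.
 * The width must be less than 64 bits. */
double wrapSeconds(unsigned int bits, unsigned int picosPerTick)
{
    return ticksToSeconds((uint64_t)1 << bits, picosPerTick);
}

#ifdef UNIT_TEST
int main(void)
{
    /* 21000 ps per tick is the Challenge figure quoted in the text. */
    printf("32-bit low word wraps every %.2f s\n", wrapSeconds(32, 21000));
    printf("23-bit window lasts %.6f s\n", wrapSeconds(23, 21000));
    assert(elapsedTicks32(0xFFFFFFF0u, 0x00000010u) == 0x20u);
    return 0;
}
#endif
```

With a 21-nanosecond unit, the 32-bit low word wraps after roughly 90 seconds and a 23-bit window lasts roughly 0.176 seconds, which agrees with the figures in the readTimer64() commentary.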
The program in Example A-3 tests the precision of the time of day stamp returned by gettimeofday(). The function getTODdiff() contains an example call to gettimeofday().
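The listing for Example A-3 is not reproduced in this section. As a rough illustration of the idea only, the following sketch (an assumption, not the original getTODdiff() code) calls gettimeofday() in a tight loop and records the smallest positive step it observes between successive readings. The function names todDiffUsec() and smallestTodStep() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Difference b - a, in microseconds. */
static long todDiffUsec(const struct timeval *a, const struct timeval *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L
         + (long)(b->tv_usec - a->tv_usec);
}

/* Spin on gettimeofday() until the reported time changes, and repeat
 * that 'samples' times. Returns the smallest positive step seen, in
 * microseconds, or 0 if no positive step was observed. */
long smallestTodStep(int samples)
{
    struct timeval prev, now;
    long best = 0;
    int i;

    gettimeofday(&prev, NULL);
    for (i = 0; i < samples; ++i) {
        long d;
        do {
            gettimeofday(&now, NULL);
            d = todDiffUsec(&prev, &now);
        } while (d == 0);
        /* Backward steps (clock adjusted) are ignored. */
        if (d > 0 && (best == 0 || d < best))
            best = d;
        prev = now;
    }
    return best;
}

#ifdef UNIT_TEST
int main(void)
{
    long step = smallestTodStep(1000);
    assert(step >= 0);
    printf("smallest observed step: %ld usec\n", step);
    return 0;
}
#endif
```

The value measured this way is an upper bound on the clock's resolution, because it also includes the cost of the gettimeofday() call itself.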
The program in Example A-6 illustrates the use of some of the interprocess communication (IPC) features of IRIX, in particular:
Code following allocRBStuff() demonstrates the creation of a shared-memory arena and suballocation of memory, semaphores, and locks within the arena.
Code following inputProcess(void *arena) demonstrates the use of the P and V operations on a semaphore, and testing the value of a semaphore without waiting on it.
Code following outputProcess(void *arena) demonstrates the use of POSIX-type signal-handling functions.
Code following showSemaInfo(char *semaName, usema_t *sema) demonstrates how to extract metering information from a metered semaphore and display it.
Code following main(int argc, char**argv) demonstrates the creation of child processes using sproc().
The program models a real-time data-collection program. The main process establishes an arena. Within the arena it creates a data structure that defines and manages a ring buffer. Then the main process uses sproc() to create three processes:
inputProcess() generates random-integer “input data” and stores it in the ring buffer. To simulate an unpredictable and varying input rate, the process “receives” bursts of from 1 to 16 data items. The average input rate is calculable (see the commentary in the code).
The number of items to generate can be specified on the command line as the -c option followed by the count. The default is 2000 items. After generating that many items, inputProcess() waits until all data has been consumed, then terminates.
outputProcess()—of which two instances are created—takes data from the ring buffer. To simulate a steady average output rate, each process sets a repeating itimer and takes one data item each time the timer expires. The itimer interval represents the simulated “processing time” of a data item. This interval can be specified on the command line as the -t option followed by the interval in microseconds. The default is 10,000 (10 milliseconds per item per process, an output rate of 200 items/second).
After starting the three processes, the main process waits for one to terminate. When there are no errors, inputProcess() is the first and only process to terminate—the two outputProcess() instances end up blocked on a semaphore, waiting for more data.
The main process kills the remaining processes; then displays the metering information from the lock and semaphores, and terminates the program.
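The pacing technique used by outputProcess(), a repeating interval timer whose expirations are caught by a POSIX-style signal handler, can be sketched as follows. This is an illustration under assumptions and not code from Example A-6: the names onAlarm() and waitForTicks() are hypothetical, and the loop only counts expirations where the real process would take one data item.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t tickCount = 0;

static void onAlarm(int sig)
{
    (void)sig;
    ++tickCount;
}

/* Arm a repeating ITIMER_REAL and wait for 'wanted' expirations.
 * Returns the number of expirations seen, or -1 on error. */
int waitForTicks(long intervalUsec, int wanted)
{
    struct sigaction sa;
    struct itimerval tv;
    sigset_t block, old, waitMask;
    int seen;

    if (intervalUsec <= 0)
        return -1;              /* a zero interval would disarm the timer */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = onAlarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Keep SIGALRM blocked except inside sigsuspend(), so that an
     * expiration cannot slip in between the test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    waitMask = old;
    sigdelset(&waitMask, SIGALRM);

    tickCount = 0;
    tv.it_interval.tv_sec = intervalUsec / 1000000L;
    tv.it_interval.tv_usec = intervalUsec % 1000000L;
    tv.it_value = tv.it_interval;
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    while (tickCount < wanted)
        sigsuspend(&waitMask);  /* the real process would take one item here */
    seen = tickCount;

    memset(&tv, 0, sizeof tv);
    setitimer(ITIMER_REAL, &tv, NULL);      /* disarm the timer */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return seen;
}

#ifdef UNIT_TEST
int main(void)
{
    int n = waitForTicks(10000, 5);         /* the default -t value */
    assert(n >= 5);
    printf("saw %d timer expirations\n", n);
    return 0;
}
#endif
```

With the default interval of 10,000 microseconds, each output process takes at most 100 items per second, which is where the figure of 200 items/second for two processes comes from.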
The three simulated real-time processes communicate through two semaphores and a lock.
Semaphore semRBdata represents the number of data items now in the ring buffer. inputProcess() does the V operation, increasing the semaphore count with each input datum; outputProcess() does the P operation, decreasing the count with each output.
Semaphore semRBspace represents the number of empty slots in the ring buffer. inputProcess() does the P operation to acquire an empty slot, and outputProcess() does the V operation when it releases a slot.
Lock lockRBupdate represents the right to alter the ring buffer index values. All processes set this lock before modifying the ring buffer, and clear it afterward.
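The bookkeeping that the two semaphores perform can be modeled in a few lines. The sketch below is a single-threaded model only: plain integer counters stand in for semRBdata and semRBspace, no locking is done, and where the real program would block in a P operation the model simply returns 0. The names RingModel, rbInit(), rbPut(), and rbGet(), and the ring size, are hypothetical and are not taken from Example A-6.

```c
#include <assert.h>
#include <stdio.h>

#define RB_SLOTS 4              /* hypothetical ring size */

typedef struct {
    int slot[RB_SLOTS];
    int head;                   /* next slot to fill */
    int tail;                   /* next slot to empty */
    int dataCount;              /* models the count of semRBdata */
    int spaceCount;             /* models the count of semRBspace */
} RingModel;

void rbInit(RingModel *rb)
{
    rb->head = rb->tail = 0;
    rb->dataCount = 0;
    rb->spaceCount = RB_SLOTS;
}

/* Role of inputProcess(): returns 1 if stored, 0 where P would block. */
int rbPut(RingModel *rb, int item)
{
    if (rb->spaceCount == 0)
        return 0;               /* P(semRBspace) would wait here */
    --rb->spaceCount;           /* P(semRBspace): claim an empty slot */
    /* real program: set lockRBupdate */
    rb->slot[rb->head] = item;
    rb->head = (rb->head + 1) % RB_SLOTS;
    /* real program: clear lockRBupdate */
    ++rb->dataCount;            /* V(semRBdata): announce one datum */
    return 1;
}

/* Role of outputProcess(): returns 1 if fetched, 0 where P would block. */
int rbGet(RingModel *rb, int *item)
{
    if (rb->dataCount == 0)
        return 0;               /* P(semRBdata) would wait here */
    --rb->dataCount;            /* P(semRBdata): claim one datum */
    /* real program: set lockRBupdate */
    *item = rb->slot[rb->tail];
    rb->tail = (rb->tail + 1) % RB_SLOTS;
    /* real program: clear lockRBupdate */
    ++rb->spaceCount;           /* V(semRBspace): release the slot */
    return 1;
}

#ifdef UNIT_TEST
int main(void)
{
    RingModel rb;
    int v = 0;

    rbInit(&rb);
    assert(rbPut(&rb, 7) == 1);
    assert(rbGet(&rb, &v) == 1);
    printf("got %d; data=%d space=%d\n", v, rb.dataCount, rb.spaceCount);
    return 0;
}
#endif
```

Between calls, the two counters in this model always sum to the number of slots, which is the invariant the pair of semaphores maintains in the real program.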
The displayed metering data at the end of the program shows whether the output processes could keep up with the input process. It is necessary to run the program with a nondegrading real-time priority to get consistent results. The output in Example A-4 shows a case in which output did not keep up.
# npri -h 39 ./ringBuffer -t 20000
Lock lockRBupdate acquired 4004 times, 4004 without waiting (100%)
Metering info on sema semRBdata
P: 2004, 2000 with no wait (99%)
V: 2002, 2 with P waiting (0%)
Metering info on sema semRBspace
P: 2002, 1423 with no wait (71%)
V: 2002, 579 with P waiting (28%)
In Example A-4, look first at the P operations for semRBspace. 71% of the time, when inputProcess() applied uspsema() to this semaphore to acquire a slot in the ring buffer, it did not wait. However, about 29% of the time (579 of 2002 operations) it did wait, meaning that the ring buffer was full and no free slots were available until an outputProcess() released one. Clearly, the output processes were not keeping up with the input data rate.
# npri -h 39 ./ringBuffer -t 5000
Lock lockRBupdate acquired 4004 times, 4004 without waiting (100%)
Metering info on sema semRBdata
P: 2004, 1565 with no wait (78%)
V: 2002, 437 with P waiting (21%)
Metering info on sema semRBspace
P: 2002, 2002 with no wait (100%)
V: 2002, 0 with P waiting (0%)
Example A-5 shows a test run in which the output processes did keep up with the input rate. In every case, inputProcess() was able to acquire a slot from semRBspace without waiting. 22% of the time, when an outputProcess() tried to acquire a data item from semRBdata, it had to wait, meaning the ring buffer was empty. (This percentage would be higher if inputProcess() did not frequently dump blocks of 2-16 items into the buffer.)
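The percentages in the metering displays can be reproduced from the raw counts. The sketch below assumes, as the figures in Examples A-4 and A-5 suggest but the text does not state, that the display truncates the percentage instead of rounding it. The helper percentOf() is hypothetical and is not a function from the example program.

```c
#include <assert.h>
#include <stdio.h>

/* Integer percentage of 'part' in 'whole', truncated toward zero.
 * Returns 0 when 'whole' is 0. */
unsigned int percentOf(unsigned int part, unsigned int whole)
{
    if (whole == 0)
        return 0;
    return (unsigned int)(((unsigned long long)part * 100ULL) / whole);
}

#ifdef UNIT_TEST
int main(void)
{
    /* Counts copied from the semRBspace lines of Example A-4. */
    unsigned int noWait = percentOf(1423, 2002);
    unsigned int waited = percentOf(579, 2002);

    assert(noWait + waited <= 100);
    printf("P with no wait:   %u%%\n", noWait);
    printf("V with P waiting: %u%%\n", waited);
    return 0;
}
#endif
```

Truncation would explain why 579 waits out of 2002 operations are displayed as 28% although the ratio is closer to 29%, and why 437 out of 2002 is displayed as 21% although the text rounds it to 22%.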
The sample program in Example A-7 uses some generally unsafe coding tricks to get the addresses of segments for text, stack, library DSO and mapped data. It demonstrates the use of mmap() with /dev/zero, for default and absolute segment addresses.
#include <stdlib.h> /* for standard malloc(3C) */
#include <unistd.h> /* for sbrk(2) */
#include <stdio.h> /* for printf */
#include <sys/types.h> /* for __psint_t */
/* include <sys/stat.h> */
#include <sys/fcntl.h> /* for O_RDWR */
#include <sys/mman.h> /* for mmap(2) */
#define DISPLAY(v,t) {printf("%s:\t%0lx\n",t,(__psint_t)v);}
int main()
{
/*
|| Get a mask that truncates an address to a page boundary.
*/
__psint_t psize = getpagesize();
__psint_t pmask = ~(psize-1);
/*
|| Get a file descriptor for the nothing device.
|| Use that FD to map two segments of memory containing 00.
*/
int zero = open("/dev/zero",O_RDWR);
void * zmap1 = mmap(0,16384,PROT_WRITE,MAP_SHARED,zero,0);
void * zmap2 = mmap(0,16384,PROT_WRITE,MAP_SHARED,zero,0);
/*
|| Map one segment at a designated address reserved for
|| user maps by the MIPS ABI.
*/
void * abi_map = (char *)mmap((void *)0x30040000L,16384,
PROT_WRITE,MAP_SHARED+MAP_FIXED,zero, 0);
/*
|| Get the address of this program.
*/
char * poke = (char *)((__psint_t)main);
/*
|| Get some program addresses supplied by ld(1), but note
|| the warnings in end(3C) -- these addresses "have no standard
|| definition" when multiple text/data segments exist.
*/
extern int _ftext[];
void * ld_ftext = (void *)_ftext;
extern int _etext[];
void * ld_etext = (void *)_etext;
extern int _fdata[];
void * ld_fdata = (void *)_fdata;
extern int _edata[];
void * ld_edata = (void *)_edata;
extern int _fbss[];
void * ld_fbss = (void *)_fbss;
extern int _end[];
void * ld_end = (void *)_end;
/*
|| Get the address of some code in the libc DSO.
*/
void * libc_adr = (void *)fprintf;
/*
|| Get the current start and end of the heap.
*/
void * malloc_adr = (void *)malloc((size_t)256);
void * brk_adr = sbrk(0);
/*
|| Get the address of an item in our stack space.
*/
void * stack_adr = (void *)&psize;
/*
|| Display all the above.
*/
DISPLAY(psize,"Page size")
DISPLAY(zmap1,"Mapped segment 1")
DISPLAY(zmap2,"Mapped segment 2")
DISPLAY(abi_map,"ABI mapped segment")
DISPLAY(ld_ftext,"Text starts")
DISPLAY(ld_etext,"Text ends")
DISPLAY(ld_fdata,"Initialized data starts")
DISPLAY(ld_edata,"Initialized data ends")
DISPLAY(ld_fbss,"Uninitialized starts")
DISPLAY(ld_end,"Uninitialized ends")
DISPLAY(malloc_adr,"Heap data starts")
DISPLAY(brk_adr,"Heap data ends")
DISPLAY(stack_adr,"Stack data")
DISPLAY(libc_adr,"Spot in one DSO")
/*
|| See if we can get away with patching our own text.
*/
if (!mprotect((void *)(pmask&(__psint_t)poke),psize,PROT_WRITE+PROT_EXEC))
{
poke[0] = poke[0];
printf("I wrote into program text\n");
}
else
{
perror("mprotect(text)");
}
}
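Example A-7 does not check the values returned by open() and mmap(). As a hedged companion, the sketch below shows the same two techniques, truncating an address to a page boundary and mapping zero-filled memory from /dev/zero, with the error checks added. The names pageBase() and mapZeroes() are hypothetical, and sysconf(_SC_PAGESIZE) is used here as a portable stand-in for getpagesize().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Truncate an address to the start of its page.
 * pageSize must be a power of two. */
uintptr_t pageBase(uintptr_t addr, uintptr_t pageSize)
{
    return addr & ~(pageSize - 1);
}

/* Map 'len' bytes of zero-filled memory from /dev/zero.
 * Returns the mapping, or NULL with errno set on failure. */
void *mapZeroes(size_t len)
{
    void *p;
    int saved;
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    saved = errno;
    close(fd);                  /* the mapping survives the close */
    errno = saved;
    return (p == MAP_FAILED) ? NULL : p;
}

#ifdef UNIT_TEST
int main(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    unsigned char *z = mapZeroes(16384);

    if (z == NULL) {
        perror("mapZeroes");
        return 1;
    }
    printf("segment at %p, page base %p\n", (void *)z,
           (void *)pageBase((uintptr_t)z, (uintptr_t)psize));
    assert(z[0] == 0);          /* memory from /dev/zero starts as zeros */
    munmap(z, 16384);
    return 0;
}
#endif
```

mmap() reports failure by returning MAP_FAILED, not NULL, so the helper converts that value before returning it to the caller.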
The following example contains two subroutines that simplify the interface to the schedctl() function for deadline scheduling. If the code is compiled with variable UNIT_TEST defined, it compiles a main() procedure that runs a test. Otherwise it compiles only the functions. A test run of the program resembles the following:
% setDeadline 20 100
schedule pid 0 for 20% of 100ms --> 0
policy DL_ONLY-->0
policy DL_ANY-->0
policy DL_RELEASE-->0
On a uniprocessor, a request for much more than 20% of the CPU is rejected. On a multiprocessor, a request for 98% or 99% is generally successful.
The program in Example A-9 demonstrates the use of some asynchronous I/O functions. The basic purpose of the program is to read a list of input files and write their concatenated contents as its output, work that does not normally require asynchronous I/O. However, this test program reads the input files using aio_read(), and writes the output file using aio_write() and aio_fsync(). In addition, it can be compiled in either of two ways:
to copy the input files one at a time, using subroutine calls
to copy the input files concurrently, using a separate process for each input file
There is no functional advantage to using multiple processes. Doing so merely makes the example more interesting. It also demonstrates that, even though multiple processes ask for output at different points in the same file at the same time, the output is written to the requested offsets.
The reading and writing is done in one of four functions. The functions all perform the following sequence of actions:
Initialize the aiocb for the type of notification desired. The type of notification is the principal difference between the functions: some use signals, some callback functions, some no notification.
Until the input file is exhausted,
Call aio_read() for up to one BLOCKSIZE amount from the next offset in the input file
Wait for the read to complete
Call aio_write() to write the data read to the next offset in the output file
Wait for the write to complete
Use aio_fsync() to ensure that output is complete and wait for it to complete.
The four functions, inProc0() through inProc3(), differ only in the method they use to wait for completion.
inProc0() alternates calling aio_error() with sginap() until the status is other than EINPROGRESS.
inProc1() calls aio_suspend() to wait for the current operation.
inProc2() sets the aiocb to request a signal on completion. Then it waits on a semaphore that is posted from the signal handler function.
inProc3() waits on a semaphore which is posted from a callback function.
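The first two waiting methods can be sketched with the standard POSIX asynchronous I/O calls. The code below is an illustration under assumptions and not an excerpt from Example A-9: the names initCb(), finish(), readByPolling(), and readBySuspend() are hypothetical, only reading is shown, and nanosleep() stands in for the IRIX sginap() call.

```c
#define _XOPEN_SOURCE 700
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Fill in an aiocb for a transfer that requests no notification. */
static void initCb(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;
}

/* Collect the final status of a request that is no longer in progress. */
static ssize_t finish(struct aiocb *cb)
{
    int err = aio_error(cb);
    ssize_t n = aio_return(cb);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}

/* In the manner of inProc0(): poll aio_error(), napping between polls. */
ssize_t readByPolling(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    struct timespec nap = { 0, 1000000L };  /* 1 ms, in place of sginap() */

    initCb(&cb, fd, buf, len, off);
    if (aio_read(&cb) != 0)
        return -1;
    while (aio_error(&cb) == EINPROGRESS)
        nanosleep(&nap, NULL);
    return finish(&cb);
}

/* In the manner of inProc1(): block in aio_suspend() until completion. */
ssize_t readBySuspend(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    const struct aiocb *list[1];

    initCb(&cb, fd, buf, len, off);
    if (aio_read(&cb) != 0)
        return -1;
    list[0] = &cb;
    /* aio_suspend() can return early (for example on a signal), so the
     * status is rechecked each time around the loop. */
    while (aio_error(&cb) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);
    return finish(&cb);
}

#ifdef UNIT_TEST
int main(void)
{
    char path[] = "/tmp/aiodemoXXXXXX";
    char buf[8] = { 0 };
    int fd = mkstemp(path);

    assert(fd >= 0);
    assert(write(fd, "abcdef", 6) == 6);
    assert(readByPolling(fd, buf, 3, 0) == 3);
    assert(readBySuspend(fd, buf + 3, 3, 3) == 3);
    printf("read back: %s\n", buf);
    close(fd);
    unlink(path);
    return 0;
}
#endif
```

On some systems the asynchronous I/O functions live in a separate library, so the link step may need an extra option such as -lrt.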
You select which of the four functions to use with the -a argument to the program. If you compile the program with the variable DO_SPROCS defined as 0, the chosen function is called as a subroutine once for each input file. If you compile with DO_SPROCS defined as 1, the chosen function is launched by sprocsp() once for each input file.
The following subroutine simplifies the task of requesting a guaranteed rate of I/O transfer. The file descriptor passed to function requestRate() must describe a file located in the real-time subvolume of a volume managed by XLV and XFS.
A number of example programs are distributed with the REACT/Pro Frame Scheduler. This section describes them. Only one is reproduced here; the others are found on disk.
The example programs distributed with the Frame Scheduler are found in the directory /usr/react/src/examples. They are summarized in Table A-1 and are discussed in more detail in the topics that follow.
Table A-1. Summary of Frame Scheduler Example Programs
| Directory | Features of Example |
|---|---|
| simple | Two processes scheduled on a single CPU at a frame rate slow enough to permit use of printf() for debugging. The examples differ in the time base used; and the r4k_intr code uses a barrier for synchronization. |
| mprogs | Like simple, but the scheduled processes are independent programs. |
| multi | Three synchronous Frame Schedulers running lightweight processes on three processors. These examples are much alike, differing mainly in the source of the time base interrupt. |
| complete | Like multi in starting three Frame Schedulers. Information about the activity processes is stored in arrays for convenient maintenance. The stop_resume code demonstrates frs_stop() and frs_resume() calls. |
| driver | driver contains a pseudo-device driver that demonstrates the Frame Scheduler device driver interface. dintr contains a program based on simple that uses the example driver as a time base. |
| sixtyhz | One process scheduled at a 60 Hz frame rate. The activity process in the memlock example locks its address space into memory before it joins the scheduler. |
| upreuse | Complex example that demonstrates the creation of a pool of reusable processes, and how they can be dispatched as activity processes on a Frame Scheduler. |
The example in /usr/react/src/examples/simple shows how to create a simple application using the Frame Scheduler API. The code in /usr/react/src/examples/r4kintr is similar.
The application consists of two processes that have to periodically execute a specific sequence of code. The period for the first process, process A, is 600 milliseconds. The period for the other process, process B, is 2400 ms.
Note: Such long periods are unrealistic for real-time applications. However, they allow the use of printf() calls within the “real-time” loops in this sample program.
The two periods and their ratio determine the selection of the minor frame period—600 ms—and the number of minor frames per major frame—4, for a total of 2400 ms.
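The selection of the frame parameters can be expressed as simple arithmetic. In the sketch below, which is an illustration and not code from the example directory, the minor frame is taken as the greatest common divisor of the two process periods and the major frame as their least common multiple. That rule is an assumption that happens to reproduce the figures in the text; gcdMs() and lcmMs() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Greatest common divisor of two periods (Euclid's algorithm). */
unsigned int gcdMs(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two periods; 0 if both are 0. */
unsigned int lcmMs(unsigned int a, unsigned int b)
{
    unsigned int g = gcdMs(a, b);
    return (g == 0) ? 0 : (a / g) * b;
}

#ifdef UNIT_TEST
int main(void)
{
    unsigned int periodA = 600;     /* process A, in ms */
    unsigned int periodB = 2400;    /* process B, in ms */
    unsigned int minorMs = gcdMs(periodA, periodB);
    unsigned int majorMs = lcmMs(periodA, periodB);

    assert(minorMs != 0);
    printf("minor frame %u ms, major frame %u ms, %u minor frames per major\n",
           minorMs, majorMs, majorMs / minorMs);
    return 0;
}
#endif
```

For periods of 600 ms and 2400 ms this yields a 600 ms minor frame and four minor frames per major frame, the values chosen in the example.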
The discipline for process A is strict real-time (FRS_DISC_RT). Underrun and overrun errors should cause signals.
Process B should run only once in 2400 ms, so it operates as Continuable over as many as 4 minor frames. For the first 3 frames, its discipline is Overrunnable and Continuable. For the last frame it is strict real-time. The Overrunnable discipline allows process B to run without yielding past the end of each minor frame. The Continuable discipline ensures that once process B does yield, it is not resumed until the fourth minor frame has passed. The combination allows process B to extend its execution to the allowable period of 2400 ms, and the strict real-time discipline at the end makes certain that it yields by the end of the major frame.
There is a single Frame Scheduler so a single processor is used by both processes. Process A runs within a minor frame until yielding or until the expiration of the minor frame period. In the latter case the frame scheduler generates an overrun error signaling that process A is misbehaving.
When process A yields, the frame scheduler immediately activates process B. It runs until yielding, or until the end of the minor frame, at which point it is preempted. This is not an error since process B is Overrunnable.
Starting the next minor frame, the Frame Scheduler allows process A to execute again. After it yields, process B is allowed to resume running, if it has not yet yielded. Again in the third and fourth minor frame, A is started, followed by B if it has not yet yielded. At the interrupt that signals the end of the fourth frame (and the end of the major frame), process B must have yielded, or an overrun error is signalled.
The code in directory /usr/react/src/examples/mprogs does the same work as example simple (see “Basic Example”). However, the activity processes A and B are physically loaded as separate commands. The main program establishes the single Frame Scheduler. The activity processes are started as separate programs. They communicate with the main program using SVR4-compatible interprocess communication messages (see the intro(2) and msgget(2) reference pages).
There are three separate executables in the mprogs example. The master program, in master.c, is a command that has the following syntax:
master [-p cpu-number] [-s slave-count]
The cpu-number specifies which processor to use for the one Frame Scheduler this program creates. The default is processor 1. The slave-count tells the master how many subordinate programs will be enqueued to the Frame Scheduler. The default is two programs.
The problems that need to be solved in this example are as follows:
The frs-master program must enqueue the activity processes. However, since they are started as separate programs, the master has no direct way of knowing their process IDs, which are needed for frs_enqueue().
The activity processes need to specify upon which minor frames they should be enqueued, and with what discipline.
The master needs to enqueue the activities in the proper order on their minor frames, so they will be dispatched in the proper sequence. Therefore the master has to distinguish the subordinates in some way; it cannot treat them as interchangeable.
The activity processes must join the Frame Scheduler, so they need the handle of the Frame Scheduler to use as an argument to frs_join(). However, this information is in the master's address space.
If an error occurs when enqueueing, the master needs to tell the activity processes so they can terminate in an orderly way.
There are many ways in which these objectives could be met (for example, the three programs could share a shared-memory arena). In this example, the master and subordinates communicate using a simple protocol of messages exchanged with msgsnd() and msgrcv() on a queue obtained from msgget() (see the msgget(2) and msgop(2) reference pages). The sequence of operations is as follows:
The master program creates a Frame Scheduler.
The master sends a message inviting the most important subordinate to reply. (All the message queue handling is in module ipc.c, which is linked by all three programs.)
The subordinate compiled from the file processA.c replies to this message, sending its process ID and requesting the FRS handle.
The subordinate process A sends a series of messages, one for each minor queue on which it should enqueue. The master enqueues it as requested.
The subordinate process A sends a “ready” message.
The master sends a message inviting the next most important process to reply.
The program compiled from processB.c replies to this request, and steps 3-6 are repeated for as many slaves as the slave-count parameter of the master program specifies. (Only two slaves are provided. However, you can easily create more using processB.c as a pattern.)
The master issues frs_start(), and waits for the termination signal.
The subordinates independently issue frs_join() and the real-time dispatching begins.
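The message exchange described above can be sketched with System V message queues. This is a minimal sketch and not the code in ipc.c: the message type names (MSG_INVITE and so on), the layout of struct frs_msg, and the proto_*() helper names are all assumptions made for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

/* Hypothetical message types; the real ipc.c may define different ones. */
enum { MSG_INVITE = 1, MSG_REPLY_PID = 2, MSG_ENQUEUE = 3, MSG_READY = 4 };

/* A System V message starts with a long type, followed by the payload. */
struct frs_msg {
    long  mtype;        /* one of the MSG_* values above               */
    pid_t sender;       /* process ID of the sending process           */
    int   minor_frame;  /* minor frame index (used by MSG_ENQUEUE)     */
};

/* Create or open the queue for `key`. Returns the queue ID, or -1. */
int proto_open(key_t key)
{
    return msgget(key, IPC_CREAT | 0600);
}

/* Send one message of the given type. Returns 0 on success, -1 on error. */
int proto_send(int qid, long type, int minor_frame)
{
    struct frs_msg m;

    memset(&m, 0, sizeof m);
    m.mtype = type;
    m.sender = getpid();
    m.minor_frame = minor_frame;
    /* The size passed to msgsnd() excludes the leading mtype field. */
    return msgsnd(qid, &m, sizeof m - sizeof m.mtype, 0);
}

/* Block until a message of the given type arrives. Returns 0 or -1. */
int proto_recv(int qid, long type, struct frs_msg *out)
{
    ssize_t n = msgrcv(qid, out, sizeof *out - sizeof out->mtype, type, 0);

    return n < 0 ? -1 : 0;
}

/* Remove the queue once the protocol has finished. */
int proto_close(int qid)
{
    return msgctl(qid, IPC_RMID, NULL);
}
```

In this sketch, a subordinate would call proto_recv(qid, MSG_INVITE, &m), answer with proto_send(qid, MSG_REPLY_PID, 0), send one MSG_ENQUEUE per minor frame, and finish with MSG_READY. Separate programs must agree on a queue key (for example, one derived with ftok()); passing IPC_PRIVATE creates a queue that only the creating process and its descendants can use.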
The example in /usr/react/src/examples/multi demonstrates the creation of three synchronized Frame Schedulers. The three use the cycle counter to establish a minor frame interval of 50 ms. All three Frame Schedulers use 20 minor frames per major frame, for a major frame rate of 1 Hz.
The following processes are scheduled in this example:
Processes A and D require a frequency of 20 Hz
Process B requires a frequency of 10 Hz and can consume up to 100 ms of execution time each time
Process C requires a frequency of 5 Hz and can consume up to 200 ms of execution time each time
Process E requires a frequency of 4 Hz and can consume up to 250 ms of execution time each time
Process F requires a frequency of 2 Hz and can consume up to 500 ms of execution time each time
Processes K1, K2 and K3 are background processes that should run as often as possible, when time is available.
The processes are assigned to processors as follows:
Scheduler 1 runs processes A (20 Hz) and K1 (background).
Scheduler 2 runs processes B (10 Hz), C (5 Hz), and K2 (background).
Scheduler 3 runs processes D (20 Hz), E (4 Hz), F (2 Hz), and K3 (background).
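The relationship between the process rates listed above and the 50 ms minor frame can be checked with a little arithmetic. The following sketch is an illustration only; the function names are hypothetical and do not come from the multi example. It computes how many minor frames separate successive activations of a process running at a given rate.

```c
#include <stdio.h>

#define MINOR_FRAME_MS   50   /* minor frame interval stated in the text   */
#define MINORS_PER_MAJOR 20   /* minor frames per major frame              */

/* Minor frame rate in Hz: 1000 ms / 50 ms = 20 Hz. */
int minor_rate_hz(void)
{
    return 1000 / MINOR_FRAME_MS;
}

/* Duration of one major frame in milliseconds. */
int major_frame_ms(void)
{
    return MINOR_FRAME_MS * MINORS_PER_MAJOR;
}

/*
 * Number of minor frames between activations of a process running at
 * proc_hz, or -1 if the rate does not divide the minor frame rate evenly.
 */
int frame_stride(int proc_hz)
{
    int minor_hz = minor_rate_hz();

    if (proc_hz <= 0 || proc_hz > minor_hz || minor_hz % proc_hz != 0)
        return -1;
    return minor_hz / proc_hz;
}

/* Print the stride and the resulting period for one process. */
void print_schedule(const char *name, int proc_hz)
{
    int stride = frame_stride(proc_hz);

    if (stride < 0) {
        printf("%s: %d Hz does not divide the %d Hz minor frame rate\n",
               name, proc_hz, minor_rate_hz());
        return;
    }
    printf("%s: %2d Hz -> every %2d minor frame(s), period %3d ms\n",
           name, proc_hz, stride, stride * MINOR_FRAME_MS);
}
```

Calling print_schedule() for rates of 20, 10, 5, 4, and 2 Hz gives strides of 1, 2, 4, 5, and 10 minor frames. The resulting periods (50, 100, 200, 250, and 500 ms) match the maximum execution times quoted for processes B, C, E, and F, which suggests that each of those processes may use up to its whole period.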
In order to simplify the coding of the example, all real-time processes use the same function body, process_skeleton(), which is parameterized with the process name, the address of the Frame Scheduler it is to join, and the address of the “real-time” action it is to execute. In the sample code, all real-time actions are empty function bodies (feel free to load them down with code).
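The idea of one function body parameterized by a name, a scheduler handle, and an action can be sketched as follows. This is not the process_skeleton() shipped with the example. The sim_frs_t type and the sim_join() and sim_yield() functions are hypothetical stand-ins so that the sketch compiles without the Frame Scheduler library; in a real activity process those calls would be frs_join() and frs_yield(). The frames argument is also an assumption, added so that the loop terminates.

```c
#include <stdio.h>

/* Hypothetical stand-in for a Frame Scheduler handle. */
typedef struct {
    int joined;   /* number of join calls made against this handle */
    int yields;   /* number of yield calls made against it         */
} sim_frs_t;

/* Type of the "real-time" action executed once per activation. */
typedef void (*rt_action_t)(void);

/* Stand-ins for frs_join() and frs_yield(); they only count calls. */
void sim_join(sim_frs_t *frs)  { frs->joined++; }
void sim_yield(sim_frs_t *frs) { frs->yields++; }

/*
 * Common body for every activity process: join the scheduler, then
 * run the action and yield, once per activation, for `frames` frames.
 * Returns the number of activations completed.
 */
int process_skeleton_sketch(const char *name, sim_frs_t *frs,
                            rt_action_t action, int frames)
{
    int i;

    sim_join(frs);
    for (i = 0; i < frames; i++) {
        action();        /* the per-frame work         */
        sim_yield(frs);  /* give up the rest of frame  */
    }
    printf("%s: completed %d activation(s)\n", name, frames);
    return frames;
}

/* An example action that only counts its own invocations. */
int action_a_calls = 0;
void action_a(void) { action_a_calls++; }
```

Each process then differs only in the arguments it passes, for example process_skeleton_sketch("A", &frs, action_a, 20).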
The examples in /usr/react/src/examples/ext_intr, user_intr, and vsync_intr are all similar to multi, differing mainly in the time base used. The examples in complete and stop_resume are similar in operation, but more evolved and complex in the way they manage subprocesses.
Tip: It is helpful to use the xdiff program when comparing these similar programs; see the xdiff(1) reference page.
The code in /usr/react/src/examples/driver contains a skeletal test-bed for a kernel-level device driver that interacts with the Frame Scheduler. Most of the driver functions consist of minimal or empty stubs. However, the ioctl() entry point to the driver (see the ioctl(2) reference page) simulates a hardware interrupt and calls the Frame Scheduler entry point, frs_handle_driverintr() (see “Generating Interrupts”). This allows you to test the driver. Calling its ioctl() entry is equivalent to using frs_usrintr() (see “The Frame Scheduler API”).
The code in /usr/react/src/examples/dintr contains a variant of the simple example that uses a device driver as the time base. The program dintr/sendintr.c opens the driver, calls ioctl() to send one time-base interrupt, and closes the driver. (It could easily be extended to send a specified number of interrupts, or to send an interrupt each time the return key is pressed.)
The example in directory /usr/react/src/examples/sixtyhz demonstrates the ability to schedule a process at a frame rate of 60 Hz, a common rate in visual simulators. A single Frame Scheduler is created. It uses the cycle counter with an interval of 16,666 microseconds (about 16.67 ms, approximately 60 Hz). There is one minor frame per major frame.
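The 16,666-microsecond interval follows from integer division of one second by the frame rate. The short sketch below shows the arithmetic; the function names are illustrative and are not part of the sixtyhz example.

```c
/* Interval in microseconds for a given frame rate (integer division). */
long interval_usec(long hz)
{
    return 1000000L / hz;
}

/* Frame rate actually produced by an interval given in microseconds. */
double actual_hz(long usec)
{
    return 1.0e6 / (double)usec;
}
```

Because the division truncates, interval_usec(60) yields 16666, and the resulting frame rate is very slightly above 60 Hz.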
One real-time process is enqueued to the Frame Scheduler. By changing the compiler constant LOGLOOPS you can change the amount of work it attempts to do in each frame.
This example also contains the code to query and to change the signal numbers used by the Frame Scheduler.
The example in /usr/react/src/examples/memlock is similar to the sixtyhz example, but the activity process uses plock() to lock its address space. Also, it executes one major frame's worth of frs_yield() calls immediately after return from frs_join(). The purpose of this is to “warm up” the processor cache with copies of the process code and data. (An actual application process could access its major data structures prior to this yield in order to speed up the caching process.)
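The two ideas in this example, locking memory and touching data before real-time work begins, can be sketched as follows. This sketch substitutes the POSIX mlockall() call for plock(), which is not available on every system, and the function names are hypothetical. It is not the code in the memlock example.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Lock all current and future pages of the process into memory.
 * Returns 0 on success or -1 on failure (for example, when the
 * process lacks the privilege or exceeds its locked-memory limit).
 */
int lock_address_space(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/*
 * Read one byte every `step` bytes of a data structure so that its
 * pages are faulted in and its contents are drawn into the cache.
 * The sum is returned so the reads cannot be optimized away.
 */
unsigned long warm_up(const volatile unsigned char *data,
                      size_t len, size_t step)
{
    unsigned long sum = 0;
    size_t i;

    if (step == 0)
        step = 1;           /* avoid an endless loop */
    for (i = 0; i < len; i += step)
        sum += data[i];
    return sum;
}
```

An activity process might call lock_address_space() during initialization and then call warm_up() on each of its major data structures before the first real-time frame. A step equal to the cache line size touches every line once; the right value depends on the hardware.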
The code in /usr/react/src/examples/upreuse implements a simulated real-time application based on a pool of reusable processes. A reusable process is created with sproc() and described by a pdesc_t structure. Code in pqueue.c builds and maintains a pool of processes. Code in pdesc.c provides functions to get and release a process, and to dispatch one to execute a specific function.
The code in test_hello.c creates a pool of processes and dispatches each one in turn to display a message. The code in test_singlefrs.c creates a pool of processes and causes them to join a Frame Scheduler.
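The bookkeeping behind a pool of reusable entries can be sketched with a simple free list. This sketch does not reproduce pdesc_t, pqueue.c, or pdesc.c: the pool_entry and pool_t types and all function names are hypothetical, and the "dispatch" step simply calls the function in the caller instead of waking a process created with sproc().

```c
#include <stddef.h>

#define POOL_SIZE 4

typedef void (*work_fn)(void *arg);

/* Hypothetical descriptor for one reusable entry in the pool. */
typedef struct pool_entry {
    int id;                        /* index of this entry          */
    int busy;                      /* nonzero while checked out    */
    struct pool_entry *next_free;  /* link in the free list        */
} pool_entry;

typedef struct {
    pool_entry entries[POOL_SIZE];
    pool_entry *free_list;         /* head of the free list        */
} pool_t;

/* Put every entry on the free list, lowest id first. */
void pool_init(pool_t *p)
{
    int i;

    p->free_list = NULL;
    for (i = POOL_SIZE - 1; i >= 0; i--) {
        p->entries[i].id = i;
        p->entries[i].busy = 0;
        p->entries[i].next_free = p->free_list;
        p->free_list = &p->entries[i];
    }
}

/* Take an entry from the pool; returns NULL when the pool is empty. */
pool_entry *pool_get(pool_t *p)
{
    pool_entry *e = p->free_list;

    if (e != NULL) {
        p->free_list = e->next_free;
        e->next_free = NULL;
        e->busy = 1;
    }
    return e;
}

/* Return an entry to the pool; ignores entries that are not busy. */
void pool_release(pool_t *p, pool_entry *e)
{
    if (e == NULL || !e->busy)
        return;
    e->busy = 0;
    e->next_free = p->free_list;
    p->free_list = e;
}

/* Get an entry, run fn(arg) on its behalf, and release it again.
 * Returns the id of the entry used, or -1 if none was available. */
int pool_dispatch(pool_t *p, work_fn fn, void *arg)
{
    pool_entry *e = pool_get(p);
    int id;

    if (e == NULL)
        return -1;
    id = e->id;
    fn(arg);
    pool_release(p, e);
    return id;
}

/* Example work function: increments the int that arg points to. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

A real pool of processes would also need synchronization between the dispatcher and the pooled processes, which this single-threaded sketch omits.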