This chapter provides an introduction to Performance Co-Pilot (PCP), an overview of its individual components, and conceptual information to help you use this product. The following sections are included:
“Objectives of Performance Co-Pilot” covers the intended purposes of PCP.
“Overview of Component Software” describes PCP tools and agents.
“Conceptual Foundations” discusses the design theories behind PCP.
PCP provides a range of services that may be used to monitor and manage system performance. These services are distributed and scalable to accommodate the most complex system configurations and performance problems.
PCP is targeted at the performance analyst, benchmarker, capacity planner, developer, database administrator, or system administrator with an interest in overall system performance and a need to quickly isolate and understand performance behavior, resource utilization, activity levels, and bottlenecks in complex systems. Platforms that can benefit from this level of performance analysis include large servers, server clusters, or multi-server sites delivering DBMS, compute, Web, file, or video services.
To deal efficiently with the dynamic behavior of complex systems, performance analysts need to filter out “noise” from the overwhelming stream of performance data, and focus on exceptional scenarios. Visualization of current and historical performance data, and automated reasoning about performance data, effectively provide this filtering.
From the PCP end user's perspective, PCP presents an integrated suite of tools, user interfaces, and services that support real-time and retrospective performance analysis, with a bias towards eliminating mundane information and focusing attention on the exceptional and extraordinary performance behaviors. The user can then concentrate on in-depth analysis or targeted management procedures for those critical system performance problems.
At the lowest level, performance metrics are collected and managed in autonomous performance domains such as the IRIX operating system, a database management system, a layered service, or an end-user application. These domains feature a multitude of access control policies, access methods, data semantics, and multiversion support. All this detail is irrelevant to the developer or user of a performance monitoring tool, and is hidden by the PCP infrastructure.
Performance Metric Domain Agents (PMDAs) within PCP encapsulate the knowledge about, and export performance information from, autonomous performance domains.
Usability and extensibility of performance management tools mandate a single scheme for naming performance metrics. The set of defined names constitutes a Performance Metrics Name Space (PMNS). Within PCP, the PMNS is adaptive so it can be extended, reshaped, and pruned to meet the needs of particular applications and users.
PCP provides a single interface to name and retrieve values for all performance metrics, independently of their source or location.
From a purely pragmatic viewpoint, a single workstation must be able to monitor the concurrent performance of multiple remote hosts. At the same time, a single host may be subject to monitoring from multiple remote workstations.
These requirements suggest a classical client-server architecture, which is exactly what PCP uses to provide concurrent and multiconnected access to performance metrics, independent of their host location.
Complex systems are subject to continual changes as network connections fail and are re-established; nodes are taken out of service and rebooted; hardware is added and removed; and software is upgraded, installed, or removed. Often these changes are asynchronous and remote (perhaps in another geographic region or domain of administrative control).
The distributed nature of PCP (and the modular fashion in which performance metrics domains can be installed, upgraded, and configured on different hosts) enables PCP to adapt concurrently to changes in the monitored system(s). Variations in the available performance metrics as a consequence of configuration changes are handled automatically and become visible to all clients as soon as the reconfigured host is rebooted or the responsible agent is restarted.
PCP also detects loss of client-server connections, and most clients support subsequent automated reconnection.
A range of tools is provided to support flexible, adaptive logging of performance metrics for archive, playback, remote diagnosis, and capacity planning. PCP archive logs may be accumulated either at the host being monitored, at a monitoring workstation, or both.
A universal replay mechanism, modeled on VCR controls, supports “stop, rewind, random seek, and replay at variable speed” processing of archived performance data.
Most PCP applications are able to process archive logs and real-time performance data with equal facility. Unification of real-time access and access to the archive logs, in conjunction with VCR service, provides new and powerful ways to build performance tools and to review both current and historical performance data.
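The VCR-style controls can be pictured as a cursor that moves over time-ordered archive records. The following Python sketch is an illustration only: `ReplayCursor` and its methods are hypothetical names, not part of PCP, and the archive is modeled as an in-memory list of (timestamp, value) pairs rather than a real PCP archive log.

```python
from bisect import bisect_left

class ReplayCursor:
    """Hypothetical VCR-style cursor over time-ordered (timestamp, value) records."""

    def __init__(self, records):
        self.records = sorted(records)              # order records by timestamp
        self.times = [t for t, _ in self.records]
        self.pos = 0                                # index of the next record to replay

    def rewind(self):
        """Return to the start of the archive."""
        self.pos = 0

    def seek(self, timestamp):
        """Random seek: move to the first record at or after the timestamp."""
        self.pos = bisect_left(self.times, timestamp)

    def play(self, step=1):
        """Yield records from the current position, taking every step-th record.

        A step greater than 1 models faster replay by skipping samples.
        The caller stops replay simply by ceasing to consume the generator.
        """
        while self.pos < len(self.records):
            yield self.records[self.pos]
            self.pos += step

archive = [(0, 10), (10, 12), (20, 15), (30, 11), (40, 18)]
cursor = ReplayCursor(archive)
cursor.seek(15)                     # lands on the record at t=20
fast = list(cursor.play(step=2))    # replays t=20 and t=40
cursor.rewind()
first = next(cursor.play())         # back at the record for t=0
```

Modeling variable speed as sample skipping is an assumption made to keep the sketch short; a real replay service could equally vary the delay between samples.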
For operational and production environments, PCP provides a framework with scripts to customize in order to automate the execution of ongoing tasks such as these:
centralized archive logging for multiple remote hosts
archive log rotation, consolidation and culling
Web-based publishing of charts showing snapshots of performance activity levels in the recent past
flexible alarm monitoring for critical performance scenarios
retrospective performance audits covering the recent past; for example, daily or weekly checks for performance regressions or quality of service problems
PCP permits the integration of new performance metrics into the Performance Metrics Name Space (PMNS), the collection infrastructure, and the logging framework. The guiding principle is, “if it is important for monitoring system performance, and you can measure it, you can easily integrate it into the PCP framework.”
For many PCP customers, the most important performance metrics are not those already supported, but new performance metrics that characterize the essence of “good” or “bad” performance at their site, or within their particular application environment.
One example is an application that measures the round-trip time for a benign “probe” transaction against some mission-critical application.
For application developers, a library is provided to support easy-to-use insertion of trace and monitoring points within an application, and the automatic export of resultant performance data into the PCP framework. Other libraries and tools aid the development of customized and fully featured Performance Metrics Domain Agents (PMDAs).
Extensive source code examples are provided in the distribution, and by using the PCP toolkit and interfaces, these customized measures of performance or quality of service can be easily and seamlessly integrated into the PCP framework.
The following features are included in this release of Performance Co-Pilot:
| Metric Coverage | |
| Add-on Products | The add-on products share the basic PCP operational model, APIs, architectural deployment, and protocols. Additional documentation is provided with each add-on product to describe specific installation, operation, and functional details. | |
| Secure Operation | |
PCP is composed of text-based tools, graphical tools, and related commands. Each tool or command is fully documented by a reference page. These reference pages are named after the tools or commands they describe, and are accessible through the man command. For example, to see the pcp(1) reference page for the pcp command enter this command:
man pcp
Many PCP tools and commands are accessible from an Icon Catalog on the IRIX desktop, grouped under PerfTools. In the Toolchest Find menu, choose PerfTools; an Icon Catalog appears, containing clickable PCP programs. To bring up a Web-based introduction to Performance Co-Pilot, click the AboutPCP icon.
A list of PCP tools and commands, grouped by functionality, is provided in the following four subsections.
These tools provide the principal services for the Performance Co-Pilot end-user with an interest in monitoring, visualizing, or processing performance information collected either in real time or from PCP archive logs:
| cachemiss | The cachemiss tool may be used to measure the cycle-time penalties associated with operations that “miss” in the primary and secondary data caches of the processor-memory hierarchy. | |
| dkvis | The dkvis tool displays a three-dimensional bar chart showing activity in the disk subsystem.[1] | |
| mpvis | The mpvis tool displays a three-dimensional bar chart of multiprocessor CPU utilization.† | |
| nfsvis | The nfsvis tool displays a three-dimensional bar chart showing NFS (Network File System) client and server request activity, for systems on which the optional NFS software product has been installed.† | |
| osvis | The osvis tool displays three-dimensional bar charts covering many aspects of system performance, including disk use, job load, memory, CPU activity, and network I/O.† | |
| oview | The oview tool visualizes the performance of Origin systems, showing a dynamic display of Origin node topology and performance. | |
| xbowvis | The xbowvis tool visualizes the Crossbow (XBow) packet and error rates on platforms that support this hardware.† | |
| pmchart | The pmchart tool displays trends over time for arbitrarily selected performance metrics from one or more hosts, or from one or more performance metric domains. | |
| pmdumpmineset | The pmdumpmineset tool is a wrapper for pmdumptext that produces data files suitable for importing into the MineSet data mining product. | |
| pmdumptext | The pmdumptext command outputs the values of performance metrics collected live or from a PCP archive, as ASCII text. | |
| pmem | The pmem command reports per-process memory usage statistics. Both virtual size and pro-rated physical memory usage are reported. | |
| pmgadgets | The pmgadgets command creates a small window containing a collection of graphical gadgets of assorted type and style, driven by performance metrics supplied by the PCP framework. Any numeric metric can be used to animate a gadget.[2] | |
| pmgevctr | The pmgevctr tool uses pmgadgets to display an animated gadget that reports activity in the CPU and memory subsystems. | |
| pmgirix | The pmgirix command determines the hardware configuration of a remote or local system, constructs a suitable specification for a system-level visual monitor, and launches the pmgadgets tool to animate the monitor using IRIX performance metrics. | |
| pmie | The pmie tool is an inference engine that evaluates predicate-action rules over performance metrics, for performance alarms, automated system management tasks, dynamic tuning configuration, and so on. | |
| pminfo | The pminfo command displays information about arbitrary performance metrics available from PCP, including help text with -T. | |
| pmkstat | The pmkstat command provides a text-based display of metrics that summarize system performance at a high level, suitable for ASCII logs or enquiry over a modem. | |
| pmsocks | The pmsocks command allows the execution of PCP tools through a network firewall system provided sockd services are supported. | |
| pmtime | The pmtime command provides a graphical user interface for PCP applications requiring time control. | |
| pmval | The pmval command provides text-based display of the values for arbitrary instances of a selected performance metric, suitable for ASCII logs or enquiry over a modem. | |
| pmview | The pmview tool is a generalized three-dimensional Open Inventor application that supports dynamic displays of clusters of related performance metrics as groups of utilization blocks (or towers) on a common base plane. | |
| psmon | The psmon script selects a subset of the actively running processes and launches either pmchart or pmlogger to collect per-process metrics for those processes. |
Performance Co-Pilot provides the following tools to support real-time data collection, network transport, and archive log creation services for performance data:
| mkaf | The mkaf tool aggregates an arbitrary collection of PCP archive logs into a “folio” to be used with pmafm. | |
| pmafm | The pmafm tool is used to interrogate, manage, and replay an archive folio as created by mkaf, or the periodic archive log management scripts, or the “record” mode of other PCP tools. | |
| pmcd | The Performance Metrics Collection Daemon (PMCD). This daemon must run on each system being monitored, to collect and export the performance information necessary to monitor the system. | |
| pmdacisco | A Performance Metrics Domain Agent (PMDA) that extracts performance metrics from one or more Cisco routers. | |
| pmdahotproc | A PMDA that exports performance metrics from an instance domain of processes restricted to an interesting or “hot” set. | |
| pmdamailq | A PMDA that exports performance metrics describing the current state of items in the sendmail queue. | |
| pmdashping | A PMDA that exports performance metrics for the availability and quality of service (response-time) for arbitrary shell commands. | |
| pmdasummary | A PMDA that derives performance metrics values from values made available by other PMDAs. | |
| pmdatrace | A PMDA that exports transaction performance metrics from application processes that use the pcp_trace library. | |
| pmdumplog | The pmdumplog command displays selected state information, control data, and metric values from a PCP archive log created by pmlogger. | |
| pmlc | The pmlc command is used to exercise control over an instance of the PCP archive logger pmlogger, to modify the profile of which metrics are logged and/or how frequently their values are logged. | |
| pmlogger | The pmlogger command is used to create PCP archive logs of performance metrics over time. Many tools accept these PCP archive logs as alternative sources of metrics for retrospective analysis. | |
| pmlogextract | The pmlogextract command reads one or more PCP archive logs and creates a temporally merged and reduced PCP archive log as output. | |
| pmtrace | The pmtrace command provides a simple command-line interface to the trace PMDA and its associated pcp_trace library. |
Performance Co-Pilot provides the following tools to support the PCP infrastructure and assist operational procedures for PCP deployment in a production environment:
| cron.* | The cron.pmcheck, cron.pmdaily, and cron.pmsnap scripts are intended for periodic execution via cron to allow you to create a customized regime of administration and management for PCP archive log files and performance snapshots suitable for WWW publishing. | |
| dkmap | The dkmap tool creates a map of disk real estate usage. | |
| dkping | The dkping tool opens the named disk for reading and checks for a response. | |
| dkprobe | The dkprobe tool initializes disk performance metrics at boot time for some IRIX versions. It may be called from /etc/init.d/pcp. | |
| memclaim | The memclaim tool allocates and holds physical memory, simulating a reduction in physical memory. | |
| pmbrand | The pmbrand tool manages the “branded” file of valid PCP licenses. | |
| pmdate | This tool displays the current date and/or time, with an optional offset. | |
| pmdbg | PCP tools include internal diagnostic and debugging facilities that may be activated by run-time flags; pmdbg describes the available facilities and associated control flags. | |
| pmerr | The pmerr command translates PCP error codes into human-readable error messages. | |
| pmlaunch | The pmlaunch configuration directory contains metrics specification formats and a set of scripts for use by tools that are launching, and being launched by, other tools with no knowledge of each other. | |
| pmlock | The pmlock command attempts to acquire an exclusive lock by creating a file with a mode of 0. | |
| pmnewlog | The pmnewlog command is used to perform archive log rotation by stopping and restarting an instance of pmlogger. | |
| pmnsadd | The pmnsadd command adds a subtree of new names into a PMNS, as used by the components of PCP. | |
| pmnscomp | The pmnscomp command compiles a PMNS in ASCII format into a more efficient binary representation. | |
| pmnsdel | The pmnsdel command removes a subtree of names from a PMNS, as used by the components of PCP. | |
| pmpost | The pmpost command appends the text message to the end of the PCP notice board file (/var/adm/pcplog/NOTICES). | |
| pmrules | The pmrules command provides a graphical user interface for instantiating and editing rules for pmie. | |
| pmstore | The pmstore command re-initializes counters or assigns new values to metrics that act as control variables. The command changes the current values for the specified instances of a single performance metric. |
The following PCP tools aid the development of new programs to consume performance data, and new agents to export performance data within the PCP framework:
| chkhelp | The chkhelp tool checks the consistency of performance metrics help database files. | |
| dbpmda | The dbpmda tool is an interactive debugger for PMDAs. This tool allows PMDA behavior to be exercised and tested. | |
| newhelp | The newhelp tool generates the database files for one or more source files of PCP help text. | |
| PMAPI | The Performance Metrics Application Programming Interface (PMAPI) defines a procedural interface for developing PCP client applications. | |
| pmclient | The pmclient tool is a simple client that uses the PMAPI to report some high-level system performance metrics. The source code for pmclient is included in the distribution. | |
| pmgenmap | The pmgenmap command is a program development tool that generates C declarations and cpp macros to aid the development of customized programs that use the facilities of PCP. | |
| PMDA | A library used by many shipped PMDAs to communicate with a pmcd process. It can expedite the development of new and custom PMDAs. |
The following sections provide a detailed overview of concepts that underpin PCP.
Across all of the supported performance metric domains, there are a large number of performance metrics. Each metric has its own structure and semantics. PCP presents a uniform interface to these metrics, independent of the underlying metric data source.
The Performance Metrics Name Space (PMNS) provides a hierarchical classification of external metric names, and a mapping from external names to internal metric identifiers. See “Performance Metrics Name Space” for a description of the PMNS.
When performance metric values are returned to a requesting application, there may be more than one value instance for a particular metric; for example, independent counts for each CPU, process, disk, or local filesystem. Internal instance identifiers correspond one to one with external (textual) descriptions of the members of an instance domain.
Transient performance metrics (such as per-process information, per-XLV volume, and so on) cause repeated requests for the same metric to return different numbers of values, or changes in the particular instance identifiers returned. These changes are expected and fully supported by the PCP infrastructure; however, metric instantiation is guaranteed to be valid only at the time of collection.
When performance metrics are retrieved, they are delivered in the context of a particular source of metrics, a point in time, and a profile of desired instances. This means that the application making the request has already negotiated to establish the context in which the request should be executed.
A metric source may be the current performance data from a particular host (a live or real-time source), or an archive log of performance data collected by pmlogger at some distant host or at an earlier time (a retrospective or archive source).
By default, the collection time for a performance metric is the current time of day for real-time sources, or the current point within an archive source. For archives, the collection time may be reset to an arbitrary time within the bounds of the archive log.
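The three parts of a context (source, time, and instance profile) can be modeled as a small record. This Python sketch is a conceptual illustration under stated assumptions: `FetchContext`, its fields, the host name, and the archive name are all hypothetical and do not correspond to PMAPI data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass
class FetchContext:
    """Hypothetical model of a request context; not a PCP data structure."""
    source: str                                   # host name or archive name
    bounds: Optional[Tuple[float, float]] = None  # (start, end) for archives, None for live
    time: Optional[float] = None                  # None means the default collection time
    # instance domain name -> wanted instance names; a missing key means "all instances"
    profile: Dict[str, Set[str]] = field(default_factory=dict)

    @property
    def is_archive(self) -> bool:
        return self.bounds is not None

    def set_time(self, t: float) -> None:
        """Reset the collection time; only meaningful for archive sources."""
        if not self.is_archive:
            raise ValueError("live sources always use the current time of day")
        start, end = self.bounds
        if not start <= t <= end:
            raise ValueError("time is outside the bounds of the archive log")
        self.time = t

live = FetchContext(source="collector.example.com")
arch = FetchContext(source="example-archive", bounds=(100.0, 200.0))
arch.profile["disk"] = {"dks1d1", "dks1d2"}   # restrict one instance domain
arch.set_time(150.0)                          # reposition within the archive
```

Keeping the default time as `None` mirrors the text: a live context always means "now", while an archive context can be repositioned anywhere inside its bounds.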
Note: Performance Co-Pilot 2.0, along with IRIX release 6.5, has been developed to be completely Year 2000 compliant.
Instrumentation for the purpose of performance monitoring typically consists of counts of activity or events, attribution of resource consumption, and service-time or response-time measures. This instrumentation may exist in one or more of the following functional domains, each with an associated access method (see Figure 1-1):
The IRIX kernel, including sar data structures, per-process resource consumption, network statistics, disk activity, or memory management instrumentation.
A layered software service such as activity logs for a World Wide Web server, or an NNTP news server.
A layered system product. For example, the temperature, voltage levels, and fan speeds from the environmental monitor in a CHALLENGE system, or the length of the mail queue as reported by mqueue.
A DBMS. For example, the V$ views and bstat/estat summaries for ORACLE, the tbmonitor statistics for INFORMIX, or the sp_monitor procedures for Sybase.
External equipment such as network routers and bridges.
An application program. For example, measured response time for a production application running a periodic and benign “probe” transaction (as often required in service quality agreements), or rate of computation and throughput in jobs per hour for a batch stream.
For each domain, the set of performance metrics may be viewed as an abstract data type, with an associated set of methods that may be used to
interrogate the metadata that describes the syntax and semantics of the performance metrics
control (enable or disable) the collection of some or all of the metrics
extract instantiations (current values) for some or all of the metrics
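The abstract data type view can be sketched as an interface with one method per capability listed above. The Python below is a hedged illustration: `MetricsDomain`, `ToyDomain`, the method names, and the metric `toy.requests` are hypothetical and are not PCP interfaces.

```python
from abc import ABC, abstractmethod

class MetricsDomain(ABC):
    """Hypothetical interface mirroring the three method groups listed above."""

    @abstractmethod
    def describe(self, metric):
        """Interrogate the metadata for a metric."""

    @abstractmethod
    def control(self, metric, enabled):
        """Enable or disable collection of a metric."""

    @abstractmethod
    def fetch(self, metric):
        """Extract the current value of a metric."""

class ToyDomain(MetricsDomain):
    """A made-up domain with a single metric, used only to exercise the interface."""

    def __init__(self):
        self._meta = {"toy.requests": {"type": "u32", "semantics": "counter"}}
        self._values = {"toy.requests": 42}
        self._enabled = {"toy.requests": True}

    def describe(self, metric):
        return self._meta[metric]

    def control(self, metric, enabled):
        self._enabled[metric] = enabled

    def fetch(self, metric):
        # A disabled metric yields no value in this sketch.
        if not self._enabled[metric]:
            return None
        return self._values[metric]

domain = ToyDomain()
before = domain.fetch("toy.requests")     # 42 while collection is enabled
domain.control("toy.requests", False)
after = domain.fetch("toy.requests")      # None once collection is disabled
```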
We refer to each functional domain as a Performance Metrics Domain (PMD) and assume that PMDs are functionally, architecturally, and administratively independent and autonomous. Obviously the set of PMDs available on any host is variable, and changes with time as software and hardware are installed and removed.
The number of PMDs may be further enlarged in cluster-based or network-based configurations, where there is potentially an instance of each Performance Metrics Domain on each node. Hence, the management of PMDs must be both extensible at a particular host and distributed across a number of hosts.
Each PMD on a particular host must be assigned a unique PMD identifier. In practice, this means unique identifiers are assigned globally for each PMD type. For example, the same identifier would be used for the IRIX PMD on all hosts.
The performance metrics collection architecture is distributed, in the sense that any performance tool may be executing remotely. However, a PMDA must run on the system for which it is collecting performance measurements. In most cases, connecting these tools together on the collection host is the responsibility of the pmcd process, as shown in Figure 1-2.
The host running the monitoring tools does not require any collection tools, including pmcd, because all requests for metrics are sent to the pmcd process on the collector host. These requests are then forwarded to the appropriate PMDAs, which respond with metric descriptions, help text, and most importantly, metric values.
The connections between monitor clients and pmcd processes are managed in libpcp, below the PMAPI level; see PMAPI(3). Connections between PMDAs and pmcd are managed by the PMDA routines; see PMDA(3). There can be multiple monitor clients and multiple PMDAs on the one host, but there may be at most one pmcd process.
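The forwarding role of pmcd can be pictured as a dispatcher that holds one agent per metrics domain. This Python sketch is a simplified model, not the PCP protocol: the class names, the numeric domain identifiers, and the metric names are assumptions chosen for illustration.

```python
class Agent:
    """Stands in for a PMDA: answers requests for one metrics domain."""

    def __init__(self, values):
        self.values = values

    def fetch(self, metric):
        return self.values[metric]

class CollectorDaemon:
    """Stands in for pmcd: one per host, forwards each request to the right agent."""

    def __init__(self):
        self.agents = {}                      # domain identifier -> Agent

    def register(self, domain_id, agent):
        if domain_id in self.agents:
            raise ValueError("domain identifier must be unique on a host")
        self.agents[domain_id] = agent

    def fetch(self, domain_id, metric):
        # The monitor client never talks to an agent directly.
        return self.agents[domain_id].fetch(metric)

daemon = CollectorDaemon()
daemon.register(1, Agent({"toy.os.load": 0.5}))     # hypothetical domain 1
daemon.register(2, Agent({"toy.mail.queued": 17}))  # hypothetical domain 2
load = daemon.fetch(1, "toy.os.load")
queued = daemon.fetch(2, "toy.mail.queued")
```

Because every request passes through the single daemon, agents can be added or replaced on the collector host without any change to the monitoring side.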
Internally, each unique performance metric is identified by a Performance Metric Identifier (PMID) drawn from a universal set of identifiers, including some that are reserved for site-specific, application-specific, and customer-specific use.
An external name space (the performance metrics name space, or PMNS) maps from a hierarchy (or tree) of external names to PMIDs.
Each node in the name space tree is assigned a label that must begin with an alphabetic character, and be followed by zero or more alphanumeric characters or the underscore (_) character. The root node of the tree has the special label of root.
A metric name is formed by traversing the tree from the root to a leaf node with each node label on the path separated by a period. The common prefix root. is omitted from all names. For example, in the small subsection of a PMNS shown in Figure 1-3, the following are valid names for performance metrics:
irix.kernel.percpu.syscall
irix.network.tcp.rcvpack
hw.router.recv.total_util
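The naming rules can be expressed compactly in code. The following Python sketch models a name space as a nested dictionary whose top level holds the children of the root node, so the root. prefix never appears in a name. The PMID values (1, 2, 3) are placeholders rather than real identifiers, and `metric_names` is a hypothetical helper.

```python
import re

# A label starts with a letter, followed by letters, digits, or underscores.
LABEL = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

def metric_names(tree, prefix=()):
    """Yield (dotted name, PMID) for every leaf of a nested-dict name space."""
    for label, child in tree.items():
        if not LABEL.match(label):
            raise ValueError(f"invalid node label: {label!r}")
        path = prefix + (label,)
        if isinstance(child, dict):
            yield from metric_names(child, path)   # descend into a non-leaf node
        else:
            yield ".".join(path), child            # leaf: child is the PMID

pmns = {
    "irix": {
        "kernel": {"percpu": {"syscall": 1}},
        "network": {"tcp": {"rcvpack": 2}},
    },
    "hw": {"router": {"recv": {"total_util": 3}}},
}
names = dict(metric_names(pmns))
```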
Although a default PMNS is shipped and updated by the components of PCP, individual users may create their own name space for metrics of interest, and all tools may use a private PMNS, rather than the default PMNS.
Note: For some low-end bundles based on PCP (such as WebMeter in the WebFORCE products), the default PMNS cannot be modified, nor can an alternate PMNS be created.
In PCP 1.x releases, the PMNS was local to the application that referred to PCP metrics by name. As of release 2.0, PMNS operations are directed to the host or archive that is the source of the desired performance metrics.
The distributed PMNS necessitated changes to PCP protocols between client applications and pmcd, and to the internal format of PCP archive files. PCP release 2.0 is compatible with earlier releases, so new PCP components operate correctly with either new or old PCP components. For example, connections to the PCP 1.x pmcd, or attempts to process a PCP archive created by a PCP 1.x pmlogger, revert to using the local PMNS.
Through the various Performance Metric Domains, PCP must support a wide range of formats and semantics for performance metrics. This “metadata” describing the performance metrics includes
the internal identifier (Performance Metric Identifier, or PMID) for the metric
the format and encoding for the values of the metric; for example, an unsigned 32-bit integer, a string, or a 64-bit IEEE-format floating-point number
the semantics of the metric, particularly the interpretation of the values as free-running counters or instantaneous values
the dimensionality of the values, in the dimensions of events, space, and time
the scale of values; for example, bytes, KB, or MB for the space dimension
an indication if the metric may have one or many associated values
short (and extended) help text describing the metric
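The metadata items above matter because a tool cannot interpret a raw value without them. The Python sketch below shows one example: using the semantics field to decide whether two samples should be converted to a rate. `MetricDesc` and `display_value` are hypothetical names, the field layout is an assumption, and counter wraparound is deliberately not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDesc:
    """Hypothetical metadata record; field names are illustrative only."""
    pmid: int            # internal identifier
    value_type: str      # format and encoding, for example "u32"
    semantics: str       # "counter" (free-running) or "instant"
    units: str           # dimension and scale, for example "count" or "KB"
    multi_valued: bool   # True if the metric may have many instances

def display_value(desc, prev, curr):
    """Turn two (timestamp_seconds, value) samples into a value worth showing.

    Counters are reported as a per-second rate over the interval;
    instantaneous metrics are reported as the latest value.
    Counter wraparound is not handled in this sketch.
    """
    (t0, v0), (t1, v1) = prev, curr
    if desc.semantics == "counter":
        return (v1 - v0) / (t1 - t0)
    return v1

io_ops = MetricDesc(pmid=1, value_type="u32", semantics="counter",
                    units="count", multi_valued=True)
free_mem = MetricDesc(pmid=2, value_type="u32", semantics="instant",
                      units="KB", multi_valued=False)

rate = display_value(io_ops, (0.0, 1000), (5.0, 1250))     # 250 events over 5 seconds
latest = display_value(free_mem, (0.0, 4096), (5.0, 2048))  # latest sample only
```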
For each metric, this metadata is defined within the associated PMDA, and PCP arranges for the information to be exported to the performance tools applications that use the metadata when interpreting the values for performance metrics.
There are two types of performance metrics, single valued and set valued, described in the following sections.
Some performance metrics have a singular value within their PMD. For example, available memory (or the total number of context switches) has only one value per PMD, that is, one value per host. The metadata describing the metric makes this fact known to applications that process values for these metrics.
Some performance metrics have a set of values or instances in each implementing PMD. For example, one value for each disk, one value for each process, one value for each CPU, or one value for each activation of a given application.
When a metric has multiple instances, the PCP framework does not pollute the name space with additional metric names; rather, a single metric may have an associated set of values. These multiple values are associated with the members of an instance domain, such that each instance has a unique instance identifier within the associated instance domain. For example, the “per CPU” instance domain may use the instance identifiers 0, 1, 2, 3, and so on to identify the configured processors in the system.
Internally, instance identifiers are encoded as binary values, but each PMD also supports corresponding strings as external names for the instance identifiers, and these names are used at the user interface to the PCP utilities.
For example, the performance metric irix.disk.dev.total counts I/O operations for each disk spindle, and the associated instance domain contains one member for each disk spindle. On a system with five specific disks, one value would be associated with each of the external and internal instance identifier pairs shown in Table 1-1.
Table 1-1. Sample Instance Identifiers for Disk Statistics
| External Instance Identifier | Internal Instance Identifier |
|---|---|
| dks1d1 | 131329 |
| dks1d2 | 131330 |
| dks1d3 | 131331 |
| dks3d1 | 131841 |
| dks3d2 | 131842 |
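The pairing in Table 1-1 amounts to a two-way lookup between external names and internal identifiers. This Python sketch uses the identifiers from the table; the helper `label_values` and the sample values are hypothetical.

```python
# External name -> internal identifier, copied from Table 1-1.
DISK_INSTANCES = {
    "dks1d1": 131329,
    "dks1d2": 131330,
    "dks1d3": 131331,
    "dks3d1": 131841,
    "dks3d2": 131842,
}

# Reverse map, used when presenting fetched values to a user.
INTERNAL_TO_EXTERNAL = {ident: name for name, ident in DISK_INSTANCES.items()}

def label_values(values_by_internal_id):
    """Replace internal identifiers with external names for display."""
    return {INTERNAL_TO_EXTERNAL[ident]: value
            for ident, value in values_by_internal_id.items()}

# Made-up values keyed by internal identifier, as a tool might receive them.
labelled = label_values({131329: 7, 131842: 3})
```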
Multiple performance metrics may be associated with a single instance domain.
Each PMD may dynamically establish the instances within an instance domain. For example, there may be one instance for the metric irix.kernel.percpu.idle on a workstation, but multiple instances on a multiprocessor server. Even more dynamic is irix.filesys.free, where the values report the amount of free space per file system, and the number of values tracks the mounting and unmounting of local filesystems.
PCP arranges for information describing instance domains to be exported from the PMDs to the applications that require this information. Applications may also choose to retrieve values for all instances of a performance metric, or some arbitrary subset of the available instances.
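Selecting a subset of instances from a domain whose membership changes over time can be sketched as follows. This is an illustration in Python, not PMAPI code: `select_instances`, the filesystem names, and the values are all hypothetical.

```python
def select_instances(available, wanted=None):
    """Return values for all instances, or only the wanted ones still present.

    available: instance name -> value for the current sample
    wanted:    iterable of instance names, or None for every instance
    """
    if wanted is None:
        return dict(available)
    return {name: available[name] for name in wanted if name in available}

# Two samples of a per-filesystem metric; /scratch is unmounted between them.
sample1 = {"/": 1200, "/usr": 800, "/scratch": 5000}
sample2 = {"/": 1190, "/usr": 800}

everything = select_instances(sample1)
subset1 = select_instances(sample1, ["/", "/scratch"])
subset2 = select_instances(sample2, ["/", "/scratch"])   # /scratch has gone away
```

Silently dropping an instance that has disappeared is one possible policy, chosen here for brevity; a tool could instead report the missing instance to the user.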
Hosts supporting PCP services are broadly classified into two categories:
Collector: Hosts that have pmcd and one or more Performance Metric Domain Agents (PMDAs) running to collect and export performance metrics.
Monitor: Hosts that import performance metrics from one or more collector hosts to be consumed by tools to monitor, manage, or record the performance of the collector hosts.
Each PCP-enabled host can operate as a collector, a monitor, or both.
Performance Co-Pilot provides an infrastructure through the Performance Metrics Collection System (PMCS). It unifies the autonomous and distributed PMDAs into a cohesive pool of performance data, and provides the services required to create generalized and powerful performance tools.
The PMCS provides the framework that underpins the PMAPI, which is described in the Performance Co-Pilot Programmer's Guide. The PMCS is responsible for the following services on behalf of the performance tools developed on top of the PMAPI:
distributed namespace services
instance domain services
coordination with the processes and procedures required to control the description, collection, and extraction of performance metric values from agents that interface to the Performance Metric Domains (PMDs)
servicing incoming requests for local performance metric values and metadata from applications running either locally or on a remote system
The PMCS described in the previous section is used when PMAPI clients are requesting performance metrics from a real-time or live source.
The PMAPI also supports delivery of performance metrics from a historical source in the form of a PCP archive log. Archive logs are created using the pmlogger utility, and are “replayed” in an architecture as shown in Figure 1-4.
The PMAPI has been designed to minimize the differences required for an application to process performance data from an archive or from a real-time source. As a result, all PCP tools support live and retrospective monitoring with equal facility.
Much of the PCP product's potential for attacking difficult performance problems in production environments comes from the design philosophy that considers extensibility to be critically important.
The performance analyst can take advantage of the PCP infrastructure to deploy value-added performance monitoring tools and services. Here are some examples:
Easy extension of the PMCS and PMNS to accommodate new performance metrics and new sources of performance metrics, in particular using the interfaces of a special-purpose library to develop new PMDAs; see PMDA(3).
Use of libraries (libpcp_pmda and libpcp_trace) to aid in the development of new PMDAs to export performance metrics from local applications.
Operation on any performance metric using generalized toolkits.
Distribution of PCP components such as collectors across the network, placing the service where it can do the most good.
Dynamic adjustment to changes in system configuration.
Flexible customization built into the design of all PCP tools.
Creation of new monitor applications, using the routines described in PMAPI(3).