Chapter 2. Installing and Configuring Performance Co-Pilot

The sections in this chapter describe the basic installation and configuration steps necessary to run Performance Co-Pilot (PCP) on your systems.

Product Structure

In a typical deployment, Performance Co-Pilot would be installed in a collector configuration on one or more hosts, from which the performance information could then be collected, and in a monitor configuration on one or more workstations, from which the performance of the server systems could then be monitored.

PCP is packaged into a number of basic subsystem types that reflect the functional role of the product components. These subsystems may be installed using inst or swmgr:

  • core— The pcp_eoe.sw.eoe and pcp.sw.base subsystems must be installed on every PCP enabled host, that is, on both PCP monitor and PCP collector systems.

  • monitor—The pcp_eoe.sw.monitor and pcp.sw.monitor subsystems must be installed on every PCP monitor host. Subsystems pcp_eoe.books.help and pcp.books.help should be installed to provide help support for the GUI monitoring tools; see sgihelp(1).

  • collector—No additional installation is required because the performance monitor collector daemon (pmcd) is in the pcp_eoe.sw.eoe subsystem.

  • demo— The pcp.sw.demo subsystems provide source code for example applications and PMDAs that serve as templates for developing new modules to extend the PCP coverage of performance metrics or the capabilities of monitoring tools.

  • other— The other pcp.sw.* subsystems provide the support for the optional PMDAs, and when required, need to be installed on the PCP collector host, and subsequently configured before they become active.

  • gifts— The pcp_gifts.sw.* subsystems provide optional applications and services that may be individually installed as required.

  • documentation— The pcp.man.* and pcp.books.* subsystems provide release notes, reference pages, interactive tutorials, and IRIS InSight books, and may be installed as needed.

For complete information on the installable software packages, see the Performance Co-Pilot release notes, available through the relnotes or grelnotes commands.

Optional Software

The capabilities of your PCP installation may be extended with added performance metrics or visual tools that are available as add-on products, sold separately from the base Performance Co-Pilot product.

For example, PCP add-on products support the following:

  • Several common database packages. These optional products provide customized PMDAs (performance metric domain agents) and visual analysis tools for the ORACLE, Sybase, and INFORMIX database systems.

  • World Wide Web server performance monitoring.

  • Platform-specific extensions for Silicon Graphics array and IRIS FailSafe products.

In addition, WebMeter is a bundled package that supports reduced PCP functionality and services for World Wide Web servers.

License Constraints

On PCP monitoring systems, all of the display, visualization, and automated reasoning tools are licensed using “nodelocked” FLEXlm licenses. On PCP collector systems, the Performance Metrics Collection Daemon (PMCD) is also licensed using “nodelocked” FLEXlm licenses. Refer to the PCP release notes for details.

The other PCP tools and services (for example, the PMDAs or pmlogger) may be installed and executed without license constraints.

Some of the PCP maintenance tools (for updating the PMNS, interrogating the PMCS, dumping an archive log, and so on) are also free of license restrictions.

Using pmbrand to Query PCP License Capabilities

The pmbrand command manages the /var/pcp/pmns/Brand file, which contains binary information about PCP capabilities enabled by the various valid licenses on the system. If you are unsure of the license status for a particular host, pmbrand verifies and prints the current license information on that system, producing output similar to the following:

/usr/pcp/bin/pmbrand -l 
Licenses for system 690794d70
  PCP Collector
  PCP Monitor

Performance Metrics Collection Daemon (PMCD)

On each PCP collector system, you must ensure that the pmcd daemon is running. This daemon coordinates the gathering and exporting of performance statistics in response to requests from the PCP monitoring tools.

Starting and Stopping the PMCD Daemon

To start the daemon, enter the following commands as root on each PCP collector system:

chkconfig pmcd on 
/etc/init.d/pcp start 

These commands instruct the system to start the daemon immediately, and again whenever the system is booted. It is not necessary to start the daemon on the monitoring system unless you wish to collect performance information from it as well.

To stop pmcd immediately on a PCP collector system, enter the command

/etc/init.d/pcp stop 

Restarting an Unresponsive PMCD Daemon

Often, if a daemon is not responding on a PCP collector system, the problem can be resolved by stopping and then immediately restarting a fresh instance of the daemon. To do this for pmcd, use the start argument of the script in /etc/init.d, which shuts down any running pmcd before starting a new one. The command syntax is

/etc/init.d/pcp start 

On startup, pmcd looks for a configuration file named /etc/pmcd.conf. This file specifies which agents cover which performance metrics domains and how pmcd should make contact with the agents. A comprehensive description of the configuration file syntax and semantics can be found in the pmcd(1) reference page.

If the configuration is changed, pmcd reconfigures itself when it receives the SIGHUP signal. Use the following command to send the SIGHUP signal to the daemon:

killall -HUP pmcd 

This is also useful when one of the PMDAs managed by pmcd has failed or has been terminated by pmcd. Upon receipt of the SIGHUP signal, pmcd restarts any PMDA that is configured but inactive.

PMCD Diagnostics and Error Messages

If there is a problem with pmcd, the first place to investigate should be the pmcd.log file. By default this file is in the /var/adm/pcplog directory, although setting the environment variable PCPLOGDIR before running /etc/init.d/pcp allows the file to be relocated.

PMCD Options and Configuration Files

Two files control PMCD operation: /etc/pmcd.conf and /etc/config/pmcd.options. The pmcd.options file contains the command-line options used with PMCD; it is read when the daemon is invoked by /etc/init.d/pcp. The pmcd.conf file contains configuration information regarding domain agents and the metrics that they monitor. These configuration files are described in the following sections.

The pmcd.options File

Command-line options for the PMCD are stored in the /etc/config/pmcd.options file. The PMCD can be invoked directly from a shell prompt, or it can be invoked by /etc/init.d/pcp as part of the boot process. It is normally invoked using /etc/init.d/pcp, with shell invocation reserved for debugging purposes.

The PMCD accepts certain command-line options to control its execution, and these options are placed in the pmcd.options file when /etc/init.d/pcp is being used to start the daemon. The accepted options are listed below:

-f  

This option causes the PMCD to be run in the foreground. The PMCD is usually run in the background, as are most daemons.

-i address 

For hosts with more than one network interface, this option specifies the interface on which this instance of the PMCD accepts connections. Multiple -i options may be specified. The default in the absence of any -i option is for PMCD to accept connections on all interfaces.

-l file 

This option specifies a log file. If no -l option is specified, the log file name is pmcd.log and it is created in the directory /var/adm/pcplog or in a directory as specified by the PCPLOGDIR environment variable.

-t seconds 

This option specifies the amount of time, in seconds, before PMCD times out on PDU exchanges with PMDAs. If no timeout is specified, the default is five seconds. Setting the timeout to zero disables timeouts. The timeout may be dynamically modified by storing the number of seconds into the metric pmcd.control.timeout using pmstore.

-T mask 

This option specifies whether connection and PDU tracing are turned on for debugging purposes.

See the pmcd(1) reference page for complete information on these options.

The default pmcd.options file shipped with PCP is similar to the following:

# command-line options to pmcd, uncomment/edit lines as required
# longer timeout delay for slow agents
# -t 10
# suppress timeouts
# -t 0
# make log go someplace else
# -l /some/place/else
# enable event tracing (1 for connections, 2 for PDUs, 3 for both)
# -T 3

The most commonly used options have been placed in this file for your convenience. To uncomment and use an option, simply remove the pound sign (#) at the beginning of the line with the option you wish to use. Restart pmcd for the change to take effect. That is, as superuser, enter the command

/etc/init.d/pcp start 
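Before restarting pmcd, it can help to confirm which options are actually active in the file. The following is a minimal sketch rather than a PCP utility: the function name active_pmcd_options is hypothetical, and the sketch assumes that each active option sits on its own line, as in the default file shown above.

```shell
# active_pmcd_options FILE
# Print the lines of a pmcd.options-style FILE that are neither
# commented out nor blank (hypothetical helper, not shipped with PCP).
active_pmcd_options() {
    # Delete lines whose first non-blank character is '#',
    # then delete lines that are empty or contain only white space.
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}

# Usage (as any user who can read the file):
#   active_pmcd_options /etc/config/pmcd.options
```

For a file like the default shown above, in which every option line is still commented out, the function prints nothing.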

The pmcd.conf File

When the PMCD is invoked, it reads its configuration file, /etc/pmcd.conf. This file contains entries that specify the PMDAs (Performance Metrics Domain Agents) used by this instance of the PMCD and which metrics are covered by these PMDAs. You may also specify access control rules in this file for the various hosts on your network. This file is described completely in the pmcd(1) reference page.

With standard operation of Performance Co-Pilot (even if you have not created and added your own PMDAs), you might need to edit this file in order to add any access control you wish to impose. If you do not add access control rules, all access for all operations is granted to all hosts. The default pmcd.conf file shipped with PCP is similar to the following:

# Name  Id   IPC   IPC Params   File/Cmd
irix    1    dso   irix_init    libirixpmda.so
pmcd    2    dso   pmcd_init    pmda_pmcd.so
proc    3    dso   proc_init    pmda_proc.so


Note: Because the PMCD runs with root privilege, you must be very careful not to configure PMDAs in this file if you are not sure of their action. Pay close attention that permissions on this file are not inadvertently downgraded to allow public write access.

Each entry in this configuration file contains rules that specify how to connect the PMCD to a particular PMDA and which metrics the PMDA monitors. A PMDA may be attached as a DSO or using a socket or a pair of pipes. The distinction between these attachment methods is described below.

An entry in the pmcd.conf file looks like this:

label_name   domain_number   type   path 

The label_name field specifies a name for the PMDA. The domain_number is an integer value that specifies a domain of metrics for the PMDA. The type field indicates the type of entry (DSO, socket, or pipe). The path field is for additional information, and varies according to the type of entry.

The following rules are common to DSO, socket, and pipe syntax:

label_name 

An alphanumeric string identifying the agent.

domain_number 

An unsigned integer specifying the agent's domain.

DSO entries follow this syntax:

label_name domain_number dso entry-point path 

The following rules apply to the DSO syntax:

dso 

The entry type.

entry-point 

The name of an initialization function called when the DSO is loaded.

path 

Designates the location of the DSO. If path begins with a slash (/), it is taken as an absolute path specifying the DSO; otherwise, the DSO is located in one of the directories /usr/pcp/lib or /var/pcp/lib.

Socket entries in the pmcd.conf file follow this syntax:

label_name domain_number socket addr_family address command [args] 

The following rules apply to the socket syntax:

socket 

The entry type.

addr_family 

Specifies if the socket is AF_INET or AF_UNIX. If the socket is INET, the word “inet” appears in this place. If the socket is UNIX, the word “unix” appears in this place.

address 

Specifies the address of the socket. For INET sockets, this is a port number or port name. For UNIX sockets, this is the name of the PMDA's socket on the local host.

command 

Specifies a command to start the PMDA when the PMCD is invoked and reads the configuration file.

args 

Optional arguments for command.

Pipe entries in the pmcd.conf file follow this syntax:

label_name domain_number pipe protocol command [args]

The following rules apply to the pipe syntax:

pipe 

The entry type.

protocol 

Specifies whether a text-based or a binary PCP protocol should be used over the pipes. Values for this parameter may be “text” and “binary.” The text-based protocol is provided for backwards compatibility, but otherwise its use is discouraged.

command 

Specifies a command to start the PMDA when the PMCD is invoked and reads the configuration file.

args 

Optional arguments for command.
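Because DSO, socket, and pipe entries share the same first three fields, a quick summary of the configured agents can be produced with standard tools. The sketch below is illustrative only: list_pmda_entries is a hypothetical helper name, and the sketch assumes that comment lines begin with a pound sign (#) and that any access control section follows the agent entries.

```shell
# list_pmda_entries FILE
# Print "label_name domain_number type" for each agent entry in a
# pmcd.conf-style FILE (hypothetical helper, not shipped with PCP).
list_pmda_entries() {
    awk '
        /^[ \t]*#/          { next }   # skip comment lines
        /^[ \t]*$/          { next }   # skip blank lines
        /^[ \t]*\[access\]/ { exit }   # stop at the access control section
        { print $1, $2, $3 }           # label_name, domain_number, type
    ' "$1"
}

# Usage:
#   list_pmda_entries /etc/pmcd.conf
```

For a configuration file like the default shown earlier, this would list the irix, pmcd, and proc agents, each with its domain number and the dso entry type.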

Controlling Access to PMCD With pmcd.conf

There is an optional section you can place in the pmcd.conf file to control system access to performance metric data. To add an access control section, begin by placing the following line at the end of your pmcd.conf file:

[access] 

Below this line, you can add entries of the following forms:

allow hostlist : operations ; 
disallow hostlist : operations ; 

The hostlist is a comma-separated list of host identifiers; the following rules apply:

  • Hostnames must be in the local system's /etc/hosts file or known to the local DNS (domain name service).

  • IP addresses may be given in the usual four-field numeric notation. Subnet addresses may be specified using three or fewer numeric components and an asterisk as a wildcard for the last component in the address.

For example, the following hostlist entries are all valid:

whizkid
gate-wheeler.eng.com
123.101.27.44
localhost
155.116.24.*
192.*
*

The operations field can be any of the following:

  • A comma-separated list of the operation types described below.

  • The word “all” to allow or disallow all operations as specified in the first field.

  • The words “all except” and a list of operations. This entry allows or disallows all operations as specified in the first field except those listed.

The operations that can be allowed or disallowed are as follows:

fetch 

This operation allows retrieval of information from the PMCD. This may be information about a metric (such as a description, instance domain, or help text) or an actual value for a metric.

store 

This operation allows the PMCD to store metric values in PMDAs that permit store operations. Be cautious in allowing this operation, because it may be a security opening in large networks. The PMDAs shipped with the PCP product typically reject “store” operations, except for selected performance metrics where the effect is benign.

For example, here is a sample access control portion of an /etc/pmcd.conf file:

allow whizkid :  all ; 
allow 192.127.4.* : fetch ; 
disallow gate-inet : store ; 

Complete information on access control syntax rules in the pmcd.conf file can be found in the pmcd(1) reference page.
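When reviewing which hosts have been granted access, it can be convenient to print only the rules that follow the [access] line. The following is a minimal sketch under stated assumptions: list_access_rules is a hypothetical helper name, and the sketch assumes that comment lines in the file begin with a pound sign (#). It does not validate the rules; pmcd itself remains the authority on syntax.

```shell
# list_access_rules FILE
# Print the allow/disallow rules that follow the [access] line in a
# pmcd.conf-style FILE (hypothetical helper, not shipped with PCP).
list_access_rules() {
    awk '
        /^[ \t]*\[access\]/ { in_access = 1; next }   # section starts here
        in_access && !/^[ \t]*#/ && NF > 0 { print }  # rules only, no comments
    ' "$1"
}

# Usage:
#   list_access_rules /etc/pmcd.conf
```

If the function prints nothing, the file has no access control rules, and (as noted earlier) all access for all operations is granted to all hosts.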

Managing Optional PMDAs

Some PMDAs (Performance Metrics Domain Agents) shipped with PCP are designed to be installed and activated on every collector host, for example, irix, pmcd, and proc.

Other PMDAs are designed for optional activation and require some user action to make them operational. In some cases these PMDAs expect local site customization to reflect the operational environment, the system configuration, or the production workload. This customization is typically supported by interactive installation scripts for each PMDA.

Each PMDA has its own directory located below /usr/pcp/pmdas or /var/pcp/pmdas. Each directory contains a README file that describes the metrics provided by the PMDA; an Install script that installs the PMDA, updates the PMNS, and restarts the pmcd daemon; and a Remove script that unconfigures the PMDA, removes the associated metrics from the PMNS, and restarts the pmcd daemon.

PMDA Installation

To install a PMDA you must perform a collector installation for each host on which the PMDA is required to export performance metrics. Because the PMNS is distributed as of PCP release 2.0, it is no longer necessary to install PMDAs with their associated PMNS on PCP monitor hosts.

Installation on a PCP Collector Host

You need to update the PMNS, configure the PMDA, and notify PMCD. The Install script for each PMDA automates these operations, as follows:

  1. Log in as root (the superuser).

  2. Move to the PMDA's directory. For example:

    cd /var/pcp/pmdas/cisco 
    

  3. In the unlikely event that you wish to use a non-default PMD (Performance Metrics Domain) assignment, determine the current PMD assignment:

    cat domain.h 
    

    Check that there is no conflict in the PMDs as defined in /var/pcp/pmns/stdpmid and the other PMDAs currently in use (listed in /etc/pmcd.conf). Edit domain.h to assign the new domain number if there is a conflict.

  4. Enter the command

    ./Install 
    

    You may be prompted to enter some local parameters or configuration options. The script applies all required changes to the control files and to the PMNS, and then notifies PMCD. The sample output below is illustrative of the interactions:

    You will need to choose an appropriate configuration for installation of the “cisco” Performance Metrics Domain Agent (PMDA).
      collector collect performance statistics on this system
      monitor   allow this system to monitor local and/or remote systems
      both      collector and monitor configuration for this system
    Please enter c(ollector) or m(onitor) or b(oth) [b] collector 
    Cisco hostname or IP address? [return to quit] wanmelb 
    A user-level password may be required for Cisco “show int” command.
        If you are unsure, try the command
            $ telnet wanmelb
        and if the prompt “Password:” appears, a user-level password is
        required; otherwise answer the next question with an empty line.
    User-level Cisco password? ******** 
    Probing Cisco for list of interfaces ...
    Enter interfaces to monitor, one per line in the format
    tX where “t” is a type and one of “e” (Ethernet), or “f” (Fddi), or “s” (Serial), or “a” (ATM), and “X” is an interface identifier which is either an integer (e.g.  4000 Series routers) or two integers separated by a slash (e.g. 7000 Series routers).
    The currently unselected interfaces for the Cisco “wanmelb” are:
        e0 s0 s1
    Enter “quit” to terminate the interface selection process.
    Interface? [e0] s0 
    The currently unselected interfaces for the Cisco “wanmelb” are:
        e0 s1
    Enter “quit” to terminate the interface selection process.
    Interface? [e0] s1 
    The currently unselected interfaces for the Cisco “wanmelb” are:
        e0
    Enter “quit” to terminate the interface selection process.
    Interface? [e0] quit 
    Cisco hostname or IP address? [return to quit]  
    Updating the Performance Metrics Name Space (PMNS) ...
    Installing pmchart view(s) ...
    Terminate PMDA if already installed ...
    Installing files ...
    Updating the PMCD control file, and notifying PMCD ...
    Check cisco metrics have appeared ... 5 metrics and 10 values
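The domain conflict check in step 3 can be partly automated by searching the domain_number field of the PMCD configuration file. This is a minimal sketch under stated assumptions: domain_in_use is a hypothetical helper, it inspects only a pmcd.conf-style file (not /var/pcp/pmns/stdpmid), and it takes the candidate domain number as an argument rather than reading domain.h.

```shell
# domain_in_use FILE NUMBER
# Exit 0 and print the agent label if NUMBER is already assigned as a
# domain_number in a pmcd.conf-style FILE; exit 1 otherwise.
# (Hypothetical helper, not shipped with PCP.)
domain_in_use() {
    awk -v want="$2" '
        /^[ \t]*\[access\]/  { exit }   # agent entries end here
        /^[ \t]*#/ || NF < 3 { next }   # skip comments and short lines
        $2 == want           { print $1; found = 1 }
        END                  { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   if domain_in_use /etc/pmcd.conf 5; then
#       echo "domain 5 is already taken; choose another number"
#   fi
```

The stdpmid file should still be checked by hand, because a domain number may be reserved there even when no agent in /etc/pmcd.conf currently uses it.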
    

PMDA Removal

To remove a PMDA, you must perform a collector removal for each host on which the PMDA is currently installed. Because the PMNS is distributed as of PCP release 2.0, it is no longer necessary to remove PMDAs or their associated PMNS on PCP monitor hosts.

Removal on a PCP Collector Host

You need to update the PMNS, unconfigure the PMDA, and notify PMCD. The Remove script for each PMDA automates these operations, as follows:

  1. Log in as root (the superuser).

  2. Move to the PMDA's directory. For example:

    cd /var/pcp/pmdas/environ
    

  3. Enter the command

    ./Remove 
    

    The following output illustrates the result:

    Culling the Performance Metrics Name Space ...
    environ ... done
    Updating the PMCD control file, and notifying PMCD ...
    Removing files ...
    Check environ metrics have gone away ... OK
    

Troubleshooting

The following sections offer troubleshooting advice.

Advice for troubleshooting the archive logging system is provided in Chapter 7, “Archive Logging.”

Troubleshooting the Performance Metrics Name Space (PMNS)

To display the PMNS, use the pminfo command; see pminfo(1).

The PMNS at the collector host is updated whenever a PMDA is installed or removed, and may also be updated when new versions of the PCP or PCP add-on products are installed. During these operations, the ASCII version of the PMNS is typically updated, then the binary version is regenerated.

Missing and Incomplete Values for Performance Metrics

Missing or incomplete performance metric values are the result of the values being unavailable at their source.

Metric Values Not Available

The following symptom has a known cause and resolution:

Symptom: 

Values for some or all of the instances of a performance metric are not available.

Cause: 

This can occur as a consequence of changes in the installation of modules (for example, a DBMS or an applications package) that provide the performance instrumentation underpinning the PMDAs. Changes in the selection of modules that are installed or operational, along with changes in the version of these modules, may make metrics appear and disappear over time.

In simple terms, the PMNS contains a metric name, but when that metric is requested, no PMDA at the collector host supports the metric.

For archive logs, the collection of metrics to be logged is a subset of the metrics available, so utilities replaying from a PCP archive log may not have access to all of the metrics available from a “live” (PMCD) source.

Resolution: 

Make sure the underlying instrumentation is available and the module is active. Ensure that the PMDA is running on the host to be monitored. If necessary, create a new archive log with a wider range of metrics to be logged.

Troubleshooting IRIX Metrics and the PMCD

The following issues involve IRIX and the PMCD:

  • no IRIX metrics available

  • cannot connect to remote PMCD

  • PMCD not reconfiguring after SIGHUP

  • PMCD does not start

No IRIX Metrics Available

The following symptom has a known cause and resolution:

Symptom: 

Some of the IRIX metrics are unavailable.

Cause: 

PMCD (and therefore the IRIX PMDA) does not have permission to read /dev/kmem, or the running kernel is not the same as the kernel in /unix.

Resolution: 

Check /var/adm/pcplog/pmcd.log. An error message of the form

kmeminit: cannot open "/dev/kmem": ... 

means that PMCD cannot access /dev/kmem. Ensure that /dev/kmem is readable by group sys. For example, you should see something similar to this:

ls -lg /dev/kmem 
crw-r-----   1 sys    1,  1 May 28 15:16 /dev/kmem

Restart PMCD after correcting the group and/or file permissions, and the problem should be solved.

If the running kernel is not the same as the kernel in /unix, the IRIX PMDA cannot access raw data in the kernel. A message like this appears in /var/adm/pcplog/pmcd.log:

kmeminit: "/unix" is not namelist for the running kernel

The only resolution to this is to make the running kernel the same as the one in /unix. If the running kernel was booted from the filesystem, then renaming files to make /unix the booted kernel and restarting PMCD should resolve the problem. If the running kernel was booted over the network, then PMCD cannot access the kernel's symbol table and hence the metrics extracted by reading /dev/kmem directly are not available.

Cannot Connect to Remote PMCD

The following symptom has a known cause and resolution:

Symptom: 

A PCP client tool (such as pmchart, dkvis, or pmlogger) complains that it is unable to connect to a remote PMCD (or establish a PMAPI context), but you are sure that PMCD is active on the remote host.

Cause: 

To avoid hanging applications for the duration of TCP timeouts, the PMAPI library implements its own timeout when trying to establish a connection to a PMCD. If the connection to the host is over a slow network, the connection may not be established before the timeout expires, and the attempt is abandoned.

Resolution: 

Establish that the PMCD on far-away-host is really alive, by connecting to its control port (TCP port number 4321 by default):

telnet far-away-host 4321 

This response indicates the PMCD is not running and needs restarting:

Unable to connect to remote host: Connection refused 

To restart the PMCD on that host, enter the following command:

/etc/init.d/pcp start 

This response indicates the PMCD is running:

Connected to far-away-host 

Interrupt the telnet session, increase the PMAPI timeout by setting the environment variable PMCD_CONNECT_TIMEOUT to some number of seconds (60 for instance), and try the PCP tool again.
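One way to raise the timeout for a single command, without changing it for the rest of the login session, is a small wrapper function. This is a sketch only: with_pmcd_timeout is a hypothetical name, and the pminfo invocation in the usage comment assumes that pminfo accepts -h to name the remote host.

```shell
# with_pmcd_timeout SECONDS COMMAND [ARGS...]
# Run COMMAND with PMCD_CONNECT_TIMEOUT set to SECONDS in its
# environment (hypothetical helper, not shipped with PCP).
with_pmcd_timeout() {
    timeout_secs=$1
    shift
    # The assignment prefix places the variable in the environment of
    # the command being run.
    PMCD_CONNECT_TIMEOUT=$timeout_secs "$@"
}

# Usage (assumes pminfo supports -h host):
#   with_pmcd_timeout 60 pminfo -h far-away-host
```

If the tool connects with the longer timeout, the original failure was most likely caused by network latency rather than by a problem with the remote PMCD.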

PMCD Not Reconfiguring After SIGHUP

The following symptom has a known cause and resolution:

Symptom: 

PMCD does not reconfigure itself after receiving the SIGHUP signal.

Cause: 

If there is a syntax error in /etc/pmcd.conf, PMCD does not use the contents of the file. This can lead to situations in which the configuration file and PMCD's internal state do not agree.

Resolution: 

Always monitor PMCD's log. For example, use the following command in another window when reconfiguring PMCD, to watch for errors as they occur:

tail -f /var/adm/pcplog/pmcd.log 
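To review a log after the fact rather than watching it live, the file can be searched for the kinds of messages quoted in this chapter. The following is a minimal sketch: pmcd_log_errors is a hypothetical helper, and the two patterns it searches for ("Error:" and "kmeminit:") are taken only from the sample messages shown in this chapter, so other diagnostics may go unreported.

```shell
# pmcd_log_errors LOGFILE
# Print lines in LOGFILE that resemble the error messages quoted in
# this chapter. Exit status is 1 if no such line is found.
# (Hypothetical helper, not shipped with PCP.)
pmcd_log_errors() {
    grep -E 'Error:|kmeminit:' "$1"
}

# Usage:
#   pmcd_log_errors /var/adm/pcplog/pmcd.log
```

An empty result does not prove that the configuration was accepted; reading the complete log remains the more reliable check.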

PMCD Does Not Start

The following symptom has a known cause and resolution:

Symptom: 

The following messages appear in the PMCD log (/var/adm/pcplog/pmcd.log):

pcp[27020] Error: OpenRequestSocket(4321) bind: Address already in use 
pcp[27020] Error: pmcd is already running 
pcp[27020] Error: pmcd not started due to errors! 

Cause: 

PMCD is already running or was terminated before it could clean up properly. The error occurs because the socket it advertises for client connections is already being used or has not been cleared by the kernel.

Resolution: 

Start PMCD as root (superuser) by typing:

/etc/init.d/pcp start 

Any existing PMCD is shut down and a new one is started in such a way that the symptomatic message should not appear.

If you are starting PMCD this way and the symptomatic message appears, there has been a problem with the connection to one of the deceased PMCD's clients. This could happen when the network connection to a remote client is lost and PMCD is subsequently terminated. The system may attempt to keep the socket open for a time to allow the remote client a chance to re-establish the connection and read any outstanding data. The only solution in these circumstances is to wait until the socket times out and the kernel deletes it. This netstat command displays the status of the socket and any connections:

netstat -a | grep 4321 

If the socket is in the FIN_WAIT or TIME_WAIT states, then you must wait for it to be deleted. Once the command above produces no output, PMCD may be restarted. Less commonly, you may have another program running on your system that uses the same internet port number (4321) that PMCD uses.

Refer to PCPIntro(3) for a description of how to override the default PMCD port assignment using the environment variable PMCD_PORT.
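The netstat check described above can be narrowed so that it reports only the states of TCP sockets bound to the PMCD port itself, because a plain grep for 4321 also matches other port numbers that contain those digits. This is a sketch under stated assumptions: pmcd_port_states is a hypothetical helper, and it assumes that the local address is the fourth field of each line, written as host.port or host:port, with the connection state in the last field. The layout of netstat output varies between systems, so check the field positions on your own host.

```shell
# pmcd_port_states [PORT]
# Read "netstat -a" output on standard input and print the distinct
# states of TCP sockets whose local port is PORT (default 4321).
# (Hypothetical helper, not shipped with PCP.)
pmcd_port_states() {
    port=${1:-4321}
    awk -v port="$port" '
        # Field 4 is assumed to be the local address; the port must be
        # preceded by "." or ":" so that, say, 14321 does not match.
        $1 ~ /^tcp/ && $4 ~ ("[.:]" port "$") { print $NF }
    ' | sort -u
}

# Usage:
#   netstat -a | pmcd_port_states 4321
```

When the function prints nothing, no TCP socket is using the port and PMCD may be restarted. A LISTEN state suggests that a PMCD, or some other program, is currently bound to the port.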