This chapter introduces the features and capabilities of the Oracle Parallel Server (OPS). It explains
OPS configuration
OPS instances and domains
OPS architectural features
OPS components and how they work together
OPS performance and optional Silicon Graphics software
OPS is a collection of Oracle instances running on separate CHALLENGE servers, providing simultaneous access to the same physical database. The physical database is the same as that for an ordinary non-OPS (nonparallel) Oracle RDBMS, except that it has separate redo logs and rollback segments for each instance. The redo log file is a compressed record of changes that a transaction has made.
For the Silicon Graphics platform, OPS is available in a dual-host configuration; each node can access the same shared disk storage. The hosts can be two CHALLENGE DM systems, two CHALLENGE L systems, or two CHALLENGE XL systems. A third host, a 24-bit Indy workstation running the IRISconsole software, functions as the OPS node controller and as a single point of administration for OPS. Figure 1-1 diagrams the Silicon Graphics OPS hardware configuration.
Besides the IRISconsole software, the node controller also runs the OPS node control software, opsnc. The OPS node controller opsnc implements a fail-stop mechanism: in the event of a private network partition (a private network failure that results in the OPS instances being isolated from each other), only one OPS instance is permitted to continue providing service. The other instance is forced to crash and must be restarted by the system administrator. See “Starting OPS” in Chapter 2 for instructions.
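The fail-stop rule can be illustrated with a minimal Python sketch. The function name and the tie-break policy (the lowest-numbered instance survives) are assumptions made for illustration only; this chapter does not describe how opsnc chooses the surviving instance.

```python
def resolve_partition(instances, partitioned):
    """Return (running, stopped) instance lists.

    Hypothetical policy: on a private network partition the
    lowest-numbered instance survives. The rule that opsnc
    actually applies is not described in this chapter.
    """
    if not partitioned:
        # Private network is healthy: every instance keeps running.
        return list(instances), []
    survivor = min(instances)
    stopped = [i for i in instances if i != survivor]
    return [survivor], stopped
```

Whatever the selection rule, the point of the mechanism is that exactly one instance remains in service after a partition, so the isolated instances cannot both continue to modify the shared database.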
Each OPS instance consists of the following software components:
Oracle RDBMS processes, including PMON, SMON, DBWR, LGWR
OPS Distributed Lock Manager processes: dlmmon/dlmd
OPS allows multiple hosts to access the same shared physical database. The Distributed Lock Manager (DLM) provides the functionality that enables this sharing. The Distributed Lock Manager program consists of two processes, dlmd and dlmmon.
OPS Connection Manager process: opscm
The OPS Connection Manager opscm implements a heartbeat protocol across both hosts to detect host and private network failures, monitors local DLM processes to detect lock manager failure, and provides a sync service to coordinate recovery for host failure and reintegration.
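As a rough illustration of heartbeat-based failure detection, the following Python sketch tracks when a peer was last heard from. The class name, the timeout value, and the explicit timestamps are assumptions for illustration; they do not describe the actual opscm protocol or its parameters.

```python
class HeartbeatMonitor:
    """Tracks the most recent heartbeat received from a peer host."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical failure threshold, in seconds
        self.last_seen = None       # time of the most recent heartbeat

    def record(self, now):
        """Note that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def peer_failed(self, now):
        """Report a failure once the peer has been silent longer than the timeout."""
        if self.last_seen is None:
            return False  # nothing received yet, so no basis for a verdict
        return (now - self.last_seen) > self.timeout_s
```

For example, with `HeartbeatMonitor(3.0)` and a heartbeat recorded at time 10.0, `peer_failed(12.0)` is false and `peer_failed(13.5)` is true.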
Figure 1-2 diagrams OPS software configuration.
An Oracle instance consists of a system global area (SGA) and a set of server processes that access the physical database located on disk. The SGA is a section of shared memory accessed by each of the OPS server processes in an instance. In an OPS system, multiple Oracle instances access the same physical database; each instance has its own SGA, server processes, redo log files, rollback segments, Distributed Lock Manager (DLM) processes, and Connection Manager (CM) process. Together, these instances constitute one domain.
DLM domains are numbered starting with 0; DLM instances are numbered 0 and 1. Figure 1-3 diagrams an OPS configuration with a single domain.
In the configuration shown in Figure 1-3, the DLM domains are
0,0 (DLM domain 0, domain instance 0)
0,1 (DLM domain 0, domain instance 1)
Figure 1-4 diagrams an OPS configuration with two domains. In this configuration, the DLM domains are
0,0 (DLM domain 0, domain instance 0)
0,1 (DLM domain 0, domain instance 1)
1,0 (DLM domain 1, domain instance 0)
1,1 (DLM domain 1, domain instance 1)
In any of these cases, all Oracle instances that mount the same database must use the same domain.
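The numbering scheme can be written out as a short Python sketch. The function name is hypothetical; the default of two instances per domain reflects the dual-host configuration described in this chapter.

```python
def dlm_identifiers(num_domains, instances_per_domain=2):
    """List (domain, instance) pairs, both numbered from 0."""
    return [
        (domain, instance)
        for domain in range(num_domains)
        for instance in range(instances_per_domain)
    ]
```

Calling `dlm_identifiers(1)` yields the two pairs of the single-domain configuration, and `dlm_identifiers(2)` yields the four pairs of the two-domain configuration.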
Major architectural features of OPS are
high availability
High availability is provided at multiple levels:
If a node fails, the database is still accessible from the surviving node.
If the CHALLENGE RAID storage system is used, RAID-5 provides tolerance to any single point of failure within the RAID.
Each redo log file can be mirrored, so that an instance can survive failure of a log file.
consolidation of database administration, using the Indy workstation as node controller
high performance
OPS utilizes the full power of CHALLENGE system memory and its high-speed system bus performance. Operating system enhancements include changes to virtual memory for more efficient multiprocessing, raw I/Os, multiprocess networking, and process scheduling. Besides these enhancements, IRIX already supports real-time scheduling, CPU affinity, and, for the 64-bit OS, CPU partitioning (the ability to steer interrupts to specific CPUs), which are critical for DBMS performance.
distributed locks
Row-level locking, the finest level of locking granularity, minimizes the amount of data contention between transactions and maximizes concurrency. Oracle Parallel Server extends this feature by allowing multiple transactions on different nodes to lock and update different rows of any table in the database.
Row-level locking is independent of the parallel cache manager's use of distributed locks, which keep the SGAs consistent with each other. Row-level locking is achieved by Oracle's internal concurrency control architecture. For distributed locking, the parallel cache manager uses a special background process, the LCK0 process, which requests locks from the Distributed Lock Manager. The DLM is not used for row-level locking; thus its use is minimized and performance is enhanced.
For more information on Oracle RDBMS and OPS operation, consult the Oracle documentation.
OPS allows Oracle7 instances running on the two nodes to access a common Oracle database. This design allows users on multiple systems seamless access to common data, so that more computing resources are available to all applications that access the same database.
OPS is designed to allow any node in the cluster to be brought down, either voluntarily or involuntarily, without interrupting access to the database from the other nodes. If an Oracle instance or a system node fails, users from the failed node can migrate to the other running node and reconnect to the database.
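From a client's point of view, reconnecting after a node failure amounts to trying the other node. The Python sketch below shows the idea; the function name, the host names, and the `connect` callable are hypothetical placeholders and do not correspond to any Oracle or OPS interface.

```python
def connect_with_failover(hosts, connect):
    """Try each host in order; return (host, connection) for the first success.

    `connect` is a caller-supplied function that returns a connection
    object or raises ConnectionError if the host is unavailable.
    """
    errors = {}
    for host in hosts:
        try:
            return host, connect(host)
        except ConnectionError as exc:
            errors[host] = exc  # remember the failure and try the next node
    raise ConnectionError("no OPS node reachable: " + ", ".join(errors))
```

A caller would pass the two OPS hosts in order of preference, for example `connect_with_failover(["ops-host-1", "ops-host-2"], connect)`, where the host names are illustrative.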
Table 1-1 outlines the required and optional software for OPS.
Table 1-1. OPS Hardware and Software
| Hardware | Required Software | Optional Software |
|---|---|---|
| Two CHALLENGE DM, L, or XL servers (OPS hosts) | IRIX 5.3 with XFS | Performance Co-Pilot (PCP) |
| CHALLENGE RAID storage system with two storage-control processors (SPs) and at least five disk modules | | |
| IRISconsole: Indy workstation, with Silicon Graphics ST-1600 serial port multiplexer, standard Silicon Graphics CD-ROM, and cables | IRIX 5.3 with XFS | |
| Optional Vault storage enclosures (Vault M, L, or XL, respectively) | | |
Figure 1-5 diagrams an example OPS installation with storage systems.
The rest of this section describes specific components of OPS:
IRISconsole
CHALLENGE RAID storage system
XFS filesystem
Database Accelerator (DBA)
IRISconsole runs on a 24-bit Indy workstation with a minimum of 32 MB of memory and a 20-inch display. For OPS, it is made up of the following:
IRISconsole software, including a graphical user interface, running under IRIX 5.3 with XFS
an IRISconsole ST-1600 multiplexer, including cabling connecting the Indy workstation to the ST-1600
a pair of serial cables included in the IRISconsole package, plus one additional pair, for connecting the two OPS hosts to the ST-1600
The IRISconsole software monitors each OPS host (node) through the host's Remote System Control and console ports via serial connection to the ST-1600 serial port server (multiplexer). If a node fails, IRISconsole can automatically start procedures defined by the OPS system administrator in addition to the failover procedures provided for in the OPS software.
Note: For full OPS and IRISconsole functionality, the Remote System Control and System Console ports on the CHALLENGE DM, L, or XL must be cabled to ports on the ST-1600.
The IRISconsole software enables the administrator to
display, view, or take control of the console of an OPS host (or other attached system)
view real-time graphs of hardware operating statistics of an OPS host, such as voltage, operating temperature, and blower speeds; save the graphs as files and display them
set a threshold for operating statistics so that an alarm is activated when the threshold is reached and various activities can be triggered
view console activity logs and other system reports
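The threshold feature can be pictured with a small Python sketch. The function and the statistic names are hypothetical and are not part of the IRISconsole software; the sketch only shows the comparison that decides when an alarm would be raised.

```python
def statistics_in_alarm(readings, thresholds):
    """Return the names of statistics whose reading has reached its threshold.

    readings:   dict of statistic name -> current value
    thresholds: dict of statistic name -> alarm threshold
    Statistics without a configured threshold are ignored.
    """
    return [
        name
        for name, value in readings.items()
        if name in thresholds and value >= thresholds[name]
    ]
```

For instance, with readings of `{"temperature_c": 41.0, "blower_rpm": 2000}` and a single threshold `{"temperature_c": 40.0}`, only the temperature statistic is reported.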
For complete information on the IRISconsole, see the documentation:
IRISconsole Administrator's Guide (007-2872-00x)
IRISconsole ST-1600 Multiplexer Installation Guide (007-2839-00x)
The CHALLENGE RAID (Redundant Array of Inexpensive Disks) storage system provides a compact, high-capacity, high-availability source of disk storage for OPS in the form of multiple disk drive modules that you can replace when the storage system is powered on (hot-replaceable). Each CHALLENGE RAID storage system supports from five to twenty disk modules in groups of five.
The CHALLENGE RAID storage system supports RAID level 5: a group of five disk modules is bound together into a logical unit (LUN). A RAID-5 group maintains parity data that lets the disk group survive a disk module failure without losing data. In addition, in a CHALLENGE RAID storage system configured for OPS, the RAID-5 group can survive a single SCSI–2 internal bus failure, because each disk module in the group is bound on an independent SCSI–2 internal bus.
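The way parity lets a group survive the loss of one module can be shown with a short Python sketch of XOR parity, the scheme conventionally used by RAID level 5. The block contents and function name are made up for illustration, and the sketch says nothing about how the CHALLENGE RAID firmware lays out stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks plus one parity block, as in a five-module group.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(data)

# Suppose the module holding data[2] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[1], data[3], parity])
print(recovered == data[2])  # prints True
```

Because XOR is its own inverse, any single missing block, whether data or parity, can be rebuilt from the remaining four.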
Through the storage-control processors (SPs), the SCSI–2 bus is split into five internal fast/narrow SCSI buses—A, B, C, D, and E—that connect the slots for the disk modules. For example, internal bus A connects the modules in slots A0, A1, A2, and A3, in that order. Figure 1-6 diagrams this configuration.
For OPS, the CHALLENGE RAID storage system must have two SPs. Each SP controls disk modules in groups of five. The second processor provides a second path to the disk modules as part of the failover strategy of OPS; see Figure 1-6. Each LUN is controlled by one of the SPs. The non-controlling SP takes over a LUN if its controlling SP fails.
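The takeover rule for LUNs can be sketched in Python as follows. The SP labels "SP1" and "SP2" and the function name are hypothetical; the sketch only models which SP serves each LUN after a failure, not how the storage system detects the failure or performs the takeover.

```python
PEER = {"SP1": "SP2", "SP2": "SP1"}  # hypothetical labels for the two SPs


def serving_sp(assigned, failed):
    """Map each LUN to the SP that serves it, given a set of failed SPs.

    assigned: dict of LUN number -> controlling SP
    failed:   set of SP labels that have failed
    """
    result = {}
    for lun, sp in assigned.items():
        if sp not in failed:
            result[lun] = sp          # controlling SP is healthy
        elif PEER[sp] not in failed:
            result[lun] = PEER[sp]    # non-controlling SP takes over
        else:
            result[lun] = None        # both SPs failed: LUN unavailable
    return result
```

With LUN 0 assigned to SP1 and LUN 1 assigned to SP2, a failure of SP1 leaves both LUNs served by SP2.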
In addition, both SPs are required for storage system caching to work: each processor temporarily stores modified data in its memory and writes the data to disk at the most expedient time.
For complete information on the CHALLENGE RAID storage system, see the CHALLENGE RAID-5 Owner's Guide (007-2532-00x).
XFS is a journaled filesystem that allows for extremely fast recovery time of filesystem structures during reboot. Recovery of XFS filesystems is independent of filesystem size. For this reason, XFS is particularly useful for OPS operation.
On a traditional UNIX® filesystem, a full filesystem check takes an amount of time proportional to the size of the filesystem. On XFS, recovery takes on the order of seconds, because it depends on the system activity level rather than on filesystem size. Using XFS therefore shortens the time required to bring a failed node back online.
For complete information on the XFS filesystem, see Getting Started With xFS Filesystems (007-2549-00x). This document is viewable in InSight.
The Database Accelerator (DBA) consists of kernel enhancements designed to boost performance specifically for Oracle. These kernel enhancements can help double the performance of write-intensive workloads, such as the TPC-A and TPC-B benchmarks or the building of very large indexes in real-life applications. The kernel enhancements are as follows:
Postwait driver, a kernel software driver, provides a very fast multithreaded synchronization mechanism for Oracle processes. It replaces the standard SVR4 semaphore mechanism, which is too slow for high TPS rates.
Kernel list I/O, an OS enhancement, allows the Oracle database writer to flush modified buffers to disks efficiently: a single Oracle database writer can flush at least 2000 buffers per second to disk drives. With only one system call, the database writer can initiate multiple writes to all disk drives in the system. Without this functionality, the Oracle database writer would have to use shadow processes, thus incurring the overhead of process synchronization; another limitation would be the single-threaded nature of making one system call per disk write.
This section briefly explains how the following optional Silicon Graphics software products can enhance OPS performance:
PCP (Performance Co-Pilot)
IRIXpro
IRIX NetWorker 4.1.1
The Performance Co-Pilot (PCP) provides a suite of tools for performance monitoring and performance management services across the spectrum of performance domains: hardware platforms, operating systems, the DBMS, and applications.
PCP runs in a client/server configuration: PCP agents (clients) monitor domains and send information to the PCP server, which graphically displays the information on the workstation. PCP can be used to monitor Oracle and system activity on both nodes in the OPS cluster.
IRIXpro is a suite of tools for the professional systems administrator. Applications included in IRIXpro are
Propel: software environment distribution and file management
Provision: distributed system monitoring
Problema: request desk
Proclaim: network configuration server
In particular, Propel can be used to transfer software from one OPS host to the other.
IRIX NetWorker reliably protects files against loss across an entire network of systems. NetWorker saves valuable administrator time by speeding and simplifying daily backup operations. As NetWorker backs up data, it creates a database of the saved data, making it easy to locate a file for recovery. Furthermore, as the network and number of files expand, NetWorker has the capacity and performance to handle the load.
IRIX NetWorker 4.1.1 includes extended support for autochangers (jukeboxes and tape libraries) and archiving and retrieval capability. Its ability to back up raw files makes it particularly suitable for use with OPS, since all Oracle files are XLV raw devices.
For complete information on NetWorker 4.1.1, see the documentation:
IRIX NetWorker Administrator's Guide (007-1458-0x0)
IRIX NetWorker User's Guide (007-2304-00x)
These documents are viewable in InSight.