This chapter includes important information that you should be aware of before you start NQE.
For information about compatibility issues and differences that your users may experience after upgrading to NQE 3.3, see the NQE Release Overview, publication RO-5237 3.3.
The checkpointing function is supported on IRIX 64-bit systems; NQE 3.3 requires IRIX release 6.4.1 or later for the checkpointing function to work.
Note: The new environment variable NQE_SHEPHERD_PID was added so that the qchkpnt(1) command will work on 64-bit IRIX systems. NQE_SHEPHERD_PID is added to the initiated job's environment only on 64-bit IRIX systems. The value of NQE_SHEPHERD_PID is the shepherd PID for the job.
Checkpoint and restart operations work by default on IRIX 6.4.1 or later systems for jobs that have the following two characteristics:
The job was submitted using the default two-shell invocation (see section 5.3 in the NQE User's Guide, publication SG-2148)
Direct job output was requested when the job was submitted (if the command-line interface was used, this means the -ro option was used with either the qsub(1) or cqsub(1) command)
Jobs that do not have these characteristics cannot be restarted when using NQE as installed by default. In order for these jobs to be restartable, you must change the permissions on the $NQEBASE/spool/private directory from 700 to 711. The permission change must be made manually after the system has been installed and configured.
Caution: Changing these permissions may open a security hole and allow any user access to the internal NQE directories. If a user inadvertently or maliciously alters the files located there, running jobs may fail and NQE operations may be corrupted. Sites that are willing to accept the security risk in order to provide checkpoint and restart operations to all jobs should make the permission modification.
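The permission change described above can be sketched as a small shell function. This is only an illustration, not part of NQE: the function name open_nqe_private_dir is hypothetical, and the sketch assumes that its argument is the NQE base directory (the value of $NQEBASE) and that it is run by a user allowed to change the directory's mode. Weigh the preceding caution before using anything like it.

```shell
# Hypothetical helper (not shipped with NQE): change the permissions on
# <nqebase>/spool/private from 700 to 711.
open_nqe_private_dir() {
    dir="$1/spool/private"
    if [ ! -d "$dir" ]; then
        echo "directory not found: $dir" >&2
        return 1
    fi
    # Apply the new mode, then list the directory so the result can be checked.
    chmod 711 "$dir" && ls -ld "$dir"
}

# Example call, assuming NQEBASE is set in the environment:
#   open_nqe_private_dir "$NQEBASE"
```

The listing printed at the end should show the mode drwx--x--x if the change succeeded.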
This problem will be corrected in a future release of NQE and the IRIX checkpoint and restart software, allowing all jobs to be checkpointed while still maintaining proper security.
The -Rf option is now supported on the cqsub(1) command. The -Rf option forces the request to be restarted from a checkpoint image. This option is supported only on UNICOS systems.
The ilbrc file in the nqebase/etc directory may need to be modified, depending on your system configuration. This configuration file controls how ilb behaves when logging in to a remote machine. Currently, ilbrc contains references to /usr/bin/telnet and /usr/bin/rlogin. The administrator should ensure that these paths are correct on each of the machines on which NQE is installed.
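One way to check the paths on each machine is a short loop such as the following. It is a minimal sketch: the function name check_ilb_paths is hypothetical, and the sketch only tests that each path given to it exists and is executable; it does not read or modify ilbrc.

```shell
# Hypothetical helper: report whether each given path is an executable file.
check_ilb_paths() {
    rc=0
    for p in "$@"; do
        if [ -x "$p" ]; then
            echo "ok: $p"
        else
            echo "missing or not executable: $p"
            rc=1
        fi
    done
    return $rc
}

# Example call with the two paths that ilbrc currently references:
#   check_ilb_paths /usr/bin/telnet /usr/bin/rlogin
```

A nonzero return status indicates that at least one path needs to be corrected in ilbrc on that machine.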
Note: This was also a dependency for accessing the NQE 3.1 and 3.2 online documentation using the Cray DynaWeb server application that is provided with the NQE release package.
A Cray DynaWeb server is required to access the following NQE 3.3 online documentation:
NQE Release Overview, publication RO-5237 3.3
NQE Installation, publication SG-5236 3.3
Introducing NQE, publication IN-2153 3.3
NQE User's Guide, publication SG-2148 3.3
NQE Administration, publication SG-2150 3.3
For additional information, see the Cray DynaWeb documentation that is included with your NQE 3.3 release package.
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
This patch fixes the following problem:
s700_800 9.x mkdir -p will not cross a read-only NFS mount.
The patch can be accessed through the Hewlett-Packard Support Line Services World Wide Web page at the following URL:
http://us.external.hp.com
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
To work around the problem, do not attempt to restart msqld until the MSQL_TCP_PORT (603) has been released by TCP/IP. Use the following command to verify that the port is available.
# netstat -a | grep 603
You should not see any entries of the following forms, which indicate that the MSQL_TCP_PORT 603 is still in use:
latte.603 latte.974 8192 0 8192 0 TIME_WAIT
localhost.603 localhost.974 8192 0 8192 0 LAST_ACK
Note: If the mSQL port number is defined in /etc/services, the previous command may not show the active port because the name of the port appears as the service name. In this case, you should use the netstat -a | grep msql command.
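The port check above can be wrapped in a small filter. This is a minimal sketch under stated assumptions: the function name port_in_use is hypothetical, and it assumes that netstat prints addresses in the host.port form shown above (a colon separator is also accepted). Unlike a plain grep 603, it matches the port only at the end of an address field, so other numbers that merely contain 603 are ignored. It matches the port in either the local or the foreign address column.

```shell
# Hypothetical helper: read netstat -a style output on standard input and
# succeed if any address field ends in the given port number or service name.
port_in_use() {
    grep -E "[.:]$1[[:space:]]" >/dev/null
}

# Example calls:
#   netstat -a | port_in_use 603  && echo "port 603 is still in use"
#   netstat -a | port_in_use msql && echo "msql port is still in use"
```

The second call covers the case in which the port is listed in /etc/services and netstat prints the service name instead of the number.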
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
In addition, the following NLB attributes are not meaningful for UNICOS/mk systems:
NLB_A_SWAPPING
NLB_SWAPPING
NLB_SWAPSIZE
NLB_SWAPFREE
NLB_FREEMEM
NLB_A_FREEMEM
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
These commands and absolute path names are used in the ftp interface and in the USCP interface to NQS. Sites that want to use the ftp or USCP interface to NQS must create symbolic links from /nqebase/bin to /usr/bin. This can be done after NQE is installed.
For the ftp interface, enter the following commands to delete the old commands in /usr/bin and create symbolic links there that point to the versions in /nqebase/bin that are available in NQE 3.3:
rm /usr/bin/qsub
rm /usr/bin/qstat
rm /usr/bin/qdel
ln -s /nqebase/bin/qsub /usr/bin/qsub
ln -s /nqebase/bin/qdel /usr/bin/qdel
ln -s /nqebase/bin/qstat /usr/bin/qstat
For the USCP interface, enter the following commands to delete the old commands in /usr/bin and create symbolic links there that point to the versions in /nqebase/bin that are available in NQE 3.3:
rm /usr/bin/qsub
rm /usr/bin/qstat
rm /usr/bin/qdel
rm /usr/bin/qmsg
ln -s /nqebase/bin/qsub /usr/bin/qsub
ln -s /nqebase/bin/qdel /usr/bin/qdel
ln -s /nqebase/bin/qstat /usr/bin/qstat
ln -s /nqebase/bin/qmsg /usr/bin/qmsg
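The two command sequences above differ only in the list of commands, so they can be expressed as one loop. The following is a minimal sketch, not a supplied NQE script: the function name link_nqe_commands is hypothetical, and it uses rm -f rather than rm so that a command that is not already present in the target directory does not cause an error.

```shell
# Hypothetical helper: for each named command, remove the old copy in the
# target directory and replace it with a symbolic link to the source directory.
link_nqe_commands() {
    src="$1"
    dst="$2"
    shift 2
    for cmd in "$@"; do
        rm -f "$dst/$cmd" || return 1
        ln -s "$src/$cmd" "$dst/$cmd" || return 1
    done
}

# Example calls (run with sufficient privilege to write to /usr/bin):
#   ftp interface:   link_nqe_commands /nqebase/bin /usr/bin qsub qstat qdel
#   USCP interface:  link_nqe_commands /nqebase/bin /usr/bin qsub qstat qdel qmsg
```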
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
This path change affects both administrators and end users of NQS, FTA, and NQX. Users must be notified of the new command location so their user environments can be changed to access the commands from /nqebase/bin. For example, this command path change will affect user cron jobs, job submission scripts, and any user programs that reference NQS, FTA, or NQX commands.
The system files that set up user environments can be modified to add /nqebase/bin to the default path. The modules package can be used to set up the appropriate path to the NQE commands. See Chapter 16, “Using modules with NQE”, for more information about the modules package.
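For sites that modify the system startup files rather than use the modules package, the path change might look like the following fragment. It is a minimal sketch that assumes a Bourne-compatible login shell; the function name add_to_path is hypothetical, and which startup file to place it in (for example, /etc/profile) depends on the system.

```shell
# Hypothetical helper: append a directory to PATH unless it is already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present; leave PATH unchanged
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Example call for the NQE command directory:
#   add_to_path /nqebase/bin
```

Checking for an existing entry keeps PATH from growing each time the startup file is read by a nested shell.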
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
The NQS qmgr(8) command provides log file segmentation at NQS startup and periodically as desired. For further information, see the qmgr(8) man page.
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
Note: This was also a dependency for the NQE 3.1 and 3.2 releases.
NQE support of DCE/DFS does not include support for an installation running DFS-only file space. The NQE spool and binary trees, among other components, must reside in UNIX file space.
Note: On UNICOS systems, this release of NQE supports only DCE/DFS version 1.1.
UNICOS and IRIX systems must have the DCE integrated login feature enabled in order for DCE credentials to be passed through NQE. For further information, see Section 14, “Configuring DCE/DFS”, in NQE Administration, publication SG-2150, and Cray DCE Client Services/Cray DCE DFS Server Release Overview, publication RO-5225.
For IBM AIX 4.2 systems to run NQE 3.3 with DCE/DFS, you must install the IBM APAR ix59568 patch; without it, system crashes may occur while DCE/DFS is enabled. AIX customers may obtain the patch by calling 1-800-CALLAIX and requesting the fix for APAR ix59568.
Note: This patch is not needed when running NQE 3.3 without DCE/DFS on IBM AIX 4.2 systems.
On HP-UX systems, NQE is configured to provide DCE authentication by adding the NQE_AUTHENTICATION variable, set to dce, in the nqeinfo file. If this is done on an HP-UX system that does not have DCE installed, NQE jobs initiated on this HP-UX system abort. The following mail message is sent to the job owner:
Request aborted via a signal. Request deleted. Aborting signal was: 6
The following message is written into the NQS log file:
/lib/dld.sl: Can't find path for shared library: libc_r.sl
If this occurs, stop NQE on the HP-UX system and remove the NQE_AUTHENTICATION variable from the nqeinfo file on that system. After you restart NQE, NQE jobs on this non-DCE HP-UX system will run without this error.
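Before and after editing the file, it can be useful to confirm whether the variable is present. The following is a minimal sketch: the function name show_nqe_authentication is hypothetical, the location of the nqeinfo file is passed as an argument because it depends on the installation, and the sketch assumes only that nqeinfo is a plain text file. It reports matching lines and does not change the file.

```shell
# Hypothetical helper: print every line of the given file that mentions
# NQE_AUTHENTICATION, with line numbers. Returns nonzero if there is none.
show_nqe_authentication() {
    grep -n 'NQE_AUTHENTICATION' "$1"
}

# Example call; replace the argument with the path to your nqeinfo file:
#   show_nqe_authentication /path/to/nqeinfo || echo "NQE_AUTHENTICATION is not set"
```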
As of the NQE 3.0 release, the PostScript file containing the Flexible License Manager End User Manual is no longer provided with NQE. For more information on FLEXlm, access the GLOBEtrotter Software, Inc., World Wide Web page at the following URL:
http://www.globetrotter.com
Also, you can order the Flexible License Manager End User Manual from GLOBEtrotter Software, Inc., or from the Cray Research Distribution Center.