Chapter 3. Working With Applications

Many characters used in non-English languages, such as ê, ã, and ö, are not part of the ASCII codeset, and some applications do not properly recognize non-ASCII characters entered by a user. Applications fall roughly into three types, according to their ability to handle such symbols.

This chapter discusses the behavior of each type of application as well as the behavior of a few specific applications.

Files Containing Non-ASCII Text

Here are some miscellaneous notes about non-ASCII text and how it interacts with the system and applications:

  • Many applications and system utilities allow you to create files that contain non-ASCII characters. You can use cp or rcp to copy such files.

  • Unless you use MediaMail or a similar application, you cannot directly mail files that contain non-ASCII characters. First you must use the uuencode(1) command to convert the files to a form that can be safely sent through electronic mail. See “Using Mail With Non-ASCII Text” for a discussion of how to use uuencode. Also note that user names and machine names—and everything else in the mail header—must be limited to ASCII characters in order to be recognized by mailers along the mail path.

  • You can use cat or more in xwsh or xterm to view files that contain non-ASCII characters; just be sure to set an appropriate locale and use an appropriate font. (See the man pages for xwsh(1G) and xterm(1) to find out how to change fonts in those programs.)

  • You can create files whose names contain non-ASCII characters using the Bourne shell (sh), the C shell (csh), and most X-based applications. Most graphical applications let you use a file dialog box to select such files, even though some GL-based applications may not allow you to enter non-ASCII symbols from the keyboard.

Problems With X-based and Mixed-Model Applications

X Window System-based applications query the X server to find out which symbols are engraved on each key of the keyboard you are using. The X protocol and library were designed to use non-ASCII symbols, so most X applications recognize keys with non-ASCII symbols. You may, however, encounter a few problems with some fonts and some applications under the X Window System. This section describes two such problems.

Characters Displayed As Spaces

To display text that contains non-ASCII symbols correctly in an X-based application, you must use fonts that support the codeset you need. If you try to display such text using a font that doesn't include those symbols, each non-ASCII symbol may be shown as a space or dropped entirely. For example, the string

Ich bin müde.

might (depending on the font) be displayed like this:

Ich bin m de.

or like this:

Ich bin mde.

Most X fonts support the entire Latin-1 character set, so most applications correctly display text with Latin-1 symbols. To display characters from the other ISO 8859 encodings (Latin-2, Latin-3, and so on), you need other fonts that use those encodings.

Inaccessible Alt Gr Characters

Some keys on some keyboards have more than two symbols associated with them, as shown in Figure 3-1. You can usually access the third or fourth symbol by holding down the Alt Gr (Alternate Group, also known as the mode switch) key while pressing the key for the desired character.

Figure 3-1. A key with an Alt Gr-accessible character on it


Most X-based applications handle keys with three or four symbols correctly. A few X-based applications, however, do not recognize more than two symbols on a given key, regardless of what those symbols are. Pressing Alt Gr and a key in such an application does not generate any input.

You can use xmodmap(1) to rebind an unused key (such as a function key) to generate an otherwise inaccessible symbol in such an application. For example, the command:

IRIS% xmodmap -e "keysym F12 = ntilde Ntilde"

rebinds the F12 key symbol (keysym) so that F12 generates ñ and Shift-F12 generates Ñ in any X-based application. Most, though not all, applications correctly recognize characters generated by remapped keys.

If an application seems to have trouble finding the third or fourth character on a key, try using xmodmap to remove all other keysyms from the key that generates the Mode_switch keysym.
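As a rough sketch of the suggestion above, the following Bourne shell (sh) fragment builds an xmodmap expression that leaves Mode_switch as the only keysym on one key. The helper name mode_switch_only_expr and the keycode 113 are made up for illustration; the real keycode depends on your keyboard and can be read from the output of xmodmap -pk.

```shell
# Hypothetical helper: print an xmodmap expression that binds the given
# keycode to Mode_switch and nothing else.
mode_switch_only_expr() {
    printf 'keycode %s = Mode_switch\n' "$1"
}

# Typical use (needs a running X server; 113 is an assumed keycode):
#   xmodmap -pk | grep Mode_switch              # find the real keycode
#   xmodmap -e "$(mode_switch_only_expr 113)"   # apply the expression
```

Building the expression separately lets you inspect it before handing it to xmodmap, which changes the keyboard mapping for every client of the X server.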

A Small Sampling of Silicon Graphics Applications

CASEVision tools are based on IRIS IM and provide basic internationalization support. The jot editor allows entry of non-ASCII symbols and displays them correctly if you use a Latin-1 font.


Note: IRIS IM is Silicon Graphics' port of OSF/Motif.


IRIS Showcase™

You can enter non-ASCII symbols into an IRIS Showcase™ (version 2.1.2 or later) document using the regular keyboard keys. All IRIS Showcase™ documents can contain non-ASCII symbols.

You can import text files that contain non-ASCII symbols if you have an appropriate font in which to display those symbols.

Other Applications

This section discusses the behavior of some specific applications.

FrameMaker

FrameMaker 3.1X and later correctly recognizes and uses non-ASCII symbols entered from keyboards that have such symbols.

FrameMaker does not, however, recognize more than two symbols on a key. See “Inaccessible Alt Gr Characters” for some tips on working around this problem.

FrameMaker also has its own mechanism for entering non-ASCII characters with an American keyboard. For instructions, refer to the FrameMaker documentation.

UniPress and GNU Emacs

Both of the widely available versions of emacs have problems with characters in the upper half of the ISO character sets (that is, 8-bit characters).

Both UniPress and GNU emacs display non-ASCII symbols as 3-digit octal numbers. For example, they both display the string \351 instead of the character é when editing a file. Also, both kinds of emacs are erratic in interpreting non-ASCII keystrokes; some keys are simply ignored, while other keys are interpreted as emacs commands.
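To see which octal number corresponds to a given character, you can dump the bytes of a file with od. The following is a minimal sketch in Bourne shell; the function name octal_bytes is an assumption made for this example, and it assumes the text is in a single-byte codeset such as Latin-1.

```shell
# Hypothetical helper: print each byte read from standard input as a
# 3-digit octal number, one number per line.
octal_bytes() {
    od -An -v -to1 | tr -s ' ' '\n' | sed '/^$/d'
}

# In Latin-1, the character é is the single byte with octal value 351.
printf '\351' | octal_bytes
```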

Terminal Emulators

The terminal emulators xwsh and xterm accept input of non-ASCII characters and correctly display text that contains non-ASCII symbols, provided that you use an appropriate font. For details on font selection, see the manual pages for these programs.

System Utilities and Shell-Based Applications

This section describes the non-ASCII-related capabilities and known bugs of the IRIX shells, a few commands (file, ar, and tar), the vi editor, and mail.

Using Shells

The C shell (csh) and Korn shell (ksh) correctly recognize and handle non-ASCII symbols. The Bourne shell (sh) handles non-ASCII characters in filenames correctly most of the time. A few bugs occur when you use wildcards on filenames with non-ASCII symbols in the Bourne shell, but these are minor and easily worked around. In most cases, wildcard characters correctly match non-ASCII as well as ASCII characters.

The file Command

The file command tries to determine the type of a specified file based on its contents. file can't tell the difference between text files containing Latin-1 and text files containing Latin-2 (or any of the other ISO 8859 encodings); as far as file is concerned, they're all 8-bit text. file can tell the difference, however, between ASCII files and 8-bit text files.
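The distinction between ASCII and 8-bit text comes down to whether any byte has its eighth bit set. The sketch below illustrates that idea in Bourne shell; it is not how file itself is implemented, and the function name count_8bit_bytes is invented for this example.

```shell
# Hypothetical helper: print the number of bytes in a file that have the
# eighth bit set. A result of 0 means the file is pure 7-bit ASCII.
count_8bit_bytes() {
    # Delete every 7-bit byte, then count what is left.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Example: count_8bit_bytes message
```

Like file, this check cannot tell Latin-1 from Latin-2 or any other ISO 8859 encoding; it only separates ASCII from 8-bit data.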

Creating Archives with ar and tar

You can create and read ar or tar archives that contain files with names that include non-ASCII symbols, but you may not be able to retrieve such archives on other manufacturers' systems or under earlier releases of IRIX. In other words, you can create such archives for backup purposes, but be careful about distributing them for extraction on other vendors' machines if their ar or tar commands support only ASCII characters.

The vi Editor

vi uses the LANG environment variable to determine character class. As a result, vi may not display non-ASCII characters in the C locale. To edit a file that contains non-ASCII characters, you may have to set the LANG environment variable to a non-ASCII locale (such as en_US or fr).
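One way to do this without changing the locale of your whole session is to set LANG for a single command. The following Bourne shell sketch assumes a helper named with_lang (a name invented here) and a locale called fr, as in the example above. In the C shell, the session-wide equivalent is setenv LANG fr.

```shell
# Hypothetical helper: run a command with LANG set only for that command.
# Usage: with_lang LOCALE COMMAND [ARGS...]
with_lang() {
    lang=$1
    shift
    LANG=$lang "$@"
}

# Example: edit a Latin-1 file in the fr locale.
#   with_lang fr vi notes.txt
```

Keep in mind that if LC_ALL or LC_CTYPE is set in your environment, it takes precedence over LANG.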

Using Mail With Non-ASCII Text

Most UNIX systems, including IRIX, use the Simple Mail Transfer Protocol (SMTP) to send electronic mail from system to system. The protocol specifies that data is sent in 7-bit bytes. The eighth bit of any transmitted byte is stripped off and ignored.

When you compose a message, most mail programs allow you to enter non-ASCII characters. But since all non-ASCII codesets use 8-bit bytes, any non-ASCII characters in your message are converted to essentially random ASCII characters by the time the message is received.
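The effect of stripping the eighth bit can be reproduced locally. The sketch below is an illustration only, written in Bourne shell with a made-up function name, strip_eighth_bit; it maps every byte in the range octal 200 to 377 onto the byte 128 positions lower, which is what clearing the eighth bit does.

```shell
# Hypothetical helper: clear the eighth bit of every byte read from
# standard input, mimicking a 7-bit mail path.
strip_eighth_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# "Ich bin müde." in Latin-1: ü is octal 374, which becomes octal 174,
# the vertical bar. This prints: Ich bin m|de.
printf 'Ich bin m\374de.\n' | strip_eighth_bit
```

The resulting characters are not truly random, but they have no relationship to the intended text, so the recipient cannot reliably recover the original.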

The fact that nearly every version of sendmail strips the eighth bit from mail makes it difficult to come up with a Silicon Graphics-specific solution. Even if Silicon Graphics violated the SMTP protocol by modifying its system to pass 8-bit characters, any non-SGI (or older SGI) systems that the mail passed through would strip the eighth bit. The UNIX community is well aware of this problem, and a working group is designing a solution.

SGI's MediaMail software automatically encodes 8-bit characters into a 7-bit format when you send a message containing such characters, and it automatically decodes those characters at the other end. However, if you use MediaMail to send 8-bit characters to someone who isn't using MediaMail, the recipient must decode the message by hand.


Note: To make MediaMail handle 8-bit characters properly, you have to set the textpart_charset variable using the Options menu. See the MediaMail documentation for more information.

If you are sending a brief message that contains only a few non-ASCII symbols, you can use some of the common pure ASCII substitutions instead. For example, it is common to use “e:” to denote the “ë” character in Internet mail.

Using uuencode to Encode Non-ASCII Symbols in Mail

If you are transmitting a larger document that contains many non-ASCII symbols, you can encode the document with uuencode before you send it. The uuencode program converts any binary data into a 7-bits-per-byte form that can be safely sent through electronic mail.

On the uuencode command line, you must specify the filename to be used when the file is decoded on the remote system. For example, the command:

IRIS% uuencode message.new < message > message.uu

encodes the contents of message into message.uu. When the uudecode program unpacks the message on a remote system, it stores the decoded message in a file named message.new.

Once you have encoded a file using uuencode, you can use the standard mechanism provided by your mail program to include it in a mail message. When you receive an encoded message, you can run it through uudecode to get the original file.
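Before mailing an encoded file, you may want to confirm that it decodes back to the original. The following Bourne shell sketch does a local round trip; it assumes that uuencode and uudecode are installed, and the function name uu_roundtrip_ok is invented for this example.

```shell
# Hypothetical helper: encode FILE, decode it in a scratch directory, and
# compare the result with the original. Returns 0 if the two match.
uu_roundtrip_ok() {
    src=$1
    work=$(mktemp -d) || return 1
    status=1
    # The name given to uuencode is the name that uudecode will create.
    if uuencode message.new < "$src" > "$work/message.uu" &&
       (cd "$work" && uudecode message.uu); then
        cmp -s "$src" "$work/message.new"
        status=$?
    fi
    rm -rf "$work"
    return $status
}

# Example: uu_roundtrip_ok message && echo "safe to mail message.uu"
```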

Refer to the uuencode(1) and uudecode(1) manual pages for a complete description of these commands. Refer to the documentation for your mail program to learn how to incorporate a file into a mail message.