Chapter 5. Introduction to the DynaText Search Language

How Does DynaText Searching Work?

DynaText Search Language

DynaWeb uses the DynaText search engine, whose search language provides you with powerful search capabilities. The DynaText search language defines searchable characters, wildcard characters, and query keywords, along with a set of syntactic rules (syntax) for forming search expressions. By applying the syntactic rules, you can combine search strings and query keywords into query statements that search for patterns and narrow searches to specific portions of a book.

To the DynaText search language, a word is a group of contiguous alphanumeric characters (letters, digits, or both) bounded by spaces or punctuation marks. A phrase is a group of two or more contiguous words, plus any associated punctuation. In this guide, a word or phrase on which you search is a search string.
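The definition of a word can be illustrated with a short Python sketch. It is not DynaText code: the function name split_words and the restriction to ASCII letters and digits are assumptions made for the example.

```python
import re

# Assumption: a "word" is a maximal run of ASCII letters and digits;
# anything else (spaces, punctuation marks) acts as a delimiter.
WORD = re.compile(r"[A-Za-z0-9]+")

def split_words(text):
    """Return the words in text, in order of appearance."""
    return WORD.findall(text)

words = split_words("The co-operation of 3 surgeons, in 1999!")
print(words)
```

Note that the hyphen splits “co-operation” into two words, because a punctuation mark can bound a word just as a space can.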

To improve the efficiency of search queries, each book has a method of limiting which words are searchable. This means that searchable words often vary from book to book. If you enter the same search query in different books, you may get different results.

The Full-Text Index

The full-text index contains all searchable words and punctuation marks in a book. To find the occurrences of a search word in the book, the DynaText search engine looks up the word in the full-text index.

Any word or punctuation mark that is excluded from the full-text index (unindexed) is unsearchable.
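As a rough illustration of why an unindexed word is unsearchable, the following Python sketch builds a tiny word index. The data structure, the function names, and the per-book list of unindexed words are all assumptions made for the example; the sketch ignores punctuation and says nothing about how DynaText actually stores its index.

```python
import re
from collections import defaultdict

def build_index(text, unindexed=frozenset()):
    """Map each searchable word (lowercased) to the positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(re.findall(r"[A-Za-z0-9]+", text)):
        key = word.lower()
        if key not in unindexed:          # excluded words never reach the index
            index[key].append(position)
    return dict(index)

def find(index, word):
    """Look a word up in the index; an unindexed word yields no hits."""
    return index.get(word.lower(), [])

text = "The clock and the Clock tower"
book_a = build_index(text, unindexed={"the", "and"})  # hypothetical book settings
book_b = build_index(text)                            # a book that indexes every word

print(find(book_a, "CLOCK"))  # both occurrences, whatever their case
print(find(book_a, "the"))    # nothing: "the" is unindexed in book A
print(find(book_b, "the"))    # the same query succeeds in book B
```

The last two lookups show how the same query can give different results in different books.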

Searchable Characters

The search language performs literal searches on the following types of characters:

  • Alphanumeric (letters and digits): Searches ignore the case of letters in search strings; for example, searching on the word “clock” matches: “clock”, “Clock”, “CLOCK”, and so forth.

  • Indexed punctuation marks: Punctuation marks other than spaces.

  • Spaces


Note: Either a space or a punctuation mark can delimit the start or end of a word.


Wildcards

In addition to searching for complete words and phrases, you can use placeholders called wildcards. A wildcard is a character that stands in for one or more characters that you do not specify. The DynaText search language provides two wildcards: * and ?. The table below describes these wildcards:

Table 5-1. Wildcard Characters

This...    Matches...
?          any single character
*          any set of 0 or more characters, up to the end of a word

Within a phrase, you can insert an asterisk bounded by spaces ( * ) to represent any word occurring in a given position. The search engine does not parse words represented by an asterisk. Rather, it arbitrarily matches every word in the same position.
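One way to picture how the two wildcards behave within a single word is to translate a search word into a regular expression. The Python sketch below does this under stated assumptions: the helper names are invented for the example, wildcards are taken to stand for ASCII letters and digits only, and the translation is an analogy rather than a description of the DynaText engine.

```python
import re

ALNUM = "[A-Za-z0-9]"  # assumption: wildcards stand for letters and digits

def wildcard_to_regex(pattern):
    """Translate a search word containing ? and * into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(ALNUM)          # exactly one character
        elif ch == "*":
            parts.append(ALNUM + "*")    # zero or more, never past the word end
        else:
            parts.append(re.escape(ch))  # searchable character, taken literally
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """True if the whole word matches the wildcarded search word."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("Fan*", "fanatic"))
print(matches("Sh??n", "shown"))
print(matches("compl?ment*", "complementing"))
```

Because the match must cover the whole word, a search word such as “Fan*” does not match a word like “infant”, in which the letters “fan” appear in the middle.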


Note: Because wildcards have special meanings in the search language, you cannot search for them.


Hints About Using Wildcards

  • In text searches, the single-character wildcard (?) represents any single alphanumeric character.

  • Within a mark-up tag, the ? wildcard can represent punctuation (this affects context searches only).

  • A search query consisting of only a * finds all the indexed words. If your Web client highlights search hits, entering just an * is a way to get a feel for the unindexed words (those ignored by the search).

  • Where possible, avoid placing a wildcard at the front of a word fragment (for instance, ?agment or *gment).

    Starting a word with a wildcard forces the search engine to look at every word in the full-text index. In a large book, this can take a significant amount of time.
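The cost of a leading wildcard can be sketched in Python. The example assumes, purely for illustration, that the indexed words are kept in a sorted list; the function names are invented and nothing here describes DynaText internals. With a literal prefix, only a small slice of the list needs to be inspected. With a leading wildcard there is no prefix, so every word is a candidate.

```python
from bisect import bisect_left

def literal_prefix(pattern):
    """Return the characters that precede the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "?*":
            return pattern[:i]
    return pattern

def candidates(sorted_words, pattern):
    """Return the part of the word list that must be inspected for pattern."""
    prefix = literal_prefix(pattern.lower())
    if not prefix:
        return sorted_words                  # leading wildcard: every word
    lo = bisect_left(sorted_words, prefix)
    hi = bisect_left(sorted_words, prefix + "\uffff")
    return sorted_words[lo:hi]               # only words that share the prefix

words = sorted(["fan", "fanatic", "fragment", "pigment", "segment", "shown"])
print(len(candidates(words, "frag*")))   # a single candidate
print(len(candidates(words, "*gment")))  # the whole list
```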

Search Strings

A search string is a grouping of one or more characters that represent one or more words or phrases and the associated spaces and punctuation marks.

When interpreting the characters in a search string, DynaText:

  • Interprets alphanumeric characters, spaces, and some punctuation marks literally, as searchable characters.

    A searchable character is a character on which DynaText searches and finds only that character. Note, however, that DynaText ignores the case of letters.

  • Interprets ? and * as wildcards.

  • Interprets an unindexed word as the * wildcard.

  • Interprets an unindexed punctuation mark as matching any punctuation mark or space.

Query Keywords

The DynaText search language views several common English words as query keywords. A query keyword is a reserved word with a specialized meaning in DynaText queries. Using query keywords, you can create query statements that specify a relationship between search strings or a relationship between one or more search strings and a structural element of a book.

Each of these types of query statement uses a set of keywords, which are listed in the following table:

Table 5-2. Query Keywords

This search type    Uses these keywords
Proximity           after, before, of, within, word, words
Boolean             and, or, not


Punctuation

Interpretation of Indexed Punctuation

Punctuation marks included in the full-text index (indexed punctuation) are searchable. The following punctuation marks are indexable, that is, eligible for the full-text index:

~ ^ . , - + / & % $ # @ !

As with words, the punctuation that is actually indexed can vary from book to book.


Note: By default, the period/full stop (.) and comma (,) are omitted from the full-text index and are, therefore, unsearchable.


Hints About Searching for Indexed Punctuation

  1. Indexed punctuation marks: ~ ^ . , - + / & % $ # @ !

    Searching for a word followed by an indexed punctuation mark exactly matches the search word. The punctuation mark delimits the end of the word.

    The next nonpunctuation character is considered to be in a separate word.

  2. Multiple indexed punctuation marks: Using more than one indexed punctuation character in a sequence is permissible and generates the expected results.

    Example: ebt//
    

    Finds only: ebt//

  3. Indexed punctuation immediately before a word restricts a search only if the punctuation is preceded by a space.

    Consider the following sample document:

    The experimental operation required the co-operation of several surgeons.
    

    Searching for “*operation” matches “anything operation”; for example, the sample text yields two hits:

    The experimental operation required the co-operation of several surgeons.
    

    In contrast, searching for “-operation” finds nothing.

  4. In text searches, the single-character wildcard (?) cannot represent a punctuation character.

Interpretation of Unindexed Punctuation

If you include an unindexed punctuation mark in a search string, DynaText matches it to any punctuation character or space in the same position.

test

Finds

test test. test!

In a search, an unindexed punctuation mark is interchangeable with a space.
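The difference between indexed and unindexed punctuation in a search string can be sketched as follows. This Python example is an approximation under stated assumptions: the set of indexed marks is a hypothetical per-book setting, the helper names are invented, and a space in the query is treated exactly like an unindexed punctuation mark.

```python
import re

# Hypothetical per-book setting: only these punctuation marks are indexed.
INDEXED_PUNCTUATION = set("-/")

def query_to_regex(query):
    """Translate a search string into a regex, character by character."""
    parts = []
    for ch in query:
        if ch.isalnum() or ch in INDEXED_PUNCTUATION:
            parts.append(re.escape(ch))      # searchable character: literal
        else:
            parts.append(r"[^A-Za-z0-9]")    # any punctuation mark or space
    return re.compile("".join(parts), re.IGNORECASE)

def found(query, text):
    """True if the search string occurs somewhere in text."""
    return query_to_regex(query).search(text) is not None

print(found("read.me", "read me"))   # unindexed period matches a space
print(found("read.me", "read,me"))   # ... or any other punctuation mark
print(found("read-me", "read me"))   # indexed hyphen must match exactly
```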

Some punctuation can never be indexed. For information about unindexable punctuation, see the table below.

Table 5-3.

Reserved Punctuation    Explanation
' "                     Single or double quotes around query keywords cause them to be searched on as literal words.
*                       An asterisk is a wildcard that indicates any set of zero or more characters.
?                       A question mark is a wildcard that indicates any single (one and only one) character.
; :                     The semicolon and colon are unindexable.
( )                     Parentheses control the order of evaluation within a search expression and help to avoid ambiguity in the search expression.
[ ]                     Square brackets specify a class of characters. You can use them to search on a set of specific characters.
{ }                     Curly braces are reserved for use by the search language.



Tip: In some cases you can find a word near some or all occurrences of an unindexed punctuation mark and use that word as a way of finding the neighboring punctuation mark. To experiment with this informal work-around, use the following syntax:


“*punctuation-mark*”

where punctuation-mark is some unindexable punctuation mark.

Keep in mind that the results of this form of search are likely to be incomplete.

Basic Searches on Words and Phrases

Searches on Individual Words

Individual word searches are possible only on indexed words.

To search for all occurrences of an indexed word, simply type the word into the Search panel. Capitalization has no effect, because the search engine treats all letters as lowercase. For example, entering the word “orbit” in the Shuttle Press Kit book finds all cases of “orbit”, such as “Orbit”, “ORBIT”, and “orbit”.


Note: DynaText cannot search for text automatically generated by the SGML text-before or text-after properties. For example, for some books, labels such as “Note”, “Warning”, or “Caution” are sometimes defined in stylesheets as a text-before value. In such books, you would be unable to search for the Warning or Caution label.


Exact Word Searches

The most predictable form of word search is a search string containing the exact text of an indexed word. When you search on a word without specifying any punctuation, the search matches the word regardless of the surrounding punctuation marks or spaces.

Fan

Finds

fan Fan FAN fan. (fan) fan!

The search can match any variant of the word “fan”, with or without adjacent punctuation.

Wildcards in Word Searches

To search on a fragment of a word, you must represent the missing part(s) of the word by inserting one or more wildcards.

Wildcards open up many possibilities for finding matches. They enable you to find all occurrences of words sharing a specific group of letters or digits.

Fan* 

Finds:

fan fanatic Fantasia fantastic

When searching for a wildcarded word fragment, many matches may be possible.

Sh??n

Finds

Sheen, shown, shorn . . .

When searching for the word “Sh??n”, the only possible matches are five-letter words that begin with “sh” and end with “n”.

compl?ment*

Finds:

compliment compliments complementing

When searching for the word “compl?ment*”, the possible matches include words with either the root “complement” or “compliment” and, optionally, a suffix such as “s”, “ed”, or “ing”.

Hints About Searching for Words

  1. You can search on an individual word only if it is indexed.

  2. Searching for a word without specifying any punctuation matches every instance of the word, regardless of the punctuation marks or spaces that start and end the word.

    ebt

    Finds:

    EBT ebt. ebt- ebt/ ebt_

  3. To find an unindexed word, construct a phrase containing that word. If the phrase exists in the book, it will be included among the matches.

Searches on Individual Phrases

A search phrase is a sequence of two to ten words, at least one of which must be indexable, together with the associated punctuation marks. When searching for a phrase, the search engine ignores line breaks.

Exact Phrase Searches

The most predictable form of phrase search is on a phrase containing only indexed words and punctuation.

Fans love shouting!

Finds:

Fans love shouting!
fans love SHOUTING!

When searching for a phrase containing only indexed items, the possible matches differ only in case.

Wildcards in Phrase Searches

Wildcards open up many possibilities for finding matching phrases that share:

  • A common structure

  • One or more specified indexed words (or wildcarded fragments thereof)

  • Optionally, unspecified words, each occupying a specific position within the phrase and indicated by a * wildcard

  • Optionally, unindexed words, each occupying a specific position within the phrase, and interpreted the same as a * wildcard.

    uni* law*
    

Finds:

Universal Law
uninhibited lawyer
uninviting lawn

When searching for the phrase “uni* law*”, the possible matches vary widely. Adding one more character to either of the search words would narrow this search considerably; for example: “univ* law”.

bright and beautiful 

Finds

bright and beautiful 
bright sun, beautiful

Searching for the phrase “bright and beautiful”, which contains an unindexed word (“and”), matches any three-word phrase beginning with bright and ending with beautiful.
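The effect of an unindexed word in a phrase can be sketched in Python. The function name, the sample text, and the set of unindexed words are assumptions made for the example, and the sketch handles whole words only (no wildcarded fragments). Each position in the phrase is compared with the word in the same position in the text; an unindexed word or a lone * accepts any word.

```python
import re

def phrase_hits(text, phrase, unindexed):
    """Return every run of words in text that matches phrase position by position."""
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    terms = [t.lower() for t in phrase.split()]
    hits = []
    for start in range(len(words) - len(terms) + 1):
        window = words[start:start + len(terms)]
        # An unindexed term behaves like the * wildcard: any word is accepted.
        if all(t == "*" or t in unindexed or t == w for t, w in zip(terms, window)):
            hits.append(" ".join(window))
    return hits

text = "A bright and beautiful day followed a bright sun, beautiful and warm."
unindexed = {"a", "and"}  # hypothetical list of unindexed words for this book

print(phrase_hits(text, "bright and beautiful", unindexed))
print(phrase_hits(text, "bright * beautiful", unindexed))
```

Both queries return the same two hits, which is why a phrase with a high proportion of unindexed words gives less predictable results.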

Hints For Specifying Search Phrases

  1. Make your phrase searches as specific as you can. The more specific the search phrase, the narrower the search.

  2. Avoid including short commonly used words, because they are likely to be unindexable.

    The higher the proportion of unindexable words, the less predictable the results of the search. This is because the search engine interprets each unindexed word as if it were the * wildcard and matches any word encountered in the same position within the phrase.

Searches on Query Keywords

To search on a query keyword as an ordinary word, you must prevent the search engine from parsing it as a keyword. To do so, insert single or double quotation marks either around just the keyword or around the entire phrase.

The following search phrases are equivalent:

dish “containing” cream 'or' sugar
“dish containing cream or sugar” 

Typically, most of the keywords are unindexed, being commonly used English words. Searching on any unindexed word is the same as using a * wildcard.
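The role of quotation marks can be sketched with a small Python tokenizer. It is a simplified illustration, not the DynaText parser: the function name is invented, the keyword set is limited to the keywords listed in Table 5-2, and the sketch recognizes only straight (ASCII) quotation marks.

```python
import re

# Keywords taken from Table 5-2; a real book may reserve others.
KEYWORDS = {"after", "before", "of", "within", "word", "words", "and", "or", "not"}

# A token is a double-quoted string, a single-quoted string, or a bare word.
TOKEN = re.compile(r"\"([^\"]*)\"|'([^']*)'|(\S+)")

def classify(query):
    """Return (kind, text) pairs, where kind is 'keyword' or 'literal'."""
    result = []
    for double, single, bare in TOKEN.findall(query):
        if bare:
            kind = "keyword" if bare.lower() in KEYWORDS else "literal"
            result.append((kind, bare))
        else:
            # Quoted text is never parsed for keywords.
            result.append(("literal", double or single))
    return result

print(classify("cream or sugar"))      # "or" is read as a keyword
print(classify("cream 'or' sugar"))    # "or" is read as a literal word
print(classify('"cream or sugar"'))    # the whole phrase is literal
```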

Searches Within Equations

Depending on how your publisher has set up the equations in the book, you may be able to search for components of equations. Search for variable names in equations, just as you would search for words in the text of the book.

Restrictions: What You Cannot Search on

You cannot search on any of the following:

  • Unindexed words or punctuation

    You cannot search expressly on any unindexed words or punctuation marks.

    See the listing earlier in this chapter.

  • Unquoted keywords

    You cannot search on keywords unless you surround them with single or double quotation marks.

    See “Searches on Query Keywords”.

  • Text-before or text-after values

    DynaText cannot search for text automatically generated by the SGML text-before or text-after properties. For example, for some books, a label such as “Chapter”, “Note”, or “Caution” is defined in stylesheets as a text-before value.