This chapter tells you how to employ advanced techniques when designing a system with Tracker. It covers the following topics:
Using dates
Customizing resources
Using the exec functions
Importing data
Preparing translation scripts
Date and time information is an important part of bug and enhancement request tracking systems, as it is for most database applications. The date on which a request is submitted, the due date for resolving the problem, and the actual fix date are just a few of the uses for date and time values in Tracker-based applications.
Tracker provides the date field type to support date and time data storage and manipulation. Date values are stored in Tracker databases, used in queries, displayed, and entered in a variety of formats.
Date field values represent a point in time to the nearest second. They do not represent a time interval such as six seconds or two days. Because they rely on UNIX time data types and function calls, date values represent times only since January 1st, 1970. Date values are ordered from this starting point, the lowest value, and increase toward later dates. Therefore, a date that occurs after a second date will have a higher value.
Internally, Tracker represents dates as integers—a convenient form for storage and retrieval. For data entry purposes, however, other formats are preferred.
Tracker provides a wide variety of date input formats. Dates can be supplied with as little information as the year or month, or specified to the nearest second. You can enter dates as a base time point plus or minus a time interval. You can also specify expressions for the current year, month, day, hour, or second.
Table 6-1 provides an example of the various values possible for a given date, along with the Tracker interpretation.
Table 6-1. Date Interpretation Examples
| Date Input | Tracker Interpretation |
|---|---|
| 7/28/93 | Wed Jul 28 00:00:00 PDT 1993 |
| July 28, 1993 | Wed Jul 28 00:00:00 PDT 1993 |
| July 28 1993 | Wed Jul 28 00:00:00 PDT 1993 |
| 28-July-93 | Wed Jul 28 00:00:00 PDT 1993 |
| July 28 10:00 1993 | Wed Jul 28 10:00:00 PDT 1993 |
| 10:00 July 28 1993 | Wed Jul 28 10:00:00 PDT 1993 |
| July 28 | Wed Jul 28 00:00:00 PDT 1993 |
| July 28 10PM | Wed Jul 28 22:00:00 PDT 1993 |
| July 28 10PM EDT | Wed Jul 28 19:00:00 PDT 1993 |
| July 1993 | Thu Jul 1 00:00:00 PDT 1993 |
| July | Thu Jul 1 00:00:00 PDT 1993 |
| 1993 | Fri Jan 1 00:00:00 PST 1993 |
| 10PM | Wed Jul 28 22:00:00 PDT 1993 |
| 10:30:59 PM | Wed Jul 28 22:30:59 PDT 1993 |
| today | Wed Jul 28 00:00:00 PDT 1993—Start of current day. Equivalent to “this day.” |
| now | Wed Jul 28 11:10:14 PDT 1993—Current time to the nearest second. |
| this year | Fri Jan 1 00:00:00 PST 1993—Start of current year. |
| this month | Thu Jul 1 00:00:00 PDT 1993—Start of current month. |
| today + 30:00:00:00 | Fri Aug 27 00:00:00 PDT 1993—30 days from today. |
| July 28 - 72:00:00 | Sun Jul 25 00:00:00 PDT 1993—72 hours before July 28th of the current year. |
Now let's examine the rules Tracker uses to understand date input.
The year can be expressed as a 4-digit number or as a 1- or 2-digit number representing the year in the current century. For example, “93” means 1993, “44” is illegal (1944 is before 1970), and “1” is illegal now, but any time after the beginning of the next century it will mean 2001.
The month can be either a number from 1 to 12 or a month name or a 3-letter month abbreviation. Case is not considered.
The day of the week can be included in date input, but it is ignored when interpreting the date. The day of the week can be a full day name or a 3-letter abbreviation.
The syntax for specifying the time of day is:
HH [ :MM [ :SS ] ] [ am_or_pm ] [ time zone ]
The minute and second are optional, as are the AM or PM designator and the time zone. If neither AM nor PM is included in the time of day, the hour is interpreted on a 24-hour basis. If a time zone is not specified, the current local time zone is used. The time zone can be either a 3-letter time zone name or a 4-digit offset from GMT (now known as Coordinated Universal Time or UTC).
The recognized time zones are shown in Table 6-2.
Table 6-2. Time Zone Interpretation
| Time Zone | Difference in Hours from GMT |
|---|---|
| GMT (Greenwich Mean Time) | 0 |
| GST (Greenwich Standard Time) | 0 |
| GDT (Greenwich Daylight Time) | 0 |
| EST (Eastern Standard Time) | -5 |
| EDT (Eastern Daylight Time) | -4 |
| CST (Central Standard Time) | -6 |
| CDT (Central Daylight Time) | -5 |
| MST (Mountain Standard Time) | -7 |
| MDT (Mountain Daylight Time) | -6 |
| PST (Pacific Standard Time) | -8 |
| PDT (Pacific Daylight Time) | -7 |
| BST (British Summer Time) | 1 |
| MET (Middle European Time) | 1 |
| EET (Eastern European Time) | 2 |
| JST (Japan Standard Time) | 9 |
You can specify a date in relative terms by adding or subtracting a time interval from a given base date. The adjustment always follows the base date and is specified as follows:
+ | - [ [ [ DD: ] HH: ] MM: ] SS
Date display formats are as various as the input formats. Tracker designers and end users can customize the date display formats to meet their individual needs.
Tracker uses cftime(3C) to format date values for display. The environment variable CFTIME is used to alter the date display format. See the cftime(3C) man page for details on setting the CFTIME environment variable.
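As a sketch, you might set CFTIME in your shell before starting a Tracker application. The format string below is an arbitrary example chosen for illustration, not a Tracker default:

```shell
# Choose an ISO-style display format for cftime(3C)-based programs.
# The format string here is an example, not a Tracker default.
CFTIME='%Y-%m-%d %H:%M:%S'
export CFTIME
echo "date display format: $CFTIME"
```

Any program started in this session that formats dates with cftime(3C) then uses this format.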
Tracker uses these operators to compare date values: <, <=, =, >=, >, and <>. These are especially useful in DML. When a date field is compared with a literal date value, the granularity of the literal date value controls the granularity of the comparison. For example, consider the DML statement:
select * from tracker_request where due_date = July 1993;
The literal date value is specified only to the month. The query will therefore match all due_date values anytime during that month, that is, from July 1 to July 31. Consider the statement:
select * from tracker_request where due_date = July 28 1993;
The date value is a specific day and will match due dates from midnight to 23:59:59 of that day.
If the date is fully specified, as in:
select * from tracker_request where due_date = Jul 29 09:25:27 1993;
then the match must be exact.
These date rules apply to both DML and the Tracker GUI.
Date values must be enclosed within single quotes when entered into PDL. For example:
due_date.setValue('now + 30:00:00:00');
You need not use quotation marks for date values in DML. Date values can be quoted using the explicit typing conventions for DML, for example:
select * from tracker_request where reopen_date = date '10:30PM';
However, using quotes around date values is sometimes necessary to enable the DML interpreter to parse the statement correctly.
As is true for any X11 application, CASEVision/Tracker applications such as rtsquery have a large number of resources that control their appearance and behavior. Default values for these resources are provided in an application defaults file, stored in the directory
/usr/lib/X11/app-defaults
If you want to change any of these values for your own environment, you can set new values for them in your own resource file:
~/.Xdefaults
However, handling resources in Tracker applications is slightly harder than in other X11 applications. The widget names, and even the name of the application itself, are generally customized by the system designer, so this guide cannot refer to them directly.
X11 resource customization is a complex subject. This section is not a complete guide. One of the best sources for further information is the Xlib Programming Manual, by Adrian Nye, Volume One of the O'Reilly X Window System Series.
Your first step in customizing resources is to determine the name of the application, which is the name of the command that you type to bring up the application. This name also appears in the window title bar when you run the application. The main application shipped with Tracker is named rtsquery, and that name is used throughout this section. You can also use a name of your choice.
Everything in X11 has two names. The “name” mentioned above is actually the instance name; there is also a class name. The class name is sometimes shared by several instances (similar to a family name). This is more often true of widget and resource names than of application names; it is never true of Tracker application names.
Each auxiliary view (accessed from the main view's Views menu) in an application has an instance name, which appears in the title bar. The class name of an auxiliary view is the class name of the IRIS IM™ widget used to implement the view: XmForm; the auxiliary view class name is not related to its instance name. (IRIS IM is Silicon Graphics's port of the industry-standard OSF/Motif™ for use on Silicon Graphics systems.)
All of the individual widgets that make up the application also have names, both instance and class. The system designer may have assigned names to some of the application's widgets in the PDL file that defines the application.
You can use the UNIX command more to look at the application file, /usr/local/bin/rtsquery, and see these names. Tracker automatically assigns names to any widgets that are not specifically named by the designer. The generated names are not easy to predict, so they will probably not be useful to you.
Finally, the resources have both instance names and class names. The instance name of a resource tells you what it actually does; the class name generally tells you what kind of information it is. For example, the resource useSmallFonts (instance name) controls whether rtsquery uses the smaller of its two sets of fonts. The class name is boolean (as is true for many other resources), which tells you that it takes values such as True and true (meaning use small fonts), or False and false (meaning do not use small fonts).
Names, especially the application names, are used throughout X11. In this section only two uses are discussed.
You can use the application class name as the name of the application defaults file. Since rtsquery's class name is Rtsquery, its application defaults file is in /usr/lib/X11/app-defaults/Rtsquery.
You can also use all of these names to set resource values. To set a resource, you name:
the widget for which you want to set the resource
the name of the resource itself
the value to which you want the resource set
If the resource is used only by one widget, or if you want to set it for all widgets that use it, you can omit the widget name; all widgets using that resource will pick up the set value.
Since there are many widgets in an application, several widgets often end up with the same name. To clarify which one you want, you can specify the widget hierarchy that contains the particular widget. Widgets are arranged inside one another, like a set of bowls in graduated sizes, and this structure is directly reflected in the PDL text that you find in the application file. A large widget makes up each whole window (called the view) and contains all of the other widgets. Arranged inside and completely filling the view are several display widgets, which create the sashes that allow you to change their height. Inside each display widget are other widgets: control-bars, rows, and so on. If you look at your application and the application file side by side, you should have no trouble matching the features and identifying the names that the system designer has assigned.
To set a resource for some widgets of a given kind, but not for others of the same kind, you need to specify the names of some of the containing widgets, as well as the name of the widget you actually wish to control. This is done with a limited sort of wild-carding. For example, to exactly specify a particular widget, you can specify the application name and all the widget names down to the end, as in:
app.widgOne.widgTwo.widgThree.resource: value
Or, more commonly, you can leave out some or all of the widget names, replacing them with an asterisk:
app*widgThree.resource: value
Since an application defaults file is only used by the “right” applications, you don't need to specify even the application name. Most resources are specified like this:
*widgThree.resource: value
You can use this simple form in your ~/.Xdefaults file too, as long as you are sure no other applications will use the value. But it is generally safer to include the application name.
You can also set resources for all data of a particular type, which includes data in lists that do not have widgets. To do this, you must concatenate the type name (from the PDL fields section) and the resource itself:
Rtsquery*file_fileDisplayStyle: FileDisplayVobStorage
Resources that you set in your own resource file usually take precedence over settings in the application defaults file. However, precedence is also based upon exactly how the resource is specified. To override a setting from the application defaults file, follow these steps:
Copy the line from the application defaults file into your .Xdefaults file.
Add the application name before the asterisk.
Change the value (the part after the colon).
Reload your resources, either by logging out or by typing
xrdb -load ~/.Xdefaults
Restart the application.
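For example, suppose you want rtsquery to use its larger fonts. A sketch of the override follows; the line assumed to be in the application defaults file is illustrative, so check your own file for the actual setting:

```
! Line assumed to appear in /usr/lib/X11/app-defaults/Rtsquery:
!   *useSmallFonts: True
! Override in ~/.Xdefaults, with the application name added:
rtsquery*useSmallFonts: False
```

Because the override names the application while the default does not, the more specific line wins.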
This section lists the resources you'll most likely want to set in a Tracker application.
Where appropriate, the description will mention which kinds of widgets use this resource. Where this is not mentioned, the resources control the behavior of the entire application.
By default, if the file ~/.<appname>-query (or the file specified by the resource *startupQueryFileName) is found upon startup, the query stored there (by the application's Query menu “Save As Default” item) is executed immediately so that the application comes up displaying its results. Setting this resource to False prevents this.
All widgets that display data from the database obey this resource. If invalid data is entered into such a widget during an edit and this resource is set to true, a highlight is drawn around the field and a dialog is popped up to explain the error.
All widgets that display data from the database obey this resource. If invalid data is entered into such a widget during a query and this resource is set to true, a highlight is drawn around the field and a dialog is popped up to explain the error.
All widgets that display data of type file obey this resource. File names that name a file in a CASEVision/ClearCase Versioned Object Base (VOB) can be displayed in two formats. The default format, FileDisplayVobMount, is a standard ClearCase version-extended pathname, which includes the file system path to the file and the version information, for example:
/vobs/CASE/usr/src/foo.c@@/main/37
This format is familiar to most users. If displayed on a system where the VOB is mounted in another location, it will show the new location. If displayed on a system without the VOB mounted, or even without ClearCase installed, the string displayed is the one recorded when the file was last edited.
The other display format, FileDisplayVobStorage, shows where the VOB is actually located, for example:
(vobhost:/storage/CASE)/usr/src/foo.c@@/main/37
This presentation can be useful to administrators investigating user complaints that the first form names a file that doesn't exist.
This resource controls the number of entities retrieved from the database each time the query results pane is updated. Setting this resource higher than the default may improve the scrolling performance of query results. Setting it too high may increase the delay before the first query results are displayed.
This resource applies to the layout of the field value options in the GUI. If the number of options is 25 (the default) or fewer, they appear on a rollover submenu reached from the “Values” menu item on the field menu. If there are more than 25 options, Tracker displays them in a scrolling list. You can set the threshold at which the scrolling list appears to any number you choose.
This setting selects click-to-type focus: you must click the mouse button in a field before you can enter data. Setting it to pointer means that you can type into whichever field is under the mouse pointer.
This setting means that you can move from one input field to another using <Ctrl Tab> (move forward) and <Shift Tab> (move backward). <Tab> also moves you forward from single line fields, but in multiline fields it merely inserts a tab. Setting it to False means you must move the mouse pointer to another field to enter data in it.
A scheme is a set of coordinated colors, fonts, and other properties. A number of schemes are provided in the CASEVision environment (installed along with Tracker). You select which scheme your CASEVision applications use by setting this resource. The default scheme, Lascaux, matches many other IRIX tools. You can preview the other schemes with cvscheme, which is also a part of CASEVision. (For more information, see the cvscheme man page or the CASEVision Environment Guide.)
The Tracker PDL provides many built-in facilities for implementing applications. It supports data, process, and GUI definition. The PDL does not, however, provide everything an application designer might need to implement Tracker applications. Some applications may require resources external to the PDL. The exec functions let you access external resources, thus allowing hybrid applications to be constructed.
The exec functions are PDL functions that provide access to UNIX commands and the DML language. The next section examines the exec functions for UNIX command access.
Two exec functions provide access to UNIX commands. They are execCommand and execFilter. execCommand executes a UNIX command and returns its exit status. execFilter also executes a UNIX command, but it returns the standard output of the command instead of the exit status.
The execCommand and execFilter functions are very useful in implementing Tracker applications, and the next example uses execFilter to demonstrate this. Suppose you define a field, id_string, that you wish to contain a short identifying string for each bug. You want this field to be filled in automatically upon submission of the bug report with the $ENTITY_ID field value and the submitter field value, as follows:
4-john
This cannot be accomplished with the standard methods provided for PDL fields, but you can use execFilter to execute the UNIX command echo and save the result into the id_string field. The PDL source to do this looks like this:
actions {
    id_string.setValue(
        execFilter("/bin/echo $ENTITY_ID-$submitter"));
}
Both execFilter and execCommand make the values of all fields available in environment variables. Thus $ENTITY_ID and $submitter reference environment variables holding the values of the two fields $ENTITY_ID and submitter, respectively. Both functions also store all long-text field values in temporary files. The name of the temporary file is made available in an environment variable named <fieldname>_file, and the value of the environment variable fieldname is set to the special value ! (exclamation point) to signify the field's special treatment.
Both execFilter and execCommand make other information available in environment variables. The variable FIELD_LIST contains the names of all the request fields. The variable MODIFIED_FIELDS contains the names of all fields whose values have been changed by the current transition, either by user actions or by PDL methods like setValue. For those fields in the MODIFIED_FIELDS list, the old values are also available in variables named <fieldname>_old, or in the temporary file named in the <fieldname>_old_file variable for long-text fields.
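As a sketch of what a script run via execCommand sees, the fragment below simulates the environment Tracker would set up. In real use Tracker exports these variables itself; the field names and values here follow the earlier id_string example:

```shell
# Simulate the environment execCommand/execFilter would provide.
# In a real transition, Tracker sets these variables itself.
ENTITY_ID=4
submitter=john
MODIFIED_FIELDS='owner priority'
export ENTITY_ID submitter MODIFIED_FIELDS

# A notification-style script can now read field values directly:
echo "request $ENTITY_ID submitted by $submitter"
for f in $MODIFIED_FIELDS; do
    # old values would be available in ${f}_old
    # (or in the file named by ${f}_old_file for long-text fields)
    echo "modified field: $f"
done
```

A real script such as rts_notify reads these variables in the same way.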
For an example usage of the execCommand function, see the RTS main application PDL file, /usr/Tracker/RTS/Tracker.pdl. This file uses execCommand to execute the shell script rts_notify. See the shell script for examples of using environment variables to access field data.
The execSelect function is used to execute a DML select statement against the Tracker database from PDL. Its argument is the complete select statement text, and its return value is the result of the query.
This example of execSelect usage appears in /usr/Tracker/RTS/Tracker.pdl:
tempShortText.setValue(
    execFilter('echo "select manager from project
                where name = \"$project\";"'));
owner.setValue(owner.isSet ?
    owner.value :
    (project.isSet ?
        owner.setValue(execSelect(tempShortText.value)) :
        owner.setValue(execFilter('/bin/echo $bboard'))));
In this example, execSelect retrieves the name of the project manager from the database. The request is then conditionally assigned to the project manager. First the select statement itself is constructed using execFilter with echo to insert the project field's value into the select statement text.
The result of this execFilter is stored in a temporary scratch field and then later used as the argument to execSelect. The result of the execSelect is then used to set the value of the owner field.
The execSelect function is useful only when the result of the query is a single field value from a single request. If the select statement results in more than one request being selected, then execSelect returns no value. If more than one field is selected, only the first field mentioned in the select statement is returned.
Most users do not have the luxury of creating a new request tracking system. By the time you decide you need a product like Tracker, you already have a collection of defect reports. Once you have formalized your tracking process into a Tracker system, you will need to copy the information from your old system into your new Tracker system.
If your old system is on-line and capable of producing text files in a fairly consistent format, you should be able to build a tool to import your data into your Tracker system. This section presents one such tool, which was used at Silicon Graphics to import the entire Silicon Graphics bug history, over 40,000 bugs, from a previous system into our Tracker system. The example uses real data, including one of the actual bugs filed against Tracker during early development (now fixed).
You may find it easy to do your import in a two-step process:
Move the data into Tracker.
Use dml(1) interactively to clean up any minor problems that arise during import.
This is usually quicker than polishing your import script until it's flawless and loading and reloading the database while you test it. You can make your script do the entire job, if the effort seems justified. The script presented here is in continuous unattended use, importing data on an hourly basis from the old system (which is still in use in some parts of Silicon Graphics). Some of the lessons learned while developing this script may save you some trouble in developing your script.
The basic approach you'll use is to translate your data into the Tracker Data Manipulation Language (DML). DML is similar to many fourth generation database languages, such as SQL. It lets you enter, examine, and modify the data in the database. You can write an application that reads your current database directly, translates the data into DML, and feeds the DML to the interpreter, dml. More likely, however, you'll need to go through an intermediate text format, as outlined here.
First take a look at the DML aspects that are particularly important during an import; if you're able to translate directly from one to the other, that may be all you will need.
When importing data, it's best to explicitly type the data. This is an optional DML feature because most data values are automatically recognized as belonging to a particular type; however, explicit typing can clear up ambiguities, such as the difference between short-text and long-text values. More importantly, while you're importing data from another system, some data might not match expectations—either Tracker's expectations, or the expectations you had when you wrote the import script. Explicit typing helps to ensure that such surprises don't slip past you into the Tracker database. Even if you take the approach recommended above, you'll at least want to know that there is a problem.
Explicit typing is accomplished by enclosing the data item in single quotes (apostrophes), and preceding it with the type name, that is, the name used to declare the field in the PDL file. Both long-text and short-text string types already require the surrounding quotes; you need to add only the type-name.
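For example, following the quoting convention shown earlier, explicitly typed values in a select statement look like this (the field names are taken from the PDL fields section shown later in this chapter; the values are invented for illustration):

```
select * from tracker_request where incident_id = int '101';
select * from tracker_request where opened_date = date 'July 28 1993';
```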
Your first attempt to import a large database may not be flawless, but you should be able to import a majority of your data easily, and then focus on the few reports with difficulties. You don't want to discard the reports you've inserted correctly, yet you don't want to turn one real report into two records in the database.
You should focus your attention on getting the import script right, rather than on remembering exactly which reports worked and which failed. The DML includes a very useful construct for this sort of situation, the insert ... where statement (this is not in standard SQL).
A normal insert statement creates a new record unconditionally, which is fine for the first time, but not so good when you're doing updates. The update ... where statement updates records already in the database, but does nothing if the record described in the where clause doesn't already exist. If only these two statements were available, you would be required to know the exact state of your database at all times and to adjust your import process accordingly. This is inconvenient during an import.
The insert ... where statement solves the problem. It's a synthesis of the insert statement and the update statement: if one or more records can be found that match the where clause, then it behaves as an update statement, changing the existing record(s) and creating no new ones. If, however, no such record exists, insert ... where behaves as an insert statement, creating one new record with all the values included in the statement.
When you're importing a block of data using your import script, you'll probably encounter a few reports that cause problems in some way. Often, you can make a minor enhancement to your import script, and re-import the same block of reports. If you've written your script to use insert ... where, the reports that succeeded before will merely update themselves to the same values; the reports that failed before will be inserted now.
To use insert ... where effectively, you'll need to know the field(s) of the incoming reports that uniquely identify them. (Your Tracker database assigns an $ENTITY_ID uniquely to each report, but you won't know that value for the incoming data, so it can't be used here.) Your old system may already have a sequence number or something similar. If not, you may be able to create a unique key using several fields. For example, if your old system records who originally submits a given report, and also records the time of submission fairly precisely, the submitter plus submission time will probably be unique because no single person can submit more than one report per second.
Some field values may cover more than one line. For example, nearly every system has at least one field in which the problem is described, usually in more than a single line. Similarly, lists of items can span several lines. The example presented here, however, uses nawk(1) to translate the text, and nawk is primarily line-oriented. A problem arises: you'll be writing this nawk script to recognize certain lines that identify fields and their values in the input, but you don't want to be misled by similar text buried within a body of text.
The solution used here is to notice when a multiline item begins; its beginning is easily recognized in the same way any other field value is recognized. A flag is set, indicating that the parsing is currently somewhere within a multiline item. A few rules concerned with noticing the end of such an item appear at the top of the script (so that they're considered before the general rules); these rules match only while the flag is set. These rules are principally responsible for recognizing the end of the multiline data.
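The flag technique can be sketched in a few awk rules. Modern awk stands in for nawk here, and the sample field names, the `::` marker, and the `***` terminator are invented for illustration, not the actual input format:

```shell
# Sketch of the multiline-flag technique: rules that end a multiline
# item come first, so they are considered before the general rules.
cat > /tmp/incident.txt <<'EOF'
summary: 'crash on startup'
description::
the program dumps core
priority: high according to the user
***
EOF

out=$(awk '
inbody && /^\*\*\*$/ { inbody = 0; print "end description"; next }
inbody               { print "body: " $0; next }
/^description::/     { inbody = 1; print "begin description"; next }
/^[a-z_]+:/          { print "field: " $0 }
' /tmp/incident.txt)
printf '%s\n' "$out"
```

Note that the "priority:" line, though it looks like a field, is treated as body text because the flag is set when it is read.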
The string data itself may contain quotes besides the quotes used to delimit it. The DML requires that such a quote be preceded by a backslash, like this: (\'). This in turn makes backslashes special: they, too, require backslashes in front of them.
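This escaping can be sketched as a small shell filter using sed (the function name is invented). The order matters: backslashes must be doubled before apostrophes are escaped, or the backslashes added for apostrophes would themselves be doubled:

```shell
# Escape a value for use inside DML single quotes:
# double backslashes first, then escape apostrophes.
escape_dml() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g"
}

escape_dml "don't"; echo    # prints: don\'t
```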
If you have so many defect reports to import that you're considering writing a script to do it, you can probably determine how long this script will take to run. You can improve the performance of your script by making it surround groups of insertions in transactions; otherwise, each individual insertion will be a transaction by itself, which is time-consuming for the server. The exact number of reports you include in a given transaction isn't too critical in this context; grouping at least ten together is advantageous, but above that the rate of gain begins to fall off. If you make the transactions too large (say, hundreds of reports), things will slow down because the server grows too large (storing the pending transaction) and begins to page unduly. Twenty-five reports per transaction seemed to be a good compromise.
Your first requirement is to deal with the text output from your old database. The details of the format are not too important, since you'll be writing a translator anyway; make sure, though, that the format is easy to translate:
Start each new field on a separate line.
Start each line with the name of the field as you've declared it in the PDL.
Use consistent punctuation style.
If at all possible, enquote the string values in this text so that strings begin and end with an apostrophe, and embedded apostrophes are escaped with backslashes.
The input format for this example is shown in Figure 6-1. Several details of this format are worth pointing out:
The line of asterisks is part of the output; it precedes each report (called an incident in this system).
The incident_id is the unique identifier for the old system.
The data is enquoted, but is not entirely consistent; some integers are quoted, some are not.
Fields whose values are potentially multiline (such as description) have a double colon; all other fields have a single colon.
There are many fields in this form. In fact, the system as a whole has even more than these; when a field is blank for a given report, this text copy simply omits that field.
Finally, notice that the first two lines of the description can be easily mistaken for field values. The older system that was being replaced here was based upon netnews and email messages—simple text messages that were read by an administrator to sort out details such as priority and assigned engineer. This stylized form of text entry made that job easier.
Keep your PDL file handy while preparing your translation script so that you can easily generate the code necessary to include the explicit typing. The fields section of the PDL file used in this system follows.
fields {
Product : short-text;
SGI_only : boolean;
alpha : short-text;
assigned_engineer : short-text;
assigned_group : short-text;
category : one-of software, hardware,
documentation ...;
classification : one-of bug, rfe, note;
closed_date : date;
command : short-text;
description : long-text;
doc_affected : short-text;
fix_description : long-text;
fix_policy : short-text;
fixed_by : short-text;
incident_id : int;
irix_release : short-text;
machine : short-text;
message_id : short-text;
model_cpu : short-text;
model_gfx : short-text;
modified_date : date;
modified_user : short-text;
newsgroups : list-of one-of sgi_engr_case_bugs,
sgi_bugs_compilers,
sgi_bugs_dogwood,
sgi_bugs_cypress,
sgi_bugs_printware,
sgi_bugs_lonestar
...;
opened_date : date;
peripheral : short-text;
priority, dev_priority, CSD_priority : int;
product_version : short-text;
project : short-text;
released_product : boolean;
reported_by_customer : boolean;
reproducible : boolean;
resolution_id : short-text;
submitter : short-text;
submitter_domain : short-text;
submitter_machine : short-text;
summary : short-text;
to_incident_id : short-text;
importdate : date;
}
The translation tool in this case is merely a nawk script that recognizes each field by name and translates each one in a separate nawk production. This approach enables you to clearly recognize when an unexpected field comes along. This inflates the script quite a bit, with repetitious and uninteresting text. Most of the useful details are either at the head or at the tail.
Here are a few highlights of the script, the text of which follows immediately afterwards.
The script expects to be run with one or more file names on its command line. A verbose flag, -v, may be given first to produce some diagnostic output. You'll probably do a lot of diagnosing with this script as you learn what your data actually looks like (instead of what you thought it looked like), so include whatever debugging aids you find helpful.
A collection of nawk functions are defined at the beginning:
rpt() merely reports a message to stderr so that it doesn't get mixed into the output stream, which is probably being fed to dml.
error() reports a message, annotated as an error, which is an important distinction, even when you're both the author and the user of the script.
barf() reports an error and also arranges to skip the rest of the current incident.
beginbug() performs initial tasks at the beginning of each bug.
where() emits the where clause of the insert ... where statement used to put the data into the Tracker database. As you'll see below, the incident_id is noted when encountered in the body of the input, so it may be used here. where() is called only from within endbug().
endbug() performs the necessary closing tasks. The counters maintained here keep track of when to close one transaction and open another.
intval(), enumval(), boolval(), strval(), lstrval(), and dateval() all reformat the input format into the proper format for each particular data type.
startLongText() is called whenever a long-text field is encountered. The test at the top checks if the field actually has only one line of text. If there are several lines, then the flag InLongText is set.
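The flavor of these type-conversion functions can be sketched with plain awk. This is a simplified illustration of the intval()/boolval() idea only: the real script must also thread its literal single quotes through the surrounding shell quoting, which is what the '\'' sequences in the full listing are for.

```shell
# Simplified sketch of the intval()/boolval() conversions; the field
# names are real, but the full script's shell-quoting machinery is omitted.
printf '%s\n' "priority : '4'" "SGI_only : 1" |
awk -v q="'" '
    /^priority / { gsub(q, "");                      # strip the quotes
                   printf "priority = int %s%s%s,\n", q, $3, q; next }
    /^SGI_only / { v = ($3 == 1) ? "True" : "False";
                   printf "SGI_only = boolean %s%s%s,\n", q, v, q; next }
'
```

Passing the quote character in as an awk variable (-v q="'") is just a convenience here to keep the example readable.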
Two productions follow, guarded by the InLongText flag. They recognize two different kinds of transitions from one report to the next.
The (skipbug == True) production implements the skipping of unparsable data set up by barf(). A production guarded by the InBugHeader flag skips the blank lines that follow the line of asterisks beginning a bug. The final InLongText production recognizes and passes through the lines that make up the body of a long-text field; it also notices the end of the field (note the careful handling of apostrophes and backslashes).
A long string of trivial productions follows, each of which recognizes one particular field and calls the appropriate formatting routine (intval() and the other routines described above).
Eventually, the exception conditions are handled (delineated by a long comment line consisting of hash marks).
The END production is executed by nawk when it runs out of input data. In addition to ending the current bug just as it would if a new bug had been encountered, END deals with the possibility that the input set is not an exact multiple of the number of reports bundled into a transaction. In that case endbug() does not print the end transaction statement; the incomplete transaction would cause an error and never be performed, so END adds the statement itself.
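The batching done by endbug() and END reduces to a small skeleton. In this sketch, "insert ..." is a placeholder for the real generated statement, and the batch size of 25 mirrors the full script's counter test:

```shell
# Skeleton of the transaction batching in endbug()/END.
# "insert ..." stands in for the real generated DML statement.
printf 'rec%d\n' 1 2 3 |
awk '
    {
        if (counter == 0) print "begin transaction;"
        print "insert ... " $0 ";"
        if (++counter == 25) { print "end transaction;"; counter = 0 }
    }
    END { if (counter != 0) print "end transaction;" }  # close a partial batch
'
```

With only three input records the batch never fills, so it is the END rule that emits the closing `end transaction;`.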
Here is the translation script.
#!/bin/ksh
if [[ "$1" = -v ]] ; then
	shift
	verbose=-v
else
	verbose=
fi
/usr/bin/nawk '
func rpt(msg) {
	system("echo >&2 " msg);
}
func error(msg) {
	rpt("ERROR: "msg);
}
func barf(reason) {
	error("\\\"" FILENAME "\\\", line " FNR ": error: " reason);
	skipbug = True;
	next;
}
func beginbug() {
	if (counter == 0) {
		printf "begin transaction;\n";
	}
	printf "insert into tracker_request set \n";
}
func where() {
	printf " where incident_id = "incident_id;
	printf ";\n"
}
func endbug() {
	# Do nothing on first pass (a trick to handle startup)
	# Elsewise, print the where-clause.
	# Once in a while, also end the transaction (and set counter == 0,
	# to trigger beginbug() to start a new one).
	if (counter == -1) {
		counter = 0;
	}
	else if (counter == 24) {
		printf " importdate = date '\''"importdate"'\''\n";
		where();
		printf "end transaction;\n";
		counter = 0;
		fullcount += 1;
	}
	else {
		printf " importdate = "importdate"\n";
		where();
		counter += 1;
		fullcount += 1;
	}
	incident_id = 0;
	tvbug_id = 0;
}
func intval() {
	gsub("'\''","",$0);
	return "int '\''"$3"'\''";
};
func enumval() {
	gsub("'\''","");
	gsub(",","'\'', one-of '\''");
	$1 = "";
	$2 = "";
	$3 = "one-of '\''"$3;
	return $0"'\''";
};
func boolval() {
	gsub("'\''","",$0);
	if ($3 == 1) {
		return "boolean '\''True'\''";
	}
	else {
		return "boolean '\''False'\''";
	};
};
func strval() {
	$1 = " "$1;
	$2 = "= short-text";
	return $0;
};
func lstrval() {
	$1 = " "$1;
	$2 = "= long-text";
	return $0;
};
func dateval() {
	$1 = "date";
	$2 = "";
	gsub("AM","",$0);
	gsub("PM","",$0);
	return $0;
};
func startLongText() {
	if ($0 ~ /.*[^\\]'\''$/) {
		print ",";
	}
	else {
		InLongText = True;
	}
}
BEGIN {
	True = 1;
	False = 0;
	# Set things to recognize the new one
	InBugHeader = True;
	InLongText = False;
	skipbug = False;
	counter = -1;
	fullcount = 0;
}
(InLongText != True) && ($0 ~ /^Results for pvquery command:/) {
	if (verbose=="-v") {
		rpt("\""$0"\"");
	}
	next;
};
(InLongText != True) && ($0 ~ /^\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*$/) {
	endbug();
	beginbug();
	skipbug = False;
	InBugHeader = True;
	next;
}
(skipbug == True) {next;}
(InBugHeader == True) && ($0 ~ /^$/) {
	# Skip
	InBugHeader = False;
	next;
}
(InLongText == True) {
	print $0;
	if (($0 ~ /^'\''[ ]*$/) ||
	    ($0 ~ /.*[^\\]'\''[ ]*$/) ||
	    ($0 ~ /.*[^\\]\\\\'\''[ ]*$/)) {
		print ",";
		InLongText = False;
	};
	next;
}
/^tvbug_id : / {
	tvbug_id = $3;
	next;
}
/^incident_id : / {
	# integer
	incident_id = intval();
	print " incident_id = "intval()",";
	#rpt("Incident: "incident_id);
	next;
}
/^CSD_priority : '\''/ {
	print " CSD_priority = "intval()",";
	next;
}
/^SGI_only : / {
	# bool
	print " SGI_only = "boolval()",";
	next;
}
/^age : / {
	next;	# Ignore this computed value
}
/^alpha : '\''/ {
	print strval()",";
	next;
}
/^assigned_engineer : '\''/ {
	print strval()",";
	next;
}
/^assigned_group : '\''/ {
	print strval()",";
	next;
}
/^category : '\''/ {
	# software (hardware?)
	print " category = "enumval()",";
	next;
}
/^classification : '\''/ {
	# bug, rfe
	print " classification = "enumval()",";
	next;
}
/^closed_date : '\''/ {
	# date
	print " closed_date = "dateval()",";
	next;
}
/^command : '\''/ {
	# person
	print strval()",";
	next;
}
/^description :: '\''/ {
	# long-text
	print lstrval();
	startLongText();
	next;
}
/^dev_priority : '\''/ {
	print " dev_priority = "intval()",";
	next;
}
/^doc_affected : '\''/ {
	print strval()",";
	next;
}
/^fix_description :: '\''/ {
	print lstrval();
	startLongText();
	next;
}
/^fix_policy : '\''/ {
	gsub("-","_",$0)
	print strval()",";
	next;
}
/^fixed_by : '\''/ {
	# person
	print strval()",";
	next;
}
/^irix_release : '\''/ {
	# short-text
	print strval()",";
	next;
}
/^machine : '\''/ {
	# string
	print strval()",";
	next;
}
/^message_id : '\''/ {
	# short-text
	print strval()",";
	next;
}
/^model_cpu : '\''/ {
	# string
	print strval()",";
	next;
}
/^model_gfx : '\''/ {
	# string
	print strval()",";
	next;
}
/^modified_date : '\''/ {
	# date
	print " modified_date = "dateval()",";
	next;
}
/^modified_user : '\''/ {
	# person
	print strval()",";
	next;
}
/^newsgroups : '\''/ {
	# list-of one-of
	gsub("\\.","_",$0);
	print " newsgroups = ("enumval()"),";
	next;
}
/^opened_date : '\''/ {
	# date
	print " opened_date = "dateval()",";
	next;
}
/^peripheral : '\''/ {
	print strval()",";
	next;
}
/^priority : '\''/ {
	print " priority = "intval()",";
	next;
}
/^product : / {
	# list-of short-text?
	$1 = "Product";
	print strval()",";
	next;
}
/^product_version : '\''/ {
	# person
	print strval()",";
	next;
}
/^project : '\''/ {
	print strval()",";
	next;
}
/^released_product : / {
	# bool
	print " released_product = "boolval()",";
	next;
}
/^reported_by_customer : / {
	# bool
	print " reported_by_customer = "boolval()",";
	next;
}
/^reproducible : / {
	# bool
	print " reproducible = "boolval()",";
	next;
}
/^resolution_id : / {
	# integer
	print "resolution_id = "intval()",";
	next;
}
/^status : '\''/ {
	print " $STATE = "enumval()",";
	next;
}
/^submitter : '\''/ {
	# login
	print strval()",";
	next;
}
/^submitter_domain : '\''/ {
	# string
	print strval()",";
	next;
}
/^submitter_machine : '\''/ {
	# string
	print strval()",";
	next;
}
/^summary : '\''/ {
	# short-text
	print strval()",";
	next;
}
########################################################
/^No incidents match criteria$/ {exit;}
/^$/ {next;}
/^[^ ]* :: '\''/ {
	# untested
	error("Unrecognized long text in incident " incident_id ": " $0);
	print strval();
	startLongText();
	next;
}
{
	error("Unrecognized short text in incident " incident_id ": " $0);
	print strval()",";
	next;
}
END {
	endbug();
	if (counter != 0) {
		# endbug wont have noticed we need a transend here.
		print "end transaction;";
	};
	if (verbose=="-v") {
		rpt("Total incidents imported: "fullcount);
	}
}
' importdate="$(/bin/date)" verbose=${verbose} "$@"
Here is the DML text produced for the sample incident, after passing through the filter.
begin transaction;
insert into tracker_request set 
 incident_id = int '106064',
 submitter = short-text 'jackr',
 submitter_machine = short-text 'dblues',
 opened_date = date 'May 04 1993 03:00',
 category = one-of 'software',
 classification = one-of 'rfe',
 summary = short-text 'RFE: Mouse motion posts redundant syntax-warning popups',
 priority = int '4',
 reproducible = boolean 'True',
 SGI_only = boolean 'True',
 message_id = short-text 'kcamfic@sgi.sgi.com',
 newsgroups = ( one-of 'sgi_engr_case_bugs'),
 released_product = boolean 'True',
 reported_by_customer = boolean 'True',
 description = long-text 'assign to: jackr
priority: 4
When there\'s a syntax error in the entry form, every time the mouse
passes through that field, another copy of the error message is popped
up. Since we expect mouse motion while these alerts are up
(context-sensitive help and so on), this is tacky.
Perhaps each pane or field could keep track of whether it already has
an alert up? Does the Vk message thingie return the widget ID (so you
could check if it\'s still alive)?'
,
resolution_id = int '106064',
 project = short-text 'tracker',
 $STATE = one-of 'closed',
 dev_priority = int '4',
 assigned_engineer = short-text 'johnt',
 fixed_by = short-text 'jackr',
 closed_date = date 'May 21 1993 01:13',
 fix_description = long-text 'fixed by previous take'
,
 modified_date = date 'May 21 1993 12:13',
 modified_user = short-text 'jackr',
 importdate = Wed Jul 22 16:37:16 PDT 1993
 where incident_id = int '106064';
end transaction;