Chapter 4. MineSet User's Guide Errata

This chapter corrects some of the errors in the MineSet User's Guide. All chapter and section references are to the User's Guide.

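This chapter contains the following sections:

MineSet Help
Chapter 3, “The Tool Manager”
Chapter 14, “Inducing and Visualizing the Decision Table”
Chapter 15, “Inducing and Visualizing the Regression Tree”
Appendix A, “Flat File Support for MineSet”
Appendix D, “Creating Data and Configuration Files for the Scatter Visualizer”
Appendix E, “Creating Data and Configuration Files for the Splat Visualizer”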

MineSet Help

Context-sensitive help is available throughout MineSet. Press Shift-F1 to turn the mouse cursor into a question mark, then click the area for which you would like help.

Chapter 3, “The Tool Manager”

In Chapter 3, “The Tool Manager,” two sections, “The Add Column Button” and “The Filter Button,” point the reader to the Tree Visualizer Appendix (Appendix B) for a further explanation of the available operators and functions. In fact, the Tree Visualizer and the Tool Manager support slightly different sets of operators and functions.

The expression language used in the Filter and Add Column panels is similar to expressions in C, C++, and Java. The basic operators are the same:

+       addition
-       subtraction
*       multiplication
/       division
( )     parentheses for grouping expressions
%       modulo (remainder after division)
!       logical NOT
~       logical NOT
&&      logical AND
||      logical OR
^       logical exclusive OR
==      equal to
!=      not equal to
<=      less than or equal to
<       less than
>=      greater than or equal to
>       greater than
&       bitwise AND
|       bitwise OR

The expression language also provides the following:

isNull( )                   determines if the value in parentheses is null
if ( ) then ( ) else ( )    standard if/then/else
( ) ? ( ) : ( )             C syntax if/then/else
divide( x, y, z )           divide x by y, and give value z if y is 0
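The behavior described for isNull( ) and divide( x, y, z ) can be modeled with a short Python sketch. This is an illustration only, not MineSet code: the function names is_null and divide, the use of Python's None to stand for a null value, and the use of floating-point division are all assumptions made for the example.

```python
def is_null(value):
    """Model of isNull( ): true when the value is null.

    Assumption: a null value is represented by Python's None.
    """
    return value is None


def divide(x, y, z):
    """Model of divide( x, y, z ): x divided by y, or z when y is 0."""
    if y == 0:
        return z  # fallback value instead of a division-by-zero error
    return x / y  # assumption: floating-point division
```

In this model, divide(10, 0, -1) yields the fallback value -1 rather than raising an error, which is the reason to prefer it over the plain / operator when the divisor column may contain zeros.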


Chapter 14, “Inducing and Visualizing the Decision Table”

In Chapter 14, “Inducing and Visualizing the Decision Table,” Figure 14-13 is incorrect. Figure 4-1 shows the correct illustration.

Figure 4-1. Correction for Figure 14-13, Closer Inspection of the Adult Dataset


Chapter 15, “Inducing and Visualizing the Regression Tree”

In Chapter 15, “Inducing and Visualizing the Regression Tree,” the “Decision Nodes” subsection of the “Visualizing the Regression Tree” section should read as follows:

Decision nodes specify the attribute that is tested at the node. Values (or ranges of values) against which the attributes are tested are shown at the lines. Each possible value for the attribute matches exactly one line. For example, the root of the Regression Tree in Figure 15-1 tests the attribute age; the two lines emanating from the node partition values for that attribute (<= 27.5, > 27.5) so that every possible value matches either the right branch or the left branch. If the value is unknown and there is no line labeled with a question mark, the mean or median label value at the current node is predicted.
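The rule for unknown values can be sketched in Python as follows. This is a minimal model under stated assumptions, not the MineSet implementation: the Node class, the predict function, and the leaf values are hypothetical; an unknown value is represented by None; and a leaf is assumed to predict the mean (or median) value stored at it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    value: float                     # mean (or median) label value at this node
    attribute: Optional[str] = None  # attribute tested here; None marks a leaf
    # Each line leaving the node: (label, test on the attribute value, child).
    # A line labeled "?" handles unknown values; its test is never called.
    lines: List[Tuple[str, Optional[Callable[[float], bool]], "Node"]] = field(
        default_factory=list
    )


def predict(node: Node, record: dict) -> float:
    """Follow the matching line at each decision node down to a prediction."""
    while node.attribute is not None:
        value = record.get(node.attribute)
        if value is None:
            # Unknown value: follow the "?" line if one exists,
            # otherwise predict the value at the current node.
            unknown = [child for label, _, child in node.lines if label == "?"]
            if not unknown:
                return node.value
            node = unknown[0]
            continue
        matches = [child for label, test, child in node.lines
                   if label != "?" and test(value)]
        if not matches:
            return node.value  # defensive; every value should match one line
        node = matches[0]
    return node.value


# Hypothetical tree: the root tests age at 27.5, as in the example above.
root = Node(
    value=30.0,
    attribute="age",
    lines=[
        ("<= 27.5", lambda v: v <= 27.5, Node(value=20.0)),
        ("> 27.5", lambda v: v > 27.5, Node(value=40.0)),
    ],
)
```

With this tree, a record with age 25 follows the "<= 27.5" line, while a record with no age value receives the root's own value because the root has no line labeled with a question mark.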

Also in Chapter 15, the second and third paragraphs of the “Node Information” subsection of the “Visualizing the Regression Tree” section should be deleted.

Appendix A, “Flat File Support for MineSet”

In “The .schema File” section of Appendix A, the “Data Statements” subsection lists the data types allowed in data statements. Enumerations, fixed arrays, and enumerated arrays were inadvertently left out of the list (they are described in later subsections of Appendix A, however).

The following is the corrected wording:

Data Statements

The data statements declare the columns in the data file. The columns must be declared in the order they appear in the data file. The format of most data statements is:

type name;

where type is int, float, double, string, dataString, date, or fixedString(n), where n is an integer representing the width of the string, and name is the variable name. Unlike in C, only one variable can be declared per statement.

Other supported types include enumerations, fixed arrays, and enumerated arrays. These data types must be declared inside the `input' section, before the declaration of the specific column.
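The shape of a simple data statement can be illustrated with a small Python checker. This is a sketch under assumptions, not part of MineSet: the function parse_data_statement is hypothetical, column names are assumed to follow C-style identifier rules, and enumerations and arrays are not handled.

```python
import re

# Simple types listed in the corrected wording above.
SIMPLE_TYPES = ("int", "float", "double", "string", "dataString", "date")

# "type name;" with exactly one variable per statement.
# Assumption: names are C-style identifiers.
_STATEMENT = re.compile(
    r"^\s*(?P<type>" + "|".join(SIMPLE_TYPES)
    + r"|fixedString\(\s*(?P<width>\d+)\s*\))"
    + r"\s+(?P<name>[A-Za-z_]\w*)\s*;\s*$"
)


def parse_data_statement(line):
    """Return (type, width, name) for a simple data statement, else None.

    width is an int for fixedString(n) and None for every other type.
    """
    match = _STATEMENT.match(line)
    if match is None:
        return None
    if match.group("width") is not None:
        return ("fixedString", int(match.group("width")), match.group("name"))
    return (match.group("type"), None, match.group("name"))
```

Under these assumptions, "int age;" and "fixedString(10) name;" are accepted, while "int a, b;" is rejected because it declares two variables in one statement.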

Appendix D, “Creating Data and Configuration Files for the Scatter Visualizer”

In Appendix D, “Creating Data and Configuration Files for the Scatter Visualizer,” the “The Max Clause” subsection of the “Size Statement” section is incorrect. The correct wording is as follows:

The Max Clause

Normally, the size variable is mapped to the size of the entities so that the biggest entity has a size of 5. The max clause changes this maximum size to a different value. If there is no size variable, the default maximum size is 5. The max clause has the form:

max float
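The effect of the max clause can be pictured with the Python sketch below. It assumes that sizes scale linearly with the size variable and that the values are positive; the source states only that the biggest entity receives the maximum size, so the linear mapping and the function name scale_sizes are assumptions made for illustration.

```python
def scale_sizes(values, max_size=5.0):
    """Map size-variable values to entity sizes.

    The largest value is drawn at max_size (default 5, as stated above).
    Assumption: the remaining values scale linearly and all are positive.
    """
    largest = max(values)
    return [max_size * v / largest for v in values]
```

In this model, the values 1, 2, and 4 map to sizes 1.25, 2.5, and 5 by default, and to 2.5, 5, and 10 when the maximum is set to 10.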

Appendix E, “Creating Data and Configuration Files for the Splat Visualizer”

In Appendix E, “Creating Data and Configuration Files for the Splat Visualizer,” the “Opacity Statement” section is incorrect. The correct version is as follows:

Opacity Statement

In the Splat Visualizer, the opacity is based on counts, or more generally, record weights.

If a column is mapped to this requirement, it is used to weight each record (rather than using 1) when computing a value for the opacity. Thus, if you have a column with values for population, density, or the result of a count aggregation, you might want to map it to the opacity (weight) requirement. If you have no such column, the requirement can be left unmapped, and a column of 1's is used by default.

The opacity statement describes how a field of data is mapped to the opacity of the splats. The opacity statement consists of a series of clauses, separated by commas:

opacity clause1, clause2,...

Alternatively, the clauses can be given in separate opacity statements.

The Opacity Variable

The first clause normally contains the name of a field to be mapped to opacity. The field must be of a number type (int, float, or double), of which float is the most efficient.

The Max Clause

The max clause allows you to alter the initial opacity setting for the scene. The most opaque splat in the scene will match the value specified in this max clause. The default is 1. The max clause has the form:

max float
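The interaction between record weights and the max clause can be sketched in Python. This is a simplified model, not the Splat Visualizer's actual rendering: the function splat_opacities, the "bin" key that groups records into splats, and the linear normalization are assumptions. Only the default weight of 1 and the rule that the most opaque splat matches the max value come from the text above.

```python
from collections import defaultdict


def splat_opacities(records, weight_column=None, max_opacity=1.0):
    """Sum record weights per splat and scale so the largest equals max_opacity.

    Assumptions: each record is a dict with a "bin" key naming its splat,
    and opacity is proportional to the summed weight.
    """
    totals = defaultdict(float)
    for record in records:
        # An unmapped opacity requirement weights every record as 1.
        weight = record[weight_column] if weight_column is not None else 1.0
        totals[record["bin"]] += weight
    if not totals:
        return {}
    largest = max(totals.values())
    return {name: max_opacity * total / largest for name, total in totals.items()}


# Hypothetical records for illustration.
records = [
    {"bin": "a", "population": 30},
    {"bin": "a", "population": 10},
    {"bin": "b", "population": 10},
]
```

With the requirement unmapped, splat "a" holds two records and "b" one, so their opacities are 1 and 0.5. Mapping the hypothetical population column changes the totals to 40 and 10, giving 1 and 0.25.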