This chapter corrects some of the errors in the MineSet User's Guide. All chapter and section references refer to the User's Guide.
This chapter contains the following sections:
Context-sensitive help is available throughout MineSet. Press Shift-F1 to turn the mouse cursor into a question mark, then click the area for which you want help.
In Chapter 3, “The Tool Manager,” two sections, “The Add Column Button” and “The Filter Button,” refer the reader to the Tree Visualizer appendix (Appendix B) for a further explanation of the available operators and functions. However, the Tree Visualizer and the Tool Manager support slightly different sets of operators and functions.
The expression language used in the Filter and Add Column panels is similar to expressions in C, C++, and Java. The basic operators are the same:
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
( ) | parentheses for grouping expressions |
% | modulo (remainder after division) |
! | logical NOT |
~ | logical NOT |
&& | logical AND |
|| | logical OR |
^ | logical exclusive OR |
== | equal to |
!= | not equal to |
<= | less than or equal to |
< | less than |
>= | greater than or equal to |
> | greater than |
& | bitwise AND |
| | bitwise OR |
The expression language also provides the following:
isNull( ) | determines if the value in parentheses is null |
if ( ) then ( ) else ( ) | standard if/then/else |
( ) ? ( ) : ( ) | C syntax if/then/else |
divide( x, y, z ) | divide x by y; return z instead if y is 0 |
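The Python sketch below emulates the null test, the zero-safe division, and the two conditional forms listed in the table, so that their behavior can be checked outside MineSet. It is not MineSet code: the function names `is_null`, `divide`, and `if_then_else` are hypothetical, `None` stands in for a null value, and the use of floating-point division (rather than integer division for integer operands) is an assumption.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def is_null(value: Optional[object]) -> bool:
    """Stand-in for isNull( ): true when the value is null (None here)."""
    return value is None


def divide(x: float, y: float, z: float) -> float:
    """Stand-in for divide(x, y, z): x / y, or the fallback z when y is 0."""
    return z if y == 0 else x / y


def if_then_else(condition: bool, then_value: T, else_value: T) -> T:
    """Stand-in for both `if (c) then (a) else (b)` and `(c) ? (a) : (b)`."""
    return then_value if condition else else_value


if __name__ == "__main__":
    print(divide(10, 4, -1))  # 2.5
    print(divide(10, 0, -1))  # -1 (y is 0, so the fallback z is returned)
    print(is_null(None), is_null(0))  # True False
    print(if_then_else(7 % 2 == 1, "odd", "even"))  # odd
```

The fallback argument makes a zero divisor an ordinary case rather than an error, which is why `divide` is handy in Add Column expressions over data that may contain zeros.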
In Chapter 14, “Inducing and Visualizing the Decision Table,” Figure 14-13 is incorrect. Figure 4-1 shows the correct illustration.
In Chapter 15, “Inducing and Visualizing the Regression Tree,” the “Decision Nodes” subsection of the “Visualizing the Regression Tree” section should read as follows:
Decision nodes specify the attribute that is tested at the node. Values (or ranges of values) against which the attributes are tested are shown on the lines. Each possible value for the attribute matches exactly one line. For example, the root of the Regression Tree in Figure 15-1 tests the attribute age; the two lines emanating from the node partition values for that attribute (<= 27.5, > 27.5) so that every possible value matches either the right branch or the left branch. If the value is unknown and there is no line labeled with a question mark, the mean or median label value at the current node is predicted.
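To make the routing rule concrete, here is a minimal Python sketch of how a record could be passed through decision nodes such as the age test in Figure 15-1. It illustrates the described behavior under assumptions and is not MineSet's implementation: the class and function names are hypothetical, the left branch is assumed to take values <= the threshold, an unknown value is represented by a missing key or `None`, and the leaf values in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    prediction: float  # label value predicted at this leaf


@dataclass
class DecisionNode:
    attribute: str     # attribute tested at the node, e.g. "age"
    threshold: float   # assumed: left is value <= threshold, right is value > threshold
    left: "Tree"
    right: "Tree"
    node_value: float  # mean or median label value at this node
    unknown: Optional["Tree"] = None  # subtree for a line labeled "?", if present


Tree = Union[Leaf, DecisionNode]


def predict(tree: Tree, record: Dict[str, Optional[float]]) -> float:
    """Follow exactly one line out of each decision node until a leaf is reached."""
    while isinstance(tree, DecisionNode):
        value = record.get(tree.attribute)
        if value is None:  # the attribute value is unknown
            if tree.unknown is None:
                return tree.node_value  # no "?" line: predict this node's mean/median
            tree = tree.unknown
        elif value <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.prediction


if __name__ == "__main__":
    # Hypothetical leaf values; only the age split at 27.5 comes from Figure 15-1.
    root = DecisionNode("age", 27.5, left=Leaf(10.0), right=Leaf(20.0), node_value=15.0)
    print(predict(root, {"age": 25}))  # 10.0
    print(predict(root, {"age": 40}))  # 20.0
    print(predict(root, {}))           # 15.0 (age unknown, no "?" line)
```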
Also in Chapter 15, the second and third paragraphs of the “Node Information” subsection of the “Visualizing the Regression Tree” section should be deleted.
In “The .schema File” section of Appendix A, the “Data Statements” subsection lists the data types allowed in data statements. Enumerations, fixed arrays, and enumerated arrays were inadvertently left out of the list (they are described in later subsections of Appendix A, however).
The following is the corrected wording:
The data statements declare the columns in the data file. The columns must be declared in the order they appear in the data file. The format of most data statements is:
type name; |
where type is one of int, float, double, string, dataString, date, or fixedString(n), where n is an integer representing the width of the string, and name is the variable name. Unlike in C, only one variable can be declared per statement.
Other supported types include enumerations, fixed arrays, and enumerated arrays. These data types must be declared inside the `input' section, before the declaration of any column that uses them.
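As an illustration of these rules, the following Python sketch checks a single data statement of the form type name;. It is hypothetical (`parse_data_statement` is not part of MineSet), it treats names as C-style identifiers, which is an assumption, and it does not handle enumerations, fixed arrays, or enumerated arrays.

```python
import re
from typing import Tuple

# Simple types named in the corrected "Data Statements" text.
SIMPLE_TYPES = {"int", "float", "double", "string", "dataString", "date"}
# fixedString(n), where n is an integer width.
FIXED_STRING = re.compile(r"fixedString\(\d+\)")
# One statement: a type, whitespace, the name part, and a closing semicolon.
STATEMENT = re.compile(r"\s*(\S+)\s+(.+?)\s*;\s*")
# Assumption: names follow C-style identifier rules.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_data_statement(line: str) -> Tuple[str, str]:
    """Split 'type name;' into (type, name), rejecting anything else."""
    match = STATEMENT.fullmatch(line)
    if match is None:
        raise ValueError(f"not of the form 'type name;': {line!r}")
    type_name, name = match.groups()
    if type_name not in SIMPLE_TYPES and not FIXED_STRING.fullmatch(type_name):
        raise ValueError(f"unsupported type: {type_name!r}")
    if not IDENTIFIER.fullmatch(name):
        # "int a, b;" ends up here: unlike C, one variable per statement.
        raise ValueError(f"expected exactly one variable name: {name!r}")
    return type_name, name


if __name__ == "__main__":
    print(parse_data_statement("int age;"))             # ('int', 'age')
    print(parse_data_statement("fixedString(8) zip;"))  # ('fixedString(8)', 'zip')
    try:
        parse_data_statement("int a, b;")
    except ValueError as err:
        print("rejected:", err)
```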
In Appendix D, “Creating Data and Configuration Files for the Scatter Visualizer,” the “The Max Clause” subsection of the “Size Statement” section is incorrect. The correct wording is as follows:
In Appendix E, “Creating Data and Configuration Files for the Splat Visualizer,” the “Opacity Statement” section is incorrect. The correct version is as follows:
In the Splat Visualizer, the opacity is based on counts, or more generally, record weights.
If a column is mapped to this requirement, it is used to weight each record (instead of a weight of 1) when computing the opacity. For example, if you have a column containing population, density, or the result of a count aggregation, you might map it to the opacity (weight) requirement. If you have no such column, you can leave the requirement unmapped, and a column of 1's is used by default.
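The weighting rule can be sketched in Python as follows. This is a simplified model under stated assumptions, not MineSet's code: each record is assumed to have already been assigned to a splat cell (the cell keys are hypothetical), the name `splat_weights` is hypothetical, and because the mapping from total weight to an opacity value is not described here, the sketch stops at the per-cell totals.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def splat_weights(
    cells: Sequence[Hashable],
    weights: Optional[Sequence[float]] = None,
) -> Dict[Hashable, float]:
    """Sum record weights per splat cell.

    cells[i] identifies the splat that record i falls into; weights[i] is the
    value of the column mapped to the opacity (weight) requirement. When no
    column is mapped, every record weighs 1, so the totals are plain counts.
    """
    if weights is None:
        weights = [1.0] * len(cells)
    if len(weights) != len(cells):
        raise ValueError("need exactly one weight per record")
    totals: Dict[Hashable, float] = defaultdict(float)
    for cell, weight in zip(cells, weights):
        totals[cell] += weight
    return dict(totals)


if __name__ == "__main__":
    cells = ["north", "north", "south"]
    print(splat_weights(cells))                    # {'north': 2.0, 'south': 1.0}
    print(splat_weights(cells, [1200, 800, 500]))  # {'north': 2000.0, 'south': 500.0}
```

With no weight column the totals reduce to record counts, which matches the default column of 1's.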
The opacity statement describes how a field of data is mapped to the opacity of the splats. The opacity statement consists of a series of clauses, separated by commas:
opacity clause1, clause2,... |
Alternatively, the clauses can be given in separate opacity statements.
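To illustrate that the two forms are equivalent, the following Python sketch gathers clauses from opacity statements. It is a hypothetical reader, not MineSet's parser: it assumes clauses never contain commas themselves, tolerates an optional trailing semicolon, and uses clause1 and clause2 as placeholders, as the text does.

```python
from typing import Iterable, List


def collect_opacity_clauses(statements: Iterable[str]) -> List[str]:
    """Gather clauses from one or more opacity statements, in order."""
    clauses: List[str] = []
    for statement in statements:
        parts = statement.strip().rstrip(";").split(None, 1)
        if not parts or parts[0] != "opacity":
            continue  # not an opacity statement
        rest = parts[1] if len(parts) > 1 else ""
        # Clauses are separated by commas; surrounding whitespace is ignored.
        clauses.extend(part.strip() for part in rest.split(",") if part.strip())
    return clauses


if __name__ == "__main__":
    combined = collect_opacity_clauses(["opacity clause1, clause2"])
    separate = collect_opacity_clauses(["opacity clause1", "opacity clause2"])
    print(combined)              # ['clause1', 'clause2']
    print(combined == separate)  # True
```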