Levow (1998) used a decision tree to enable a natural language
recognition system to recognise when a human user
was giving a spoken correction, thereby allowing it to learn
to identify problem utterances, while Greenspan (1998)
used a decision tree classifier for object recognition in cluttered
and noisy images. However wide their range of application,
decision trees remain most commonly used for classification,
although classification itself covers an extremely broad range
of topics and can be applied across many areas of human
endeavour.
Jensen and Arnspang (1999), for example, used a decision
tree classification system to identify musical instruments
from timbre data, while Zmazek, Todorovski, Džeroski,
Vaupotič, and Kobal (2003) used the paradigm to predict
radon gas concentration from other environmental factors,
leading to a possible future earthquake prediction system.
Going a step further, Bernstein and Provost (2001) used
decision trees in the development of a knowledge discovery
assistant, in order to categorise different methods used to
solve a specific problem.
Lim, Loh, and Shih
(1998) compared several decision tree, statistical and neural
network methods on a variety of datasets. Both of these
works showed that the decision tree algorithms in common
use yield a wide range of speeds and accuracies, and that
the effectiveness of a given algorithm varies greatly with
the dataset.
Friedman, Kohavi, and Yun (1996) discussed the problems
of constructing decision trees, chief among them the choice
of which question to ask at each node in order to divide
and conquer the data set optimally. They showed that this
problem becomes harder as one deals with larger and larger
data sets, and with more and more variables. Fulton, Kasif,
Salzberg, and Waltz (1996), in a related analysis of the
problems of generating decision trees capable of dealing
with large, complex data sets, showed that it is simpler to
construct decision trees that can deal with a small subset
of the original data set. This concept is one that is important
to the present work, as is discussed later in the Introduction.
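The node-splitting problem discussed above can be sketched concretely. The fragment below (a minimal illustration, not any of the cited authors' algorithms) evaluates candidate thresholds on a single numeric feature by information gain, one common splitting criterion; the toy dataset and threshold values are assumptions chosen purely for demonstration.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, left, right):
    """Reduction in entropy achieved by splitting labels into left/right."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy dataset: each sample is (feature_value, class_label).
samples = [(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'b')]
labels = [y for _, y in samples]

# Evaluate every candidate threshold and keep the most informative one.
best = max(
    ((t, information_gain(labels,
                          [y for x, y in samples if x <= t],
                          [y for x, y in samples if x > t]))
     for t in (1, 2, 3, 4)),
    key=lambda pair: pair[1],
)
print(best)  # threshold 2 separates the two classes perfectly
```

Even in this tiny example, every threshold must be scored against the whole sample; with many features and many candidate split points, this search is what makes tree construction expensive on large data sets.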
Another related and relevant topic was covered by
Alsabti, Ranka, and Singh (1998), who discussed the problems
of scaling decision trees up to large datasets, and the
loss of accuracy that often results.
Garofalakis, Hyun, Rastogi, and Shim (2000) discussed
methods for constructing decision trees under user-defined
constraints such as limits on size or accuracy. Such constraints
are often important if users are to understand or use the
data adequately, and they also help to avoid over-fitting the
decision tree to the available data. Ankerst, Elsen,
Ester, and Kriegel (1999) used an interactive approach,
with the user updating the decision tree through the use
of a visualisation of the training data. This method produced
a more intuitive decision tree, one that the user could shape
according to their existing knowledge of the system in
question.
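A size constraint of the kind discussed by Garofalakis et al. can be sketched as a simple depth limit on a recursive tree builder. The code below is an illustrative toy (median splits on a single feature, a hypothetical dataset), not the constrained-construction method of the cited work; the depth limit keeps the tree small and interpretable and caps how closely it can fit the training data.

```python
from collections import Counter

def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(samples, depth=0, max_depth=2):
    """Grow a tree on (value, label) pairs, stopping at max_depth
    so the tree stays small (a user-defined size constraint)."""
    labels = [y for _, y in samples]
    # Stop when the node is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority(labels)          # leaf: predict the majority class
    threshold = sorted(x for x, _ in samples)[len(samples) // 2]  # median split
    left = [s for s in samples if s[0] <= threshold]
    right = [s for s in samples if s[0] > threshold]
    if not left or not right:            # no useful split possible
        return majority(labels)
    return {
        'threshold': threshold,
        'left': build_tree(left, depth + 1, max_depth),
        'right': build_tree(right, depth + 1, max_depth),
    }

def predict(node, x):
    """Follow thresholds down to a leaf label."""
    while isinstance(node, dict):
        node = node['left'] if x <= node['threshold'] else node['right']
    return node

tree = build_tree([(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b')], max_depth=1)
print(predict(tree, 1.5))  # 'a'
```

Raising max_depth lets the tree memorise finer distinctions in the training data; a small limit trades some training accuracy for a model the user can read at a glance.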