
Machine learning

Machine learning is the subfield of artificial intelligence concerned with the design of automatic procedures able to learn from examples. Concept learning from examples denotes the process of deriving a logical description of the necessary and sufficient conditions corresponding to a class of objects, i.e. a rule in some given representation language. A major concern is to find an adequate compromise between rule complexity and data fit, so as to avoid overfitting and to favor interpretability.

Top-down induction of decision trees (TDIDT) is one of the most successful classes of such methods, popularized by Quinlan [5]. Figure 2 shows a hypothetical binary decision tree (DT): to infer the output information corresponding to given input attribute values, one traverses the tree, starting at the top node and sequentially applying the dichotomous tests encountered to select the appropriate successor. When a terminal node is reached, the output information stored there is retrieved.
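The traversal described above can be sketched as follows. This is an illustrative example only: the tree encoding as nested dictionaries, the attribute names, and the class labels are assumptions, not taken from the paper.

```python
def classify(node, x):
    """Traverse a binary decision tree from the top node, applying each
    dichotomous test to select a successor until a terminal node is
    reached; return the output information stored there."""
    while "label" not in node:              # internal (test) node
        attr, threshold = node["test"]      # dichotomy on one attribute
        node = node["yes"] if x[attr] <= threshold else node["no"]
    return node["label"]                    # terminal node: stored output

# A hypothetical tree, e.g. for a security assessment problem:
tree = {
    "test": ("voltage", 0.9),
    "yes": {"label": "insecure"},
    "no": {
        "test": ("load", 0.7),
        "yes": {"label": "secure"},
        "no": {"label": "insecure"},
    },
}

print(classify(tree, {"voltage": 0.95, "load": 0.6}))  # -> secure
```

Note that prediction visits only one root-to-leaf path, which is why the prediction stage is computationally cheap.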

 
Figure 2: Hypothetical decision tree 

As the acronym suggests, TDIDT approaches decision tree learning in a divide-and-conquer fashion, whereby a decision tree is progressively built up, starting with the top node and ending with the terminal nodes. At each step, a tip node of the growing tree is considered and the algorithm decides whether it will be a terminal node or should be developed further. To develop a node, an appropriate attribute is first identified, together with a dichotomy on its values. The subset of learning examples corresponding to the node is then split according to this dichotomy into two subsets, corresponding to the successors of the current node. The terminal nodes are "decorated" with appropriate output information derived from their learning examples, e.g. the majority class label.
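The recursive growing procedure can be sketched as below. This is a simplified illustration, not the method of [6]: the midpoint splitting heuristic, the depth limit, and the data layout (a list of (attribute-dict, label) pairs) are all assumptions made for brevity.

```python
from collections import Counter

def majority(examples):
    """Majority class label among (attributes, label) pairs."""
    return Counter(y for _, y in examples).most_common(1)[0][0]

def best_split(examples, attributes):
    """Crude optimal-splitting stand-in (assumption, for illustration):
    try the midpoint of each attribute's range, keep the dichotomy that
    leaves the fewest majority-class errors in the two subsets."""
    best, best_err = None, len(examples)
    for a in attributes:
        vals = [x[a] for x, _ in examples]
        t = (min(vals) + max(vals)) / 2
        left = [(x, y) for x, y in examples if x[a] <= t]
        right = [(x, y) for x, y in examples if x[a] > t]
        if not left or not right:
            continue
        err = sum(y != majority(left) for _, y in left) \
            + sum(y != majority(right) for _, y in right)
        if err < best_err:
            best, best_err = (a, t), err
    return best

def grow(examples, attributes, depth=0, max_depth=3):
    """Divide and conquer: decide whether the current tip node is
    terminal, otherwise split its examples and recurse on each subset."""
    labels = {y for _, y in examples}
    split = best_split(examples, attributes)
    if len(labels) == 1 or depth == max_depth or split is None:
        return {"label": majority(examples)}   # terminal node decoration
    a, t = split
    left = [(x, y) for x, y in examples if x[a] <= t]
    right = [(x, y) for x, y in examples if x[a] > t]
    return {"test": (a, t),
            "yes": grow(left, attributes, depth + 1, max_depth),
            "no": grow(right, attributes, depth + 1, max_depth)}

examples = [({"load": 0.2}, "secure"), ({"load": 0.3}, "secure"),
            ({"load": 0.8}, "insecure"), ({"load": 0.9}, "insecure")]
tree = grow(examples, ["load"])
```

On this toy data a single test node (load against the midpoint 0.55) suffices, with one pure terminal node per class.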

To build good decision trees, an algorithm must rely on appropriate optimal-splitting and stop-splitting rules. Optimal splitting selects a dichotomy at a test node so as to provide the maximum amount of information on the output value, whereas stop splitting identifies situations where further splitting would either be useless or degrade performance due to overfitting. In our simulations we have used the TDIDT method described in [6].
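A standard way to quantify "amount of information on the output value" is the information gain of a candidate dichotomy, i.e. the reduction in Shannon entropy of the class distribution; a gain near zero is then a natural stop-splitting signal. The sketch below illustrates this general criterion, not the specific rules of [6]; the label values are invented for the example.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a class-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Entropy reduction achieved by splitting `labels` into the two
    subsets `left` and `right` (weighted by subset size)."""
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

labels = ["secure"] * 5 + ["insecure"] * 5       # entropy = 1 bit

# A perfect dichotomy fully determines the class: gain = 1 bit.
perfect = information_gain(labels, ["secure"] * 5, ["insecure"] * 5)

# A near-random dichotomy barely changes the class mix: gain near 0,
# a situation where stop splitting should apply.
useless = information_gain(labels,
                           ["secure", "secure", "insecure"],
                           ["secure"] * 3 + ["insecure"] * 4)
```

Among all candidate dichotomies at a node, optimal splitting keeps the one with the highest gain.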

Decision trees have been studied quite extensively in the context of various security assessment problems [11][10][9][8][7]. A main asset is the explicit, logical representation of the induced classification rules and the resulting explanatory capability. In particular, the method provides systematic correlation analyses among different attributes and identifies the most discriminating attributes at each tree node. From the computational viewpoint, it is efficient both at the learning stage and at the prediction stage.







Wed Jan 18 20:24:41 MET 1995