Several results are also presented in the arena of multiple models. The multiple-models approach is relevant to the problem of making accurate classifications in ``real-world'' domains because it facilitates the combination of evidence, which is needed to learn accurately in such domains. The approach is also useful when learning from small training samples, in which many models appear to be equally ``good'' with respect to the given evaluation metric. Such models often have widely varying error rates on test data, so in these situations the single-model method runs into difficulty. Increasing the amount of search only partly addresses this problem, whereas the multiple-models approach has the potential to be much more useful.

The most important result of the multiple-models research is that the *amount* of error reduction afforded by the approach is linearly correlated with the degree to which the individual models make their errors in an uncorrelated manner (the simulation sketch below illustrates this effect). This work is the first to model the degree of error reduction due to the use of multiple models. It is also shown that it is possible to learn models that make less correlated errors in domains in which there are many ties in the search evaluation metric during learning. The third major result of the research on multiple models is the realization that models should be learned that make errors in a negatively correlated manner, rather than models that make errors in an uncorrelated (statistically independent) manner.

The thesis also presents results on learning probabilistic first-order rules from relational data. It is shown that learning a class description for each class in the data (the one-per-class approach) and attaching probabilistic estimates to the learned rules allows accurate classifications to be made on real-world data sets. The thesis presents the system HYDRA, which implements this approach. The resulting classifications are shown to be more accurate, in many cases, than those made by three existing methods for learning from noisy, relational data. Furthermore, the learned rules are relational, and so are more expressive than the attribute-value rules learned by most induction systems.
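To make the central quantitative claim concrete, the following is a minimal simulation sketch, not the thesis's experiments: it assumes five models whose per-example error indicators are, with probability `rho`, copied from a shared error source (a crude stand-in for error correlation), and it shows that the benefit of majority-vote combining shrinks as `rho` grows. All names and parameters here are illustrative assumptions.

```python
# Minimal illustration (not the thesis's experiments): majority-vote
# error reduction shrinks as the models' errors become more correlated.
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_models, p_err = 10_000, 5, 0.2

def ensemble_error(rho):
    """Majority-vote error rate when each model's error indicator is,
    with probability rho, copied from a shared error source (correlated
    case) and otherwise drawn independently (uncorrelated case)."""
    shared = rng.random(n_examples) < p_err              # errors common to all models
    errors = np.empty((n_models, n_examples), dtype=bool)
    for m in range(n_models):
        own = rng.random(n_examples) < p_err             # model-specific errors
        use_shared = rng.random(n_examples) < rho
        errors[m] = np.where(use_shared, shared, own)
    wrong_votes = errors.sum(axis=0)
    return (wrong_votes > n_models // 2).mean()          # majority is wrong

for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"rho = {rho:.2f}: ensemble error ~ {ensemble_error(rho):.3f}")
```

With `rho = 0` the five 20%-error models vote nearly independently and the combined error rate falls to roughly 6%; as `rho` approaches 1 the vote degenerates to the error rate of a single model, in line with the relationship described above.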
Finally, results are presented on the small-disjuncts problem, in which rules that apply to rare subclasses have high error rates. The thesis presents the first approach that succeeds in reducing the error rates of small disjuncts while also reducing the overall error rate by a statistically significant margin. The previous approach aimed at reducing small-disjunct error rates did so only at the expense of increasing the error rates of large disjuncts. It is shown that the one-per-class approach reduces error rates for such rare rules without sacrificing the error rates of the other rules.
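A small sketch may help make the one-per-class idea concrete. The following is a minimal illustration, assuming each class's rules carry a reliability weight such as an estimated likelihood ratio; HYDRA's actual first-order rule language and probability estimates are more involved, and every name below is hypothetical.

```python
# Minimal sketch of one-per-class classification with weighted rules
# (illustrative only; not HYDRA's actual rule learner or estimates).
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # does this rule cover the example?
    weight: float                      # e.g., an estimated likelihood ratio

def classify(example: dict, rulesets: Dict[str, List[Rule]]) -> str:
    """Assign the class whose best satisfied rule has the highest weight."""
    def best(rules: List[Rule]) -> float:
        return max((r.weight for r in rules if r.condition(example)), default=0.0)
    return max(rulesets, key=lambda cls: best(rulesets[cls]))

# Hypothetical toy rule sets for a two-class problem:
rulesets = {
    "bird":   [Rule(lambda e: bool(e.get("has_feathers")), 12.0)],
    "mammal": [Rule(lambda e: bool(e.get("has_fur")), 9.5),
               Rule(lambda e: bool(e.get("gives_milk")), 15.0)],
}
print(classify({"has_feathers": True}, rulesets))  # -> bird
```

Because each class's rule set is learned and weighted separately, a rule covering a rare subclass competes only on its own estimated reliability, which is one intuition for why the approach can help with small disjuncts.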
An empirical evaluation reveals that the new methods consistently perform as well as, or better than, existing combining schemes on a variety of prediction problems. The success of these algorithms is explained both empirically and analytically by demonstrating how they adhere to a set of theoretical and heuristic guidelines.
A byproduct of the empirical investigation is evidence that existing combining methods fail to satisfy one or more of the defined guidelines. The new combining approaches satisfy these criteria by relying upon Singular Value Decomposition (SVD) as a tool for filtering out the redundancy and noise in the predictions of the learned models, and for characterizing the areas of the example space where each model is superior. The SVD-based representation used in the new combining methods avoids sensitivity to correlated predictions without discarding any of the learned models, so the unique contributions of each model can still be discovered and exploited. An added advantage of the combining algorithms derived in this dissertation is that they are not limited to models generated by a single algorithm; they may be applied to model sets produced by a diverse collection of machine learning and statistical modeling methods.
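The following is a minimal sketch of this idea, not the dissertation's actual algorithms: held-out predictions from the learned models are stacked into a matrix, the SVD retains only the leading directions (filtering the redundancy and noise among correlated predictors), and a simple least-squares combiner is fit in the reduced space. The function name, the `rank` parameter, and the choice of least squares are illustrative assumptions.

```python
# Sketch of SVD-based combining (illustrative; not the dissertation's
# exact method): filter redundant/noisy prediction directions via SVD,
# then fit a simple combiner in the reduced space.
import numpy as np

def svd_combiner(preds, targets, rank):
    """preds: (n_examples, n_models) held-out predictions of the learned
    models; targets: (n_examples,) true values. Returns a combining
    function for new prediction matrices."""
    mean_p, mean_t = preds.mean(axis=0), targets.mean()
    P = preds - mean_p                              # center each model's column
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    V_r = Vt[:rank].T                               # top-`rank` right singular vectors
    Z = P @ V_r                                     # examples in the reduced space
    w, *_ = np.linalg.lstsq(Z, targets - mean_t, rcond=None)

    def combine(new_preds):
        return (new_preds - mean_p) @ V_r @ w + mean_t
    return combine

# Hypothetical usage with three synthetic, highly correlated models:
rng = np.random.default_rng(1)
y = rng.normal(size=500)
preds = y[:, None] + rng.normal(scale=0.3, size=(500, 3)) + 0.2 * rng.normal(size=(500, 1))
combine = svd_combiner(preds[:400], y[:400], rank=2)
print(np.mean((combine(preds[400:]) - y[400:]) ** 2))  # combined test MSE
```

Because no model's column is deleted, only its redundant directions are down-weighted; each model's unique contribution survives in the retained singular vectors, matching the motivation described above.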
The three main contributions of this dissertation are: