Michael J. Pazzani: Research

Research Overview:

The common theme behind my research is the investigation and analysis of learning methods that use prior knowledge to guide the learning process. Typically, these learning methods combine empirical (i.e., correlational or data-driven) and explanation-based (i.e., analytical or knowledge-intensive) learning techniques. The goal is to create learning systems that accept as input background knowledge, even when it is incomplete and incorrect, along with training examples, and learn to make classifications that are more accurate than those made by either the background knowledge alone or an induction algorithm applied to the training data alone.

My early work on OCCAM [1] describes a learning system that acquires knowledge empirically and later uses this knowledge to facilitate knowledge-intensive learning. This research was inspired by psychological findings on the types of information that people use during learning and how this information affects the rate of learning. Part of this research also focused on the acquisition of causal relationships [2]. That paper argues that, in addition to specific knowledge of actions and their effects, the process of learning causal relationships is facilitated by general knowledge of causality. That is, causal relationships that conform to one of a number of common patterns of causal relationships are easier for human subjects to learn. The paper also provides experimental evidence collected from human subjects: subjects learning a causal relationship that conforms to one particular causal pattern require fewer trials than subjects learning a causal relationship that violates this pattern. In other research in this framework [3], I have addressed issues of learning when the background knowledge is overly general. In addition, in [4] I have addressed the acquisition of the common patterns of causal relationships used by OCCAM and shown that they can be formed by looking for commonalities among rules found by an empirical learner.

[1] Pazzani, M. (1990). Creating a memory of causal relationships: An integration of empirical and explanation-based learning methods. Hillsdale, NJ: Lawrence Erlbaum Associates.

[2] Pazzani, M. (1991). A computational theory of learning causal relationships. Cognitive Science, 15, 401-424.

[3] Pazzani, M. (1989). Detecting and correcting errors of omission after explanation-based learning. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 713-718). Detroit, MI: Morgan Kaufmann.

[4] Pazzani, M. (1992). Learning causal patterns: Making a transition from data-driven to theory-driven learning. Machine Learning, 11, 173-194.

Information about getting a copy of Occam.

My more recent research has focused on more thoroughly investigating issues raised during the development of OCCAM. In particular, I have explored the use of prior knowledge in learning from mathematical, psychological, computational and applied points of view.

Mathematical modeling of learning algorithms

We have developed an approach to average case analysis of learning algorithms [5,6]. The average case model is based upon determining, for a given distribution of examples, the probability that an algorithm will revise its hypothesis and the effect that revising the hypothesis will have on the accuracy of the hypothesis. Models were created for three different algorithms learning from a specific distribution (a product distribution). It was shown that a particular algorithm combining empirical and explanation-based learning [7] is more accurate than either its empirical or explanation-based component alone. An average case model was needed because the "standard" mathematical models of learning algorithms (i.e., Probably Approximately Correct models) are worst-case models and the worst-case behavior of all three algorithms is identical. This paper also evaluates the assumptions from which the average-case model was derived by experimentally demonstrating that it accounts for a large percentage of the variation in observed accuracy on a naturally occurring data set and on several artificial data sets deliberately constructed to violate assumptions of the model.
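As a rough illustration of what an average-case analysis computes, the following Python sketch gives the expected accuracy of one simple conjunctive learner under a product distribution. It is not the model from the papers cited here: the learner (keep every feature that has been true in all positive examples seen so far), the function name `expected_accuracy`, and the feature probabilities are assumptions chosen to keep the example small.

```python
from math import comb, prod


def expected_accuracy(p_relevant, p_irrelevant, n):
    """Expected accuracy of a simple conjunctive learner after n examples.

    Assumptions (illustrative only):
      * the target concept is the conjunction of the relevant features;
      * each feature is true independently with the given probability
        (a product distribution);
      * the learner starts with every feature in its hypothesis and drops
        any feature that is false in a positive training example.
    """
    p_pos = prod(p_relevant)  # probability that an example is positive
    correct_on_positive = 0.0
    for k in range(n + 1):
        # Probability of seeing exactly k positive examples among n.
        p_k = comb(n, k) * p_pos**k * (1 - p_pos) ** (n - k)
        # An irrelevant feature survives k positives with probability p**k.
        # A surviving irrelevant feature rejects a positive test example
        # whenever that feature is false in it, which has probability 1 - p.
        no_rejection = prod(1 - (p**k) * (1 - p) for p in p_irrelevant)
        correct_on_positive += p_k * no_rejection
    # Negative examples are always classified correctly, because relevant
    # features are never dropped; all errors fall on positive examples.
    return 1 - p_pos * (1 - correct_on_positive)


if __name__ == "__main__":
    for n in (0, 1, 5, 20, 100):
        print(n, round(expected_accuracy([0.8, 0.8], [0.5, 0.5, 0.5], n), 4))
```

Under these assumptions the expected accuracy rises toward 1 as the number of training examples grows, and the rate depends on the distribution of the examples, which is the kind of dependence a worst-case model does not expose.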

[5] Pazzani, M., & Sarrett, W. (1989). Average case analysis of conjunctive learning algorithms. Proceedings of the Seventh International Conference on Machine Learning (pp. 339-347). Austin, TX: Morgan Kaufmann.

[6] Pazzani, M., & Sarrett, W. (1992). A framework for average case analysis of conjunctive learning algorithms. Machine Learning, 9, 342-372.

[7] Sarrett, W., & Pazzani, M. (1989). One-sided algorithms for integrating empirical and explanation-based learning. Proceedings of the Sixth International Workshop on Machine Learning (pp. 26-28). Ithaca, NY: Morgan Kaufmann.

Investigation of human learning

My interest in machine learning algorithms that combine correlational and analytical methods arises from the fact that much of human learning of predictive relationships can be characterized as using a combination of these methods. Part of my research has involved conducting psychological experiments to assess the impact of various types of prior knowledge on human learning rates. The psychological research is performed to test hypotheses that arise during the development of computational models of learning. Typically, the dependent variable measured is the learning rate as determined by the number of trials. We have described [8] a learning problem in which the effect of consistency with prior knowledge dominates the effect of concept complexity in human learners. Further work [9] expands upon this finding. In particular, this paper shows that prior knowledge influences not only the learning rate but also the attributes that subjects attend to and the types of hypotheses formed. It also shows that the type of knowledge used by human subjects cannot easily be encoded into the domain theory used by explanation-based learning. A different encoding of knowledge, an influence theory, is proposed. In such a theory, the influence of each of several factors is known, but the theory does not specify a systematic means of combining the factors. In [10] we extend this work and evaluate a competing model based upon weighting of features rather than consistency with prior knowledge. It is shown that feature weighting models cannot account for certain complex concepts in which two features individually exert a positive influence on an outcome but collectively exert a negative influence.
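The difficulty that such concepts pose for feature weighting can be sketched in a few lines of Python. This is only an illustration under assumptions: the model evaluated in the cited work may differ, and here "feature weighting" is taken to mean a simple additive threshold model. The four-case table, the weight grid, and the function names are all hypothetical.

```python
from itertools import product

# Hypothetical concept: each feature alone predicts the outcome (1),
# but both features together do not (0).
CASES = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}


def additive_fits(w_a, w_b, bias):
    """True if a purely additive weighting reproduces every case."""
    return all(
        (w_a * a + w_b * b + bias > 0) == bool(outcome)
        for (a, b), outcome in CASES.items()
    )


def interaction_fits(w_a, w_b, w_ab, bias):
    """Same test, with an extra weight on the conjunction of both features."""
    return all(
        (w_a * a + w_b * b + w_ab * a * b + bias > 0) == bool(outcome)
        for (a, b), outcome in CASES.items()
    )


def search_additive(grid):
    """Return every (w_a, w_b, bias) on the grid that fits all four cases."""
    return [w for w in product(grid, repeat=3) if additive_fits(*w)]


GRID = [x / 2 for x in range(-8, 9)]  # weights from -4.0 to 4.0 in steps of 0.5

# No additive weighting can fit. The single-feature cases require
# w_a + bias > 0 and w_b + bias > 0, so w_a + w_b + 2*bias > 0, while the
# other two cases require bias <= 0 and w_a + w_b + bias <= 0, so
# w_a + w_b + 2*bias <= 0. The grid search merely confirms this.
print(search_additive(GRID))              # prints []
print(interaction_fits(1, 1, -3, -0.5))   # prints True
```

The additive model fails for any choice of weights, not just those on the grid, whereas a representation that can express how the two features combine handles the concept easily.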

[8] Pazzani, M., & Schulenburg, D. (1989). The influence of prior theories on the ease of concept acquisition. Proceedings of the Eleventh Annual Conference of the Cognitive Science Society (pp. 812-819). Ann Arbor, MI: Lawrence Erlbaum.

[9] Pazzani, M. (1991). The influence of prior knowledge on concept acquisition: Experimental and computational results. Journal of Experimental Psychology: Learning, Memory & Cognition, 17(3), 416-432.

[10] Pazzani, M., & Silverstein, G. (1990). Feature selection and hypothesis selection models of induction. Proceedings of the Twelfth Annual Conference of the Cognitive Science Society (pp. 221-228). Cambridge, MA: Lawrence Erlbaum.

A more recent paper, "Learning Sets of Related Concepts: A Shared Task Model" by Tim Hume and Michael J. Pazzani, from the 1995 Cognitive Science Conference, is available in HTML format.

The development of learning algorithms

I have also been involved with the creation of practical learning algorithms. Recently [11,12] several of my graduate students (Cliff Brunk, Kamal Ali and Glenn Silverstein) and I have created a significant extension to Ross Quinlan's FOIL program. FOIL is an empirical learning program that uses an information-based evaluation function to learn Horn clause concepts. I have constructed a compatible explanation-based learning program that uses the same information-based metric to guide the proof process. The resulting combined system (FOCL) has been shown to take advantage of incomplete and incorrect domain theories. In addition, the effect of other kinds of background knowledge, such as typing and commutative relationships, was considered both empirically and analytically. FOCL is novel in that the integration between empirical and explanation-based learning is tighter than that of previous systems. Both the explanation-based and empirical learning components serve the same purpose (adding literals to a clause under construction) and use the same evaluation function. Furthermore, the domain theory used by the explanation-based program defines relations that can be used by the empirical program. We have also investigated learning in this framework when some of the training data are incorrectly classified [13].
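To make the idea of a shared evaluation function concrete, here is a minimal Python sketch, not FOCL's actual code. It uses the commonly cited form of FOIL's information gain and applies it uniformly to candidate literals, whether they were proposed empirically or derived from a domain theory. The candidate names and coverage counts are hypothetical, and the number of positive examples that remain covered is approximated by the positive count after adding the literal, which is a simplifying assumption.

```python
from math import log2


def foil_gain(p0, n0, p1, n1):
    """Information gain from adding a literal to a clause under construction.

    p0, n0: positive / negative examples covered before adding the literal.
    p1, n1: positive / negative examples covered after adding it.
    Simplification (assumption): the count of positives still covered is p1.
    """
    if p1 == 0:
        return 0.0  # a literal that covers no positives is never useful
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


def best_literal(p0, n0, candidates):
    """Pick the candidate with the highest gain, regardless of its source."""
    return max(candidates, key=lambda name: foil_gain(p0, n0, *candidates[name]))


# Hypothetical candidates: name -> (positives, negatives) covered afterwards.
candidates = {
    "empirical: color(X, red)": (3, 1),
    "theory: fragile(X)": (4, 0),
}

if __name__ == "__main__":
    for name, (p1, n1) in candidates.items():
        print(name, round(foil_gain(4, 4, p1, n1), 3))
    print("chosen:", best_literal(4, 4, candidates))
```

Because both kinds of candidate are scored by the same function, a literal suggested by the domain theory is added only when it does at least as well on the training data as the best empirically proposed literal, which is one way an incorrect theory can be overridden by the data.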

[11] Pazzani, M., & Kibler, D. (1992). The role of prior knowledge in inductive learning. Machine Learning, 9, 54-97.

[12] Pazzani, M., Brunk, C., & Silverstein, G. (1991). A knowledge-intensive approach to learning relational concepts. Proceedings of the Eighth International Workshop on Machine Learning (pp. 432-436). Evanston, IL: Morgan Kaufmann.

[13] Brunk, C., & Pazzani, M. (1991). An investigation of noise-tolerant relational concept learning algorithms. Proceedings of the Eighth International Workshop on Machine Learning (pp. 389-391). Evanston, IL: Morgan Kaufmann.

Information about getting a copy of FOCL.

Applications of learning methods

A portion of my research has focused on adapting existing learning methods to problems in areas such as engineering and political science. For example, [14] describes an application of explanation-based learning to a problem of identifying conditions that are indicative of a component failure in the attitude control system of the DSCS-III satellite. This paper describes a method for learning diagnosis heuristics (i.e., rules that encode associations between data values in a telemetry stream) from information contained in a qualitative model of the satellite. The application of a learning method to data on foreign trade negotiations is discussed in [15].

[14] Pazzani, M. (1989). Learning fault diagnosis heuristics from device descriptions. In Y. Kodratoff & R. Michalski (Eds.), Machine Learning: An artificial intelligence approach (Volume III). Los Altos, CA: Morgan Kaufmann.

[15] Cain, T., Pazzani, M., & Silverstein, G. (1991). Using domain knowledge to influence similarity judgments. Case-Based Reasoning Workshop. Washington, DC: Morgan Kaufmann.

Michael J. Pazzani
Department of Information and Computer Science,
University of California, Irvine
Irvine, CA 92697-3425