Current and Recent Research Projects in the Smyth DataLab Group
Projects in our research group span a variety of topics in fundamental machine learning research, including predictive modeling, deep learning, Bayesian methods, time-series and sequence modeling, image and spatial data analysis, text analysis, and more. Our group has a long and successful history of developing new ideas and algorithms at the intersection of machine learning and statistics. We are also involved in numerous applications, working with expert collaborators to apply machine learning and statistical techniques to real-world problems in important application areas such as medicine, science, engineering, the social sciences, and business.
Prospective PhD students interested in joining our DataLab research group should apply to either the Computer Science or the Statistics PhD program at UC Irvine. We encourage applications from students with strong quantitative backgrounds, such as undergraduate or Master's degrees in Computer Science, Electrical Engineering, Statistics, Mathematics, or related fields.
Below are some examples of current and recent projects in our group. This list is illustrative rather than exhaustive: we have broad research interests and are often exploring new ideas and projects that are not listed here.
Statistical and probabilistic methods for deep learning
Developing new theory and techniques that bridge statistical ideas and deep learning methods, such as Bayesian forecasting and online probabilistic calibration for deep neural networks. We have recently started several new projects in this area, including new methods for quantifying uncertainty and confidence in predictions from black-box models, and novel hybrid algorithms that combine human and machine predictions. Joint work with Professor Mark Steyvers in Cognitive Sciences at UCI.
Funded by the National Science Foundation and by a Qualcomm faculty award.
Statistical and deep learning models for event data over time
Exploring new statistical and deep learning models for analyzing time-series event data (for example, behavioral data from mobile phones and other digital devices, or biomedical event data), using approaches such as marked point processes and probabilistic embedding methods. Joint work with Professor Stephan Mandt's group at UCI. Funded by the National Science Foundation and by gift funding from Adobe, Google, and Xerox.
One component of this research is a digital forensics project that is part of the CSAFE consortium, funded by the National Institute of Standards and Technology (NIST), in collaboration with researchers in statistics, computer science, and criminology at Carnegie Mellon University, Iowa State University, and the University of Virginia.
Climate and environmental data analysis
Developing spatio-temporal models for analyzing and predicting environmental and climate processes using satellite data on precipitation, temperature, land use, wildfires, and more.
This work is part of a collaboration with the Departments of Earth System Science and Civil and Environmental Engineering at UCI, as well as the University of Chicago and the University of Wisconsin-Madison. This research is also part of a large NSF Research Traineeship (NRT) graduate training program at UCI, at the interface of machine learning and the physical sciences, running from 2016 to 2021. Funded by NASA, the National Science Foundation, and the California Strategic Growth Council.
Automated analysis of human dialog with applications to healthcare
Investigating sequential classification methods for detecting topics and emotions in patient-doctor dialog, using techniques such as topic models, hidden Markov models, and recurrent neural networks. In collaboration with the University of Utah, the University of Washington, and UCSD.
Funded by the National Institutes of Health (NIH), PCORI, and SAP.
Machine learning techniques for medical diagnosis
Developing new statistical machine learning methods for analyzing multivariate cell-level flow cytometry data, with applications to medical diagnosis. In collaboration with the J. Craig Venter Institute, UCSD, and Stanford University. Funded by the National Institutes of Health (NIH).
Machine learning for analysis of online education data
Developing new machine learning methods to extract useful information about student behavior from time-series clickstream data generated by students' online activity.
In collaboration with the Department of Education at UCI.
Funded by the National Science Foundation.