Current and Recent Research Projects in the Smyth DataLab Group
Projects in our research group span a variety of topics in fundamental
machine learning research, including predictive modeling, deep learning, Bayesian methods,
time-series and sequence modeling, image and spatial data analysis, text analysis, and more. Our group
has a long and successful history of developing new ideas and algorithms that lie at the intersection of machine
learning and statistics. We are
also involved in numerous applications, working with expert collaborators to apply machine learning
and statistical techniques to address real-world problems across a variety of important application
areas in medicine, science, engineering, the social sciences, and business.
PhD students interested in joining our
DataLab research group should apply to either the
Computer Science or Statistics PhD program at UC Irvine. Students with strong quantitative backgrounds (undergraduate or Master's degrees in Computer Science, Electrical Engineering, Statistics, Mathematics, Physics, or related areas) are particularly encouraged to apply.
Below are some examples of current and recent projects in our group. The list is illustrative
rather than exhaustive: we have broad research interests and are often exploring new ideas and projects that are not listed here.
Statistical and probabilistic methods for machine learning
Developing new theories and techniques that bridge statistical ideas with methods from deep learning, such as the use of
Bayesian forecasting and online probabilistic calibration for deep neural networks. We have recently started several projects in this area, on topics such as methods for
quantifying uncertainty and confidence in predictions from black-box models and the development of novel hybrid algorithms that blend human and machine predictions. Joint work with
Professor Mark Steyvers in Cognitive Sciences at UCI.
Funded by the National Science Foundation and by a Qualcomm faculty award.
Statistical and deep learning models for sequential data over time
Exploring new statistical and deep learning models for analyzing time-series data, such as behavioral data from digital devices (e.g., mobile phones) or biomedical event data, using approaches such as marked point processes and probabilistic embedding methods. Joint work with
Professor Stephan Mandt's group at UCI. Funded by the National Science Foundation, the Hasso Plattner Institute, and the Center for Statistics and Applications in Forensic Evidence (CSAFE), which is funded by NIST.
Machine learning for climate and environmental data analysis
Spatio-temporal machine learning models for analyzing and predicting environmental and climate processes using
satellite data related to precipitation, temperature, land use, wildfires, and more.
This is part of a collaboration with the Earth Systems Science and Civil and Environmental
Engineering Departments at UCI, as well as the University of Chicago and the University of Wisconsin-Madison. This research is also part of a large
NSF NRT graduate training program award at UCI, at the interface of machine learning and the physical sciences, running from 2016 to 2021.
Funded by NASA, the
National Science Foundation, and the California Strategic Growth Council.
Machine learning for health and medicine
Our group is involved in a number of projects related to health and medicine.
One project develops new statistical machine learning methods for the analysis of multivariate point sets of cell-level flow-cytometry data, with
applications to cancer diagnosis, in collaboration with the J. Craig Venter Institute, UCSD, and Stanford
University, and is funded by the National Institutes of Health (NIH).
Another develops sequential machine learning models for automated analysis of topics and emotions in patient-doctor dialog, in collaboration with the University of Utah, University of Washington, and UCSD, and is funded by NIH, PCORI, and SAP.