Clustering of Regimes in the Earth's Upper Atmosphere
Project Participants
Introduction
We are investigating probabilistic mixture models for detecting
clusters in geopotential height records in the Northern Hemisphere. Geopotential
heights (the heights at which the atmosphere attains a given pressure)
have been recorded twice daily since 1948 on a spatial grid of over 500
points in the Northern Hemisphere. Of interest from an atmospheric science
viewpoint is the existence of specific spatial patterns which recur consistently
across different winters. The existence, shape, persistence, and number
of these patterns have important implications for our understanding of long-term
variability in the Earth's climate.
Methodology
The original spatial grid measurements are projected into
a low-dimensional space using principal components analysis.
Projected daily data from 1948 to 1993 are shown above.
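As a rough illustration of the projection step, the sketch below computes principal components with a singular value decomposition in NumPy and also shows the inverse mapping from principal component space back to the grid. The function names (fit_pca, project, to_grid), the synthetic data, and the choice of 2 components are assumptions made for the example; they are not taken from the project's own code or data.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean map and the leading principal directions of X (days x grid points)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project grid maps into the low-dimensional principal component space."""
    return (X - mean) @ components.T

def to_grid(Z, mean, components):
    """Map points in principal component space back onto the full spatial grid."""
    return Z @ components + mean

# Synthetic stand-in for the height records (an assumption, not real data):
# 200 "days" on a 50-point grid, generated from 2 underlying spatial patterns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
patterns = rng.normal(size=(2, 50))
X = latent @ patterns + 0.01 * rng.normal(size=(200, 50))

mean, components = fit_pca(X, n_components=2)
Z = project(X, mean, components)         # shape (200, 2)
X_back = to_grid(Z, mean, components)    # approximate reconstruction on the grid
```

Applying to_grid to a single point in principal component space, such as a cluster mean, yields a map on the full grid.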
We have investigated the use of
finite mixtures of Gaussian densities to model the density of the data in
the first few principal components. The Expectation-Maximization (EM) procedure
is used to estimate the parameters of the mixture models from the historical
data. The fitted Gaussian components are interpreted as probabilistic
clusters. The means of the Gaussian densities in principal component space
can be used to construct equivalent maps on the full spatial
grid, allowing for a physical interpretation of the fitted clusters.
Cross-validated likelihood is then used to determine the best number of
Gaussian components to fit to the data.
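The fitting and model selection steps can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's implementation: the function names, the initialization, the small covariance regularizer, the use of repeated random half/half splits, and the synthetic 2-D data are all assumptions made for the example.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of each row of X under a multivariate Gaussian."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def point_log_lik(X, weights, means, covs):
    """Return the (n, k) log joint terms and the (n,) mixture log density."""
    comp = np.stack([np.log(w) + log_gauss(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    top = comp.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return comp, top[:, 0] + np.log(np.exp(comp - top).sum(axis=1))

def fit_gmm(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture with EM (assumed initialization scheme)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        comp, total = point_log_lik(X, weights, means, covs)
        resp = np.exp(comp - total[:, None])
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

def cv_log_likelihood(X, k, n_splits=5, test_frac=0.5, seed=0):
    """Average held-out log-likelihood per point over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_test = int(test_frac * n)
    scores = []
    for s in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        params = fit_gmm(X[train], k, seed=s)
        scores.append(point_log_lik(X[test], *params)[1].sum() / n_test)
    return float(np.mean(scores))

# Synthetic 2-D data with three separated clusters (an assumption, not real data).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

scores = {k: cv_log_likelihood(X, k) for k in (1, 2, 3, 4)}
best_k = max(scores, key=scores.get)  # k with the highest held-out likelihood
```

Because the score is computed on data not used for fitting, adding components stops paying off once they only model noise, which is what makes the held-out likelihood usable for choosing k. EM can converge to local optima, so in practice several random restarts per split would be advisable.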
Results to date
- Cross-validated likelihood selected k=3: a mixture of 3 Gaussian components
explains held-out data better than any other value of k.
- The spatial maps for the 3-component model correspond
to 3 very well-known maps in atmospheric science:
the Gulf of Alaska
pattern, the Rockies ridge pattern, and the Greenland
blocking pattern (e.g., see Cheng and Wallace, 1995).
- This is the first objective quantification of multimodality
in geopotential height data, subject to the Gaussian mixture assumption.
- The k=3 result is intriguing because it reproduces the same 3 clusters
that Cheng and Wallace (1995) found by applying hierarchical clustering
directly to the spatial grids. The same set of 3 maps was thus found
independently in both studies, using completely different clustering
methodologies, which suggests that the 3 regimes (clusters) are strongly
present in the data.
Papers
- P. Smyth, M. Ghil, and K. Ide,
`Multiple regimes in Northern Hemisphere height fields
via mixture model clustering,'
Technical Report UCI-ICS 98-08. (An extended version
of this paper will appear in the Journal of the Atmospheric
Sciences, in press.)
- P. Smyth,
`Model selection for probabilistic clustering
using cross-validated likelihood,'
Technical Report UCI-ICS 98-09.
- P. Smyth and D. Wolpert,
`An evaluation of linearly combining
density estimators via stacking,' Technical Report UCI-ICS 98-25.
- P. Smyth,
`Clustering using Monte Carlo cross-validation,'
Proceedings of the 2nd International Conference on
Knowledge Discovery and Data Mining, AAAI Press, 1996.
Funding
This work is supported by a grant from the National
Science Foundation and by funding from the Jet Propulsion Laboratory and
NASA.
Related Projects at the DataLab
- Markov modeling of remotely-sensed images with cloud
contamination
- Clustering of sequences
Related Web Pages of Interest
Information and Computer Science
University of California, Irvine
CA 92717-3425
Last modified: November 16th 1998