Other Applications

This chapter presents two shorter investigations which use Graph Diffusion Distance.

Shape Analysis for Discretized Meshes

In this section we demonstrate that graph diffusion distance captures structural properties of 3D point clouds. Ten 3D meshes (see Figure [fig:mesh_grid] for an illustration of the meshes used) were chosen to represent an array of objects with varying structural and topological properties. Not all of the mesh files chosen are simple manifolds: for example, the “y-tube" is an open-ended cylinder with a fin around its equator. Each mesh was used to produce multiple graphs, via the following procedure:

[sample_step_a] Subsampling the mesh to 1000 points;
Performing a clustering step on the new point cloud to identify 256 cluster centers;
Connecting each cluster center to its 16 nearest neighbors in the set of cluster centers.

Since each pass of this procedure (with different random seeds) varied in Step [sample_step_a], each pass produced a different graph. We generated 20 graphs for each mesh, and compared the graphs using GDD.

3D meshes used in the shape analysis experiment. Each mesh was used to produce several sampled discretizations, which were then compared using GDD.

[fig:mesh_grid]

The results of this experiment can be seen in Figure [fig:gdd_of_meshes]. This Figure shows the three first principal components of the distance matrix of GDD on the dataset of graphs produced as described above. Each point represents one graph in the dataset, and is colored according to the mesh which was used to generate it. Most notably, all the clusters are tight and do not overlap. Close clusters represent structurally similar objects: for example, the cluster of graphs from the tube mesh is very close to the cluster derived from the tube with an equatorial fin. This synthetic dataset example demonstrates that graph diffusion distance is able to compare structural information about point clouds and meshes.

Embedding of pairwise distances between mesh discretizations. We see that GDD clusters each category of mesh tightly, and furthermore that clusters are nearby when they are structurally similar meshes, and distant otherwise. Axes represent the three principal components of the distance matrix and are thus unitless.

[fig:gdd_of_meshes]

Morphological Analysis of Cell Networks

In this section, we present an application of GDD to biological image analysis, and a generalization of GDD that makes it more suitable for machine learning tasks.

Biological Background

Species in the plant genus Arabidopsis are of high interest in plant morphology studies, since 1) its genome was fully sequenced in 1996, relatively early (Kaul et al. 2000), and 2) its structure makes it relatively easy to capture images of the area of active cell division: the shoot apical meristem (SAM). Recent work (Schaefer et al. 2017) has found that mutant Arabidopsis specimens with a decreased level of expression of genes trm6,trm7, and trm8 demonstrate more variance in the placement of new cell walls during cell division.

We prepare a dataset of Arabidopsis images with the following procedure:

Two varieties of Arabidopsis (wild type as well as “trm678 mutants”: mutants with decreased expression of all three of TRM6,TRM7,and TRM8) were planted and kept in the short-day condition (8 hours of light, 16 hours of dark) for 6 weeks.
Plants were transferred to long-day conditions and kept there until the SAM had formed. This took two weeks for wild-type plants and three weeks for TRM678 mutants.
The SAM of each pant was then was then dissected and observed with a confocal microscope, Leica SP8 upright scanning confocal microscope equipped with a water immersion objective (HC FLUOTAR L 25x/0.95 W VISIR).
This resulted in 3D images of the SAM from above (e.g. perpendicular to the plane of cell division) collected for both types of specimens.
Each 3D image was converted to a 2D image showing only the cell wall of the top layer of cells in the SAM.

We take 20 confocal microscope images (13 from wild-type plants and 7 from trm678 mutants) of shoot apical meristems, and process them to extract graphs representing local neighborhoods of cells. Each graph consists of a cell and its 63 closest neighbors (64 cells total). Cell neighborhood selection was limited to the central region of each SAM image, since the primordia surrounding the SAM are known to have different morphological properties. For each cell neighborhood, we produce a graph by connecting two cells iff their shared boundary is 30 pixels or longer. For each edge, we save the length of this shared boundary, as well as the angle of the edge from horizontal and the edge length. We extracted 600 cell neighborhoods for each type, for a total of 1200 graphs. See Figure [fig:cell_lines] for an example SAM image and resulting graph, and see Figure [fig:cell_extracted].

Left: a microscope image of the SAM of a mutant Arabidopsis specimen. Right: the same specimen, with separate cells false-colored and an example extracted cell neighborhood graph overlaid.

[fig:cell_lines]

Top: ten example morphological graphs extracted from wild-type SAM images. Bottom: ten morphological graphs drawn from trm678 mutant images.

[fig:cell_extracted]

Species in the plant genus Arabidopsis are of high interest in plant morphology studies, owing to the facts that 1) its genome was fully sequenced in 1996, relatively early (Kaul et al. 2000), and 2) its structure makes it relatively easy to capture images of the area of active cell division: the shoot apical meristem (SAM). Cell-scale interaction of microtubules (as described in Chapter 7) and microtubule bundles are thought to influence the orientation of cell division planes in the SAM, and thereby influence the morphology of the organism as a whole. However, this connection is not yet entirely well-understood. Recent work (Dumais 2009; Schaefer et al. 2017) has found that mutations which decrease the level of expression of genes TRM6,TRM7,and TRM8 inhibit the formation of the preprophase band (PPB). Arabidopsis mutants with inhibited PPB formation demonstrate more variance in the placement of new cell walls during cell division.

Dataset

The dataset used in this analysis was prepared as follows:

Two varieties of Arabidopsis (wild type as well as “trm678 mutants”: mutants with decreased expression of all three of TRM6,TRM7,and TRM8) were planted and kept in the short-day condition (8 hours of light, 16 hours of dark) for 6 weeks.
Plants were transferred to long-day conditions and kept there until the SAM had formed. This took two weeks for wild-type plants and three weeks for TRM678 mutants.
The SAM of each pant was then was then dissected and observed with a confocal microscope, Leica SP8 upright scanning confocal microscope equipped with a water immersion objective (HC FLUOTAR L 25x/0.95 W VISIR).
This resulted in 3D images of the SAM from above (e.g. perpendicular to the plane of cell division) collected for both types of specimens.
Each 3D image was converted to a 2D image showing only the cell wall of the top layer of cells in the SAM.
Each image was segmented into individual cells, by finding connected components separated by cell walls.
Each segmented image was used to produce several graphs: a cell and its 63 nearest neighbors were taken as vertices (so that each graph had 64 vertices total). Vertices were connected if their corresponding cells shared a cell wall. We will refer to the resulting graph as a “morphological graph". Only cells which were within 128 pixels of the center of the image were used, to ensure that each morphological graph was only composed of cells in the central region of the SAM. This is important because cell morphology is radically different at the boundary between the SAM and the surrounding primordia. See Figure [fig:cell_images] for examples of cell segmentations and extracted graphs.

This procedure produced a population of 1062 morphological graphs of 64 vertices each: 513 graphs coming from wild-type cell images, and 549 graphs coming from trm678 mutants. See Figure [fig:cell_lines] for examples. Graph Diffusion Distance was used to compute a $1062 \times 1062$ distance matrix comparing each pair of graphs. Figure [fig:cell_line_embeddings] shows an embedding of these distances, produced by plotting each point according to its projection onto the first two principal components of the distance matrix. This technique, multidimensional scaling (MDS), produces a 2D embedding such that inter-point distances in the embedding are as close as possible to those in the original distance matrix. We see that GDD appears to detect a trend: the embedding shows a clear clustering of each type of cell with cells of the same type.

Left: an example of a segmentation of a TRM678 mutant shoot apical meristem. Right: the same, but for a wild-type SAM. Both images are false-colored, with random colors per cell. Additionally each image also has an example morphological graph overlaid, with edges colored in white.

[fig:cell_images]

A distance-preserving embedding in 2D of diffusion distances between morphological graphs. Each point represents one morphological graph, and graphs are colored according to origin. This embedding reveals that the typical distance between morphological graphs of the same type is small, and the typical distance between morphological graphs of different types is large.

[fig:cell_line_embeddings]

Introduction

Graph Diffusion Distance (GDD) is a measure of similarity between graphs originally introduced by Hammond et. al (Hammond, Gur, and Johnson 2013) and greatly expanded by Scott et. al (C. Scott and Mjolsness 2021). This metric measures the similarity of two graphs by comparing their respective spectra. However, it is well-known that there exist pairs of cospectral graphs which are not isomorphic but have identical spectra. Furthermore, because even a small change to entries of a matrix may change its eigenvalues, another limitation of GDD is that it is sensitive to small changes in the topology of the graph (as well as small variations in edge weights). Finally, since GDD does not make use of edge or node attributes, it cannot distinguish between two different signals on the same source graph, diminishing its applicability in data science. In this work, we provide several generalizations to GDD which resolve these issues and make it a powerful machine learning tool for graphs.

Graph Diffusion Distance

We use the definition of Graph Diffusion Distance (GDD) first given by Hammond et. al and later expanded (to cover differently-sized graphs) by Scott et. al. Given two graphs $G_1$ and $G_2$ , let $L_1$ and $L_2$ be their respective graph Laplacians. Furthermore, let $L_i = U_i \Lambda_i U_i^T$ be the diagonalizations of each Laplacian, so that $\Lambda_i$ is a diagonal matrix whose $j$ th diagonal entry, $\lambda^{(i)}_j$ , is the $j$ th eigenvalue of $L_i$ . Then the graph diffusion distance between these graphs is given by $\begin{aligned} \label{eqn:diff_dist_defn} D(G_1,G_2) &= \sup_t {\left| \left| e^{t L_1} - e ^ {t L_2} \right| \right|}_F = \sup_t {\left| \left| e^{t \Lambda_1} - e^{t \Lambda_2} \right| \right|}_F = \sup_t \sqrt{\sum_{j=1}^n {\left( e^{t \lambda^{(1)}_j} - e^{t \lambda^{(2)}_j} \right)}^2 } .\end{aligned}$ This simplification relies on several properties of the Frobenius norm and the exponential map (rotation invariance and continuity, respectively) which we shall not detail here. It is clear that this distance measure requires the two graphs to be the same size, since otherwise this matrix difference is not defined.

GDD is a differentiable function of $t$ and edge weights

Once all of the eigenvalues $\lambda_i$ and eigenvectors $v_i$ (of a matrix $L$ ) are computed, we may backpropagate through the eigendecomposition as described in (Nelson 1976) and (Andrew, Chu, and Lancaster 1993). If our edge weights (and therefore the values in the Laplacian matrix $L$ ) are parametrized by some value $\theta$ , and our loss function $\mathcal{L}$ is dependent on the eigenvalues of $L$ , then we can collect the gradient $\frac{\partial \mathcal{L}}{\partial \theta}$ as: $\label{eqn:derivative_eigen} \frac{\partial \mathcal{L}}{\partial \theta} = \sum_k \left( \frac{\partial \mathcal{L}}{\partial \lambda_k} \frac{\partial \lambda_k}{\partial \theta} \right) = \sum_k \left( \frac{\partial \mathcal{L}}{\partial \lambda_k} v_k^T \frac{\partial L}{\partial \theta} v_k \right).$ In practice, if the entries of $L$ are computed as a function of $\theta$ using an automatic differentiation package (such as PyTorch (Paszke et al. 2019)) the gradient matrix $\frac{\partial L}{\partial \theta}$ is already known before eigendecomposition. We note here that for any fixed value of $t$ , all of the operations needed to compute GDD are either simple linear algebra or continuous or both. Therefore, for any loss function $\mathcal{L}$ which takes the GDD between two graphs as input, we may optimize $\mathcal{L}$ by backpropagation through the calculation of GDD using Equation [eqn:derivative_eigen].

Weighted Diffusion Distance

We make two main changes to GDD to make it capable of being tuned to specific graph data. First, we replace the real-valued optimization over $t$ with a maximum over an explicit list of $t$ values ${t_1, t_2, \ldots t_p}$ . This removes the need for an optimization step inside the GDD calculation. Second, we re-weight the Frobenius norm in the GDD calculation with a vector of weights $\beta_j$ which is the same length as the list of eigenvalues (these weights are normalized to sum to 1). The resulting GDD calculation is then: $\label{eqn:diff_dist_tunable_v1} D(G_1,G_2) = \max_{t \in t_1, t_2, \ldots t_p} \sqrt{\sum_{j=1}^n {\beta_j \left(e^{t \lambda^{(1)}_j} - e^{t \lambda^{(2)}_j} \right)}^2 } .$ This distance calculation may then be explicitly included in the computation graph (e.g. in PyTorch) of a machine learning model, without needing to invoke some external optimizer to find the supremum over all $t$ . $t_n$ and $\beta_j$ may be tuned by gradient descent or some other optimization algorithm to minimize a loss function which takes $d$ as input. Tuning the $t_n$ values results in a list of values of $t$ for which GDD is most informative for a given dataset, while tuning $\beta_j$ reweights GDD to pay most attention to the eigenvalues which are most discriminative. In the experiments in Section [sec:numeric] we demonstrate the efficacy of tuning these parameters using contrastive loss.

Learning Edge Weighting Functions

Here, we note that if graph edge weights are determined by some function $f$ parametrized by $\theta$ , we may still apply all of the machinery of Sections 8.4.1 and 8.4.2. A common edge weighting function for graphs embedded in Euclidean space is the Gaussian Distance Kernel, $w_{ij} = \exp \left( \frac{-1}{2\sigma^2} d_{ij}\right)$ , where $d_{ij}$ is the distance between nodes $i$ and $j$ in the embedding. $\sigma$ is the ‘radius’ of the distance kernel and can be tuned in the same way as $\beta$ and $t$ . In cases like the data discussed in this section, our edge weights are vector-valued, and it is therefore advantageous to replace this hand-picked edge weight with weights chosen by a general function approximator, e.g. an artificial neural network (Fukushima and Miyake 1982). As before, the parameters of this ANN could be tuned using gradients backpropagated through the GDD calculation and eigendecomposition. Example weights learned by an ANN, trained with the constrastive loss function, can be seen in Figure [fig:learned_edge_weights].

A neural network model learns edge weights which distinguish two classes of graphs. Each row shows the weight values assigned by the network at different times during the training process, from pre-training (far left) to convergence (right). The top row represents a patch of wild-type Arabidopsis cells, and the bottom row represents mutants. The pictured edge weights cause these two categories of graph to have distinct spectra. — A neural network model learns edge weights which distinguish two classes of graphs. Each row shows the weight values assigned by the network at different times during the training process, from pre-training (far left) to convergence (right). The top row represents a patch of wild-type *Arabidopsis* cells, and the bottom row represents mutants. The pictured edge weights cause these two categories of graph to have distinct spectra.

[fig:learned_edge_weights]

Numeric Experiments

Data Description

In this brief section we describe the two datasets used in preparation of this manuscript. For more details, please see the README accompanying each dataset.

Skeletonized MNIST

As another comparison, we try our method on graphs representing skeletonized images from the MNIST dataset. 300 MNIST digits from each category (3000 total) were selected at random. Each MNIST digit was converted into a skeleton using the skeletonize method from sklearn(Pedregosa et al. 2011), with default parameters. Each skeleton was then converted into the nearest-neighbors graph of live pixels. Finally, each graph was reduced to 32 nodes by randomly contracting neighboring nodes (excluding nodes which were endpoints or junctions). A vector-valued edge label was saved for each edge, including the angle of that edge from horizontal and the length of the edge. The dataset used in this section was prepared as follows:

Two varieties of Arabidopsis (wild type as well as “trm678 mutants”: mutants with decreased expression of all three of TRM6,TRM7,and TRM8) were planted and kept in the short-day condition (8 hours of light, 16 hours of dark) for 6 weeks.
Plants were transferred to long-day conditions and kept there until the SAM had formed. This took two weeks for wild-type plants and three weeks for TRM678 mutants.
The SAM of each pant was then was then dissected and observed with a confocal microscope, Leica SP8 upright scanning confocal microscope equipped with a water immersion objective (HC FLUOTAR L 25x/0.95 W VISIR).
This resulted in 3D images of the SAM from above (e.g. perpendicular to the plane of cell division) collected for both types of specimens.
Each 3D image was converted to a 2D image showing only the cell wall of the top layer of cells in the SAM.
Each image was segmented into individual cells, by finding connected components separated by cell walls.
Each segmented image was used to produce several graphs: a cell and its 63 nearest neighbors were taken as vertices (so that each graph had 64 vertices total). Vertices were connected if their corresponding cells shared a cell wall. We will refer to the resulting graph as a “morphological graph". Only cells which were within 128 pixels of the center of the image were used, to ensure that each morphological graph was only composed of cells in the central region of the SAM. This is important because cell morphology is radically different at the boundary between the SAM and the surrounding primordia. See Figure [fig:cell_images] for examples of cell segmentations and extracted graphs.

This procedure produced a population of 580 morphological graphs of 64 vertices each: 290 graphs coming from wild-type cell images, and 290 graphs coming from trm678 mutants. See Figure [fig:cell_lines] for example segmented images and an example graph. [sec:numeric] We test each of the GDD generalizations proposed, on each dataset. Both datasets were split 85/15 % train/validation; All metrics we report are calculated on the validation set. For each dataset, we compare the following four methods: 1) GDD on unweighted graphs, with no tuning of $t$ or other parameters; 2) Gaussian kernel edge weights (fixed $\sigma$ ), with $t$ and $\beta_i$ tuned; 3) Gaussian kernel edge weights, with $t$ , $\sigma$ , and $\beta_i$ 4) General edge weights parametrized by a small neural network. For methods 2 and 3, the input to the distance kernel was the distance between nodes in the original image. All parameters were tuned using ADAMOpt (Kingma and Ba 2014) (with default PyTorch hyperparameters and batch size 256) to minimize the contrastive loss function (Hadsell, Chopra, and LeCun 2006). Training took 200 epochs. For the neural network approach, edge weights were chosen as the final output of a neural network with three layers of sizes $\{32,128,1\}$ with SiLU activations on the first two layers and no activation function on the last layer. Results of these experiments can be found in Table [tab:valid_results_morpho], and distance matrices (along with a distance-preserving embedding (Cox and Cox 2008) of all points in the dataset) for the cell morphology dataset can be seen in Figure [fig:embedding_of_graphs]. The distance matrices developed using the ANN approach clearly show better separation between the two categories. The validation accuracy on the cell morphology dataset is best for the ANN method.

[fig:embedding_of_graphs]

Method	Validation Accuracy % (morpho)
GDD only	77.7
$t$ -tuning and $\beta$ -weights	85.0
$t$ and $\sigma$ -tuning, $\beta$ -weights	85.5
ANN Parametrization	98.3

[tab:valid_results_morpho]

Conclusion and Future Work

This section presents two applications of GDD to the classification of graphs embedded in 2D and 3D. In both, we demonstrate that GDD (and a parametrized variant) produce clusters which draw out structural differences in our dataset(s). Additionally, we demonstrate a method to compute distance metrics between edge-labelled graphs, in such a way as to respect class labels. This approach is flexible and can be implemented entirely in PyTorch, making it possible to learn a distance metric between graphs that were previously not able to be discriminated by Graph Diffusion Distance. In the future we hope to apply this method to more heterogenous graph datasets by including the varying-size version of GDD. We also note here that our neural network approach, as described, is not a Graph Neural Network in the sense described by prior works like (Kipf and Welling 2016; Bacciu et al. 2020), as there is no message-passing step. We expect message-passing layers to directly improve these results and hope to include them in a future version of differentiable GDD.