Statistical Models for Event-based Social Network Data

Christopher DuBois, Statistics @ UC Irvine

Carter Butts, Sociology @ UC Irvine

Daniel McFarland, Sociology @ Stanford

Padhraic Smyth, Computer Science @ UC Irvine

Event-based network data

Example: Interactions among students

Communication in small groups

Modeling goals

Dynamics of interaction in small groups
- Differential propensity to act
Investigate association with known covariates
- regarding the individual
- previous history of events
- context

The problem of interest for today's talk involves a set of entities and events among them occurring over time. Each event represents some interaction directed from one entity to another. I might refer to these entities as nodes, vertices, or individuals, and to the events as edges or relational events. One example we will return to throughout the talk is one where the nodes represent students in a particular classroom and the edges represent directed acts of communication. The individuals might be distinguished by some covariate, such as age, as represented by the different colors here. This type of interaction data is of particular interest to sociologists who study social networks and the dynamics of small groups. One of the most fundamental goals is to characterize the differential propensity to interact and to better understand how these differences are associated with known covariates. For example, we might ask whether interactions between same-gender students are more likely and, furthermore, how this varies by context.
To answer these types of questions, today we will discuss a statistical model for interaction in small groups of people. I will be talking about some of the challenges that arise and how we propose to solve them.

Outline

Previous work: Models for network event data
- Background
- Relational event framework
- Simulated example
Contribution: Hierarchical models for multiple sequences of events
Application: High school classroom dynamics
Future directions

Dynamic network data

"Static" analysis

Aggregate over time and analyze the weighted network

(Holland, JASA 1981) (Fienberg, JASA 1985)

Dynamic network data

"Snapshot" analysis

Time 1

$p(Y_1 | \theta_1)$

Time 2

$p(Y_2 | \theta_2)$

Time t

$p(Y_t | \theta_t)$

Dynamic network data

"Snapshot" analysis: Questions

How do we choose the resolution for aggregating?
A network for each day? A network for each hour?
What happens if the behavior of interest is at a smaller time scale?
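The resolution question can be made concrete with a small sketch. The function below is a hypothetical helper (not from the talk) that bins timestamped events `(i, j, t)` into windows of a chosen width and counts edges per window; changing `width` changes which networks you end up analyzing.

```python
from collections import Counter

def snapshot_counts(events, width):
    """Bin events (i, j, t) into windows of the given width.

    Returns {window_index: Counter({(i, j): count})}, where window_index
    is floor(t / width). The choice of width is exactly the resolution
    question raised above.
    """
    windows = {}
    for i, j, t in events:
        index = int(t // width)
        windows.setdefault(index, Counter())[(i, j)] += 1
    return windows

events = [(0, 1, 0.2), (1, 0, 0.9), (0, 1, 1.5)]
fine = snapshot_counts(events, width=1.0)    # two snapshots
coarse = snapshot_counts(events, width=2.0)  # one snapshot
```

With the fine resolution the reply `(1, 0)` and the second `(0, 1)` land in different snapshots; with the coarse one the ordering within the window is lost.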

Dynamic network data

Event-level analysis

$N$ people: $i,j \in \{1,\ldots,N\}$
Each interaction is an edge with a timestamp: $(i,j,t)$
Observe a sequence of edges: $\{(i_k,j_k,t_k), k\in [1,M]\}$
Often edges can occur more than once
Can benefit from thinking of each edge as a process with some rate
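As a minimal illustration of "each edge as a process with some rate", the sketch below stores the sequence as `(i, j, t)` tuples and estimates a constant rate per dyad as count divided by observation time (the maximum likelihood estimate for a homogeneous Poisson process). The constant-rate assumption and the name `empirical_rates` are for illustration only; the models that follow let the rate depend on history.

```python
from collections import Counter

def empirical_rates(events, horizon):
    """Constant-rate estimate for each observed dyad.

    events: list of (i, j, t) with 0 <= t <= horizon
    horizon: total observation time
    Returns {(i, j): events per unit time}.
    """
    counts = Counter((i, j) for i, j, _ in events)
    return {dyad: c / horizon for dyad, c in counts.items()}

events = [(1, 2, 0.5), (2, 1, 1.0), (1, 2, 3.0)]
rates = empirical_rates(events, horizon=4.0)
```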

Modeling rates of events

$\lambda_{ij}(t)$: rate of interaction $(i,j)$ at time $t$

$X_{ij}(t)$: vector of covariates about $(i,j)$ at time $t$

Model rate of each edge using previous history

Modeling rates of events

Assume event rates change only when events occur.

A relational event model

Rate of the $k$th observed event

A relational event model

Represents the fact that no event occurred between event $k-1$ and event $k$

A relational event model

For more info see (Butts, 2008).
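The two pieces described on these slides combine into a likelihood of the following general form. This is a sketch under the stated assumption that rates change only when events occur, taking $t_0 = 0$; the symbol $\mathcal{R}$ for the set of dyads at risk is notation introduced here, and $\lambda_{ij}(t_k)$ denotes the rate in effect over the interval ending at $t_k$.

$$ \begin{align} p(Y | \theta, \mathbf{X}) = \prod_{k=1}^M \left[ \lambda_{i_k j_k}(t_k) \prod_{(i,j) \in \mathcal{R}} \exp \{ -(t_k - t_{k-1}) \lambda_{ij}(t_k) \} \right] \end{align} $$

The first factor is the rate of the $k$th observed event; the second is the probability that no dyad produced an event between $t_{k-1}$ and $t_k$.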

Simulation

Model specification

$\lambda_{ij}(t) = \exp\{\beta'X_{ij}(t)\}$
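A minimal sketch of how this log-linear rate feeds a piecewise-constant event likelihood. It assumes every ordered pair with `i != j` is at risk, that time starts at 0, and that covariates depend only on the event history; `covariates` is a hypothetical callback supplied by the user, not part of any library.

```python
import math

def rem_log_likelihood(events, n, beta, covariates):
    """Log-likelihood of a time-ordered event list under log-linear rates.

    events: list of (i, j, t), sorted by t
    n: number of individuals, labelled 0..n-1
    beta: list of coefficients
    covariates(i, j, history) -> list of floats, same length as beta
    """
    loglik = 0.0
    t_prev = 0.0
    history = []
    for i_k, j_k, t_k in events:
        total_rate = 0.0
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                x = covariates(i, j, history)
                log_rate = sum(b * v for b, v in zip(beta, x))
                total_rate += math.exp(log_rate)
                if (i, j) == (i_k, j_k):
                    loglik += log_rate  # log rate of the observed event
        # No event on any dyad during (t_prev, t_k].
        loglik -= (t_k - t_prev) * total_rate
        history.append((i_k, j_k, t_k))
        t_prev = t_k
    return loglik

# Intercept-only example: 3 people, 6 dyads, each with rate exp(0) = 1.
ll = rem_log_likelihood(
    [(0, 1, 0.5), (1, 0, 1.0)], n=3, beta=[0.0],
    covariates=lambda i, j, history: [1.0],
)
```

In the intercept-only example each interval of length 0.5 contributes `-0.5 * 6`, so the total is `-6.0`.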

Simulation

Model specification

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

Simulation

Model specification

$\log \lambda_{ij}(t) = \beta_0 + \boxed{\beta_1 X_{ij1}} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

Simulation

Model specification

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \boxed{\beta_2 X_{ij2}} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

Simulation

Model specification

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \boxed{\beta_3 X_{ij3}} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

Simulation

Model specification: Participation shifts

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

Additive effects that model turn-taking in conversation (Gibson, 2003)

Simulation

Model specification: $(j,i)$ to $(i,j)$ effect

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \boxed{\beta_4 X_{ij4}} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

E.g. $X_{ij4}$ immediately after event (12,15)

Simulation

Model specification: $(k,i)$ to $(i,j)$ effect

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \boxed{\beta_5 X_{ij5}} + \beta_6 X_{ij6}$

E.g. $X_{ij5}$ immediately after event (12,15)

Simulation

Model specification: $(i,k)$ to $(i,j)$ effect

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \boxed{\beta_6 X_{ij6}}$

E.g. $X_{ij6}$ immediately after event (12,15)
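The three participation-shift indicators can be sketched as 0/1 matrices computed from the most recent event. This is one plausible reading of the slide titles, assuming the indicators are mutually exclusive in the way turn-taking typologies usually are (the reply to the previous sender counts only toward $X_{ij4}$, and a repeat of the same dyad counts toward none); the function name is hypothetical.

```python
def pshift_covariates(prev_event, n):
    """Indicator matrices for the dyad (i, j) given the previous event (a, b).

    x4[i][j] = 1 if the previous event was (j, i): b replies to a.
    x5[i][j] = 1 if the previous event was (k, i), k != j: b addresses someone else.
    x6[i][j] = 1 if the previous event was (i, k), k != j: a addresses someone else.
    Individuals are labelled 0..n-1.
    """
    a, b = prev_event
    x4 = [[0] * n for _ in range(n)]
    x5 = [[0] * n for _ in range(n)]
    x6 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if (i, j) == (b, a):
                x4[i][j] = 1
            elif i == b:
                x5[i][j] = 1
            elif i == a and j != b:
                x6[i][j] = 1
    return x4, x5, x6

# Immediately after event (12, 15) in a group of 16 people.
x4, x5, x6 = pshift_covariates((12, 15), n=16)
```

After event (12, 15), `x4` has a single nonzero entry at row 15, column 12, while `x5` lights up row 15 and `x6` lights up row 12, each excluding the two people involved in the previous event.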

Simulation

Model specification

$\log \lambda_{ij}(t) = \beta_0 + \beta_1 X_{ij1} + \beta_2 X_{ij2} + \beta_3 X_{ij3} + \beta_4 X_{ij4} + \beta_5 X_{ij5} + \beta_6 X_{ij6}$

E.g. Entire matrix $\lambda$ immediately after event (12,15)

Simulation

Model specification

$\lambda_{ij}(t_k)$

$(i_k,j_k)$
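When rates are constant between events, a sequence can be simulated by repeatedly drawing an exponential waiting time with the total rate and then choosing the dyad with probability proportional to its rate. The sketch below assumes that setup; `intercept_and_reciprocity` is a hypothetical two-covariate example, not the six-covariate specification from the slides.

```python
import math
import random

def simulate_rem(n, beta, covariates, num_events, seed=0):
    """Simulate (i, j, t) events under log-linear, piecewise-constant rates."""
    rng = random.Random(seed)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    history, t = [], 0.0
    for _ in range(num_events):
        rates = [
            math.exp(sum(b * v for b, v in zip(beta, covariates(i, j, history))))
            for i, j in dyads
        ]
        t += rng.expovariate(sum(rates))                  # time to next event
        i, j = rng.choices(dyads, weights=rates, k=1)[0]  # which dyad fires
        history.append((i, j, t))
    return history

def intercept_and_reciprocity(i, j, history):
    """Hypothetical covariates: [1, indicator that the last event was (j, i)]."""
    replied = bool(history) and (history[-1][0], history[-1][1]) == (j, i)
    return [1.0, 1.0 if replied else 0.0]

events = simulate_rem(4, [0.0, 2.0], intercept_and_reciprocity, num_events=50)
```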

Outline

Previous work: Models for event data in networks
Contribution: Hierarchical models for multiple sequences of events
- Motivation and specification of our hierarchical model
- Prediction experiments
- MCMC with parallel tempering
Application: High school classroom dynamics
Future directions

Possible issues in estimation

Some event sequences may have few events
Effects may have few relevant events

Multiple sequences

May have a collection of event sequences
Problem: Model applies only to a single sequence

Hierarchical event models

Proposed Contribution

Hierarchical Bayesian model for multiple sequences
Principled way of sharing information across sequences
Perform population-level inferences
Understand sequence-level variation
Stabilize inference for poorly-constrained parameters

Hierarchical event models

Hierarchical simulation

Parameters are Gaussians drawn from upper level

For each classroom $j$ and effect $p$:

$\begin{align}Y_j \sim& \ \mbox{REM}(\theta_j,\mathbf{X}_j) \\ \theta_{jp} \sim& \ \mbox{Normal}(\mu_p,\sigma_p^2) \\ p(\mu_p, \sigma_p) \propto& \ 1/\sigma_p \end{align}$

Simulated data:

# individuals: 15
# sequences: 20
$\mu_p$ as in original example
$\sigma_p = .5$
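The upper level of the simulation can be sketched as below: each sequence gets its own parameter vector drawn around the population means. The values in `mu` are placeholders chosen for illustration (the slides do not list the original values); the 20 sequences and the standard deviation of 0.5 follow the setup above.

```python
import random

def draw_sequence_parameters(mu, sigma, num_sequences, seed=0):
    """Draw theta[j][p] ~ Normal(mu[p], sigma[p]^2) for each sequence j."""
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        for _ in range(num_sequences)
    ]

mu = [0.0, 1.0, -1.0]          # placeholder population means
sigma = [0.5, 0.5, 0.5]        # sigma_p = .5 for every effect
theta = draw_sequence_parameters(mu, sigma, num_sequences=20)
```

Each `theta[j]` would then be used to simulate one event sequence under the relational event model.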

Inference

Optimizing the posterior

$\begin{align} p(\theta,\mu,\sigma | \mathbf{Y}, \mathbf{X}) \propto & \prod_{p=1}^P p(\mu_p,\sigma_p) \prod_{j=1}^J \left[ p(Y_j|\theta_j,\mathbf{X}_j) \prod_{p=1}^P p(\theta_{jp}|\mu_p,\sigma_p) \right] \end{align}$
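A sketch of the corresponding unnormalized log posterior, counting the prior $p(\mu_p,\sigma_p) \propto 1/\sigma_p$ once per effect. It assumes the per-sequence log-likelihoods $\log p(Y_j|\theta_j,\mathbf{X}_j)$ have already been computed and are passed in as `seq_logliks`; all names are illustrative.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def log_posterior(theta, mu, sigma, seq_logliks):
    """Unnormalized log posterior of the hierarchical model.

    theta: theta[j][p], one parameter vector per sequence
    mu, sigma: population mean and standard deviation per effect
    seq_logliks: seq_logliks[j] = log p(Y_j | theta_j, X_j)
    """
    if any(s <= 0 for s in sigma):
        return -math.inf                        # outside the support
    lp = sum(-math.log(s) for s in sigma)       # prior, proportional to 1/sigma_p
    for theta_j, loglik_j in zip(theta, seq_logliks):
        lp += loglik_j
        lp += sum(normal_logpdf(t, m, s) for t, m, s in zip(theta_j, mu, sigma))
    return lp

lp = log_posterior(theta=[[0.0]], mu=[0.0], sigma=[1.0], seq_logliks=[-2.0])
```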

Hierarchical simulation

Learning back known parameters

Hierarchical event models

Predicting the next event

At each time $t$, rank $\lambda_{ij}(t)$

Precision @ k: Proportion of next events whose observed dyad was ranked within the top $k$
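A sketch of this evaluation, reading the metric as the fraction of events whose observed dyad falls among the `k` highest-rate dyads at the moment it occurred. The input format (one dictionary of rates per event) is an assumption made for illustration, and ties are broken by dictionary order.

```python
def hit_at_k(rates, observed, k):
    """True if the observed dyad is among the k dyads with the highest rate."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    return observed in ranked[:k]

def precision_at_k(rate_snapshots, observed_dyads, k):
    """Fraction of events whose observed dyad was ranked in the top k.

    rate_snapshots[m]: {(i, j): rate} just before event m
    observed_dyads[m]: the dyad (i, j) that actually occurred
    """
    hits = sum(hit_at_k(r, o, k) for r, o in zip(rate_snapshots, observed_dyads))
    return hits / len(observed_dyads)

snapshots = [
    {(0, 1): 3.0, (1, 0): 1.0, (0, 2): 2.0},
    {(0, 1): 0.5, (1, 0): 4.0, (0, 2): 1.0},
]
observed = [(0, 2), (1, 0)]
p1 = precision_at_k(snapshots, observed, k=1)
p2 = precision_at_k(snapshots, observed, k=2)
```

In the toy data the first event is ranked second and the second event is ranked first, so the score rises from 0.5 at `k=1` to 1.0 at `k=2`.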

Hierarchical event models

Predicting the next event

Inference

Markov chain Monte Carlo

Metropolis-Hastings: construct a Markov chain whose stationary distribution is the posterior distribution of our parameters
MH can get stuck easily: Peaked modes, some parameters might be at the boundary, and some $\theta_{jp}$ are more variable than others.
Finding a good proposal distribution is tricky
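For reference, a minimal random-walk Metropolis sampler, which is the simplest instance of Metropolis-Hastings (a symmetric Gaussian proposal). This is a generic sketch, not the sampler used in this work; the single `step` size shared by all coordinates is exactly the kind of proposal choice that becomes tricky when some parameters vary much more than others.

```python
import math
import random

def random_walk_metropolis(log_target, x0, step, n_iter, seed=0):
    """Sample from exp(log_target) with Gaussian random-walk proposals.

    log_target: function from a list of floats to an unnormalized log density
    x0: starting point with finite log density
    step: proposal standard deviation, shared by all coordinates
    """
    rng = random.Random(seed)
    x = list(x0)
    lp = log_target(x)
    samples = []
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, step) for v in x]
        lp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(list(x))
    return samples

# Toy target: a standard normal in one dimension.
samples = random_walk_metropolis(lambda x: -0.5 * x[0] ** 2, [0.0], 1.0, 5000)
```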

Inference

Parallel tempering

Inference

Parallel tempering: Sampling

Unnormalized distribution of interest: $g(\Theta)$

$$ \begin{align} \pi(\Theta_1,\ldots,\Theta_J) \propto& \prod_{j=1}^J h_j(\Theta_j) \\ h_j(\Theta_j) \propto & \ g(\Theta_j)^{1/t_j} = \exp \{\log g(\Theta_j)/t_j \} \end{align} $$ where $t_1 < \cdots < t_J$ are temperatures and $h_j(\Theta_j)$ is the target distribution for chain $j$.

Inference

Parallel tempering: Swapping

Swap between chains $j$ and $k$ at iteration $t$ with acceptance probability: $$\begin{align} \min \left\{ 1, \frac{h_j(\Theta_{k}^{(t)})h_{k}(\Theta_{j}^{(t)})}{h_j(\Theta_j^{(t)})h_{k}(\Theta_{k}^{(t)})} \right\} \end{align} $$

(Geyer 1991), (Madras 2003)
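A sketch of the swap step, assuming tempered targets of the form $h_j \propto g^{1/t_j}$ with $g$ the unnormalized density. Under that assumption the log of the acceptance ratio simplifies to $(1/t_j - 1/t_k)(\log g(\Theta_k) - \log g(\Theta_j))$. Function and argument names are illustrative.

```python
import math
import random

def swap_log_ratio(log_g_j, log_g_k, t_j, t_k):
    """Log acceptance ratio for swapping the states of chains j and k.

    log_g_j, log_g_k: log g evaluated at the current states of chains j and k
    t_j, t_k: temperatures of the two chains
    """
    return (1.0 / t_j - 1.0 / t_k) * (log_g_k - log_g_j)

def maybe_swap(states, log_g, temps, j, k, rng):
    """Swap states[j] and states[k] in place with the swap acceptance probability."""
    log_r = swap_log_ratio(log_g(states[j]), log_g(states[k]), temps[j], temps[k])
    if rng.random() < math.exp(min(0.0, log_r)):
        states[j], states[k] = states[k], states[j]
        return True
    return False

# Toy example: the hot chain (t = 4) sits at the mode, the cold chain (t = 1) does not.
log_g = lambda x: -0.5 * x * x
states = [3.0, 0.0]
swapped = maybe_swap(states, log_g, temps=[1.0, 4.0], j=0, k=1, rng=random.Random(0))
```

Here the log ratio is positive, so the swap is always accepted and the cold chain receives the better state found by the hot chain.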

Outline

Previous work: Models for event data in networks
Contribution: Hierarchical models for multiple sequences of events
Application: High school classroom dynamics
- Background
- Parameter estimates
- Model selection via DIC
- Model checking via the posterior predictive
Future directions

Application

High school classroom data

Collected via participant observation (McFarland, 2001)
650 classroom sessions
Covariates about the class: subject, teacher, etc.
Covariates about the individuals: race, extracurriculars, etc.

Application