I’m fitting a large self-exciting point process model (preprint here). The data are the locations and times of thousands of events (crimes), and each event temporarily increases the risk of further events in its vicinity. The risk decays exponentially in time and like a Gaussian in space.
Unfortunately the log-likelihood for point processes is pretty hairy: the model specifies an intensity function lambda(s, t), the rate of events at every point in time and space, and the log-likelihood is (eq 4 in the paper)
sum_i log(lambda(s_i, t_i)) - \int lambda(s, t) ds dt
And since lambda(s, t) involves a sum over all previous events (eq 5), the log-likelihood is a double sum that’s O(n^2) in the number of events.
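Concretely, my reading of eqs. 4–5 (the notation here is mine, so check it against the paper) is a background rate plus a triggering kernel that is exponential in time and Gaussian in space:

```latex
% Conditional intensity: background rate mu(s) plus contributions from
% all earlier events, exponential in time, Gaussian in space
\lambda(s, t) = \mu(s) + \frac{\theta}{\omega} \sum_{j \,:\, t_j < t}
    e^{-(t - t_j)/\omega}\,
    \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{\lVert s - s_j \rVert^2}{2\sigma^2}\right)

% Log-likelihood over the observation window [0, T] and study region S
\log L = \sum_{i=1}^{n} \log \lambda(s_i, t_i)
         - \int_0^T \!\!\int_S \lambda(s, t)\, ds\, dt
```

Here theta is the self-excitation weight, omega the temporal lengthscale, and sigma the spatial lengthscale, matching lengthscaleT and lengthscaleS in the code below.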
I implemented maximum likelihood estimation using EM and a bunch of custom parallelized code to speed up that sum. Now I’m trying to implement the model in Stan. I increment target directly using the sum:
transformed parameters {
  // intensity experienced by each event
  vector[n] tents;
  real lp;
  tents = exp(event_covariates * beta);
  for (i in 2:n) {
    tents[i] += (theta / lengthscaleT) *
      dot_product(exp(timeD[i, 1:(i - 1)] / lengthscaleT),
                  gaussian(spaceD2[i, 1:(i - 1)], lengthscaleS));
  }
  // full log-likelihood
  lp = sum(log(tents))
       - time_window * dot_product(cell_areas, exp(cell_covariates * beta))
       + theta * (sum(exp(timeD2 / lengthscaleT)) - n);
}
model {
  target += lp;
  lengthscaleS ~ uniform(0, 1000);
  // Gamma(shape, rate), so mean is shape/rate, variance shape/rate^2
  // measured in seconds
  lengthscaleT ~ gamma(1, 1.0 / (60.0 * 60 * 24 * 60));
  theta ~ uniform(0, 1);
  beta[1] ~ normal(-30, 5);
  for (i in 2:p) {
    beta[i] ~ normal(0, 5);
  }
}
(complete model code is below)
There are a bunch of equivalent ways to write that double sum – nested for loops, rearranged arithmetic – but I’m new to Stan and don’t have any intuition about which is best. Is there an obviously better way to write the loop to make it most efficient for Stan?
(Right now, for a reasonable dataset of about 2000 events, 2000 draws take about 10 hours on a fast machine. With naive Metropolis-Hastings and my custom parallelized code I can draw the same number in 30 minutes, but with much worse mixing.)
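For intuition about what the per-event loop computes, here is a NumPy sketch (toy data and toy parameter values of my own; `omega`, `sigma`, `theta` stand in for lengthscaleT, lengthscaleS, and theta) comparing the loop against a fully vectorized version that masks the strict lower triangle. Both are O(n^2) in work; this kind of restructuring is what I’m asking about on the Stan side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.sort(rng.uniform(0, 1000, n))     # event times (sorted, like Time)
s = rng.uniform(0, 50, (n, 2))           # event locations (like Space)
theta, omega, sigma = 0.5, 100.0, 5.0    # toy parameter values

# Pairwise negated time lags and squared distances, as in transformed data
timeD = -(t[:, None] - t[None, :])       # timeD[i, j] = -(t_i - t_j)
spaceD2 = ((s[:, None, :] - s[None, :, :]) ** 2).sum(-1)

def gaussian(d2, ls):
    # isotropic 2-D Gaussian density as a function of squared distance
    return np.exp(-0.5 * d2 / ls**2) / (2 * np.pi * ls**2)

# Per-event loop, mirroring the Stan transformed parameters block
excite_loop = np.zeros(n)
for i in range(1, n):
    excite_loop[i] = (theta / omega) * np.dot(
        np.exp(timeD[i, :i] / omega), gaussian(spaceD2[i, :i], sigma))

# Vectorized: one masked elementwise product over the strict lower triangle
mask = np.tril(np.ones((n, n)), k=-1)
excite_vec = (theta / omega) * (
    mask * np.exp(timeD / omega) * gaussian(spaceD2, sigma)).sum(axis=1)

assert np.allclose(excite_loop, excite_vec)
```

The mask zeroes out later events (j >= i) so only predecessors contribute, exactly like the `1:(i - 1)` slices in the loop.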
Full model code:
// Adapted from https://github.com/flaxter/gunshot-contagion
// by Charles Loeffler and Seth Flaxman
functions {
  // Isotropic 2-D Gaussian density, as a function of squared distances
  row_vector gaussian(row_vector dist2s, real lengthscale) {
    return exp(-0.5 * dist2s / lengthscale^2) / (2 * pi() * lengthscale^2);
  }
}
data {
  // Total number of events
  int<lower=1> n;
  // Total number of covariates
  int<lower=1> p;
  // Total number of covariate cells
  int<lower=1> C;
  // Event coordinates (in feet)
  matrix[n, 2] Space;
  // Event times (in seconds)
  // NB: these must be sorted from smallest to largest!
  vector[n] Time;
  // Total length of time observed
  real time_window;
  // Covariate value at each event
  matrix[n, p] event_covariates;
  // Covariate values in each covariate cell
  matrix[C, p] cell_covariates;
  // Total land area of each covariate cell
  vector[C] cell_areas;
}
transformed data {
  // Time elapsed between each event and its predecessors (stored negated)
  matrix[n, n] timeD;
  // Time between each event and the end of the observation period (negated)
  vector[n] timeD2;
  // Squared Euclidean distance between each pair of events
  matrix[n, n] spaceD2;
  for (i in 1:n) {
    for (j in 1:n) {
      timeD[i, j] = -(Time[i] - Time[j]);
      spaceD2[i, j] = squared_distance(Space[i], Space[j]);
    }
    timeD2[i] = -(time_window - Time[i]);
  }
}
parameters {
  // Spatial decay distance (sigma, feet); upper bound matches the uniform prior
  real<lower=0, upper=1000> lengthscaleS;
  // Temporal decay time (omega, seconds)
  real<lower=0> lengthscaleT;
  // Self-excitation amount; upper bound matches the uniform prior
  real<lower=0, upper=1> theta;
  // Background covariate coefficients
  vector[p] beta;
}
transformed parameters {
  // intensity experienced by each event
  vector[n] tents;
  real lp;
  tents = exp(event_covariates * beta);
  for (i in 2:n) {
    tents[i] += (theta / lengthscaleT) *
      dot_product(exp(timeD[i, 1:(i - 1)] / lengthscaleT),
                  gaussian(spaceD2[i, 1:(i - 1)], lengthscaleS));
  }
  // full log-likelihood
  lp = sum(log(tents))
       - time_window * dot_product(cell_areas, exp(cell_covariates * beta))
       + theta * (sum(exp(timeD2 / lengthscaleT)) - n);
}
model {
  target += lp;
  lengthscaleS ~ uniform(0, 1000);
  // Gamma(shape, rate), so mean is shape/rate, variance shape/rate^2
  // measured in seconds
  lengthscaleT ~ gamma(1, 1.0 / (60.0 * 60 * 24 * 60));
  theta ~ uniform(0, 1);
  beta[1] ~ normal(-30, 5);
  for (i in 2:p) {
    beta[i] ~ normal(0, 5);
  }
}
generated quantities {
  // Temporal decay time converted from seconds to days
  real lengthscale_days;
  lengthscale_days = lengthscaleT / 60 / 60 / 24;
}