Hey all,
I’m trying to fit a model to a daily time series. Yearly seasonality is strikingly visible in the data, so I need to make sure it’s represented, but my attempts so far have yielded unacceptably slow sampling. I’ve tried dramatically simplifying the model by removing other components (e.g., a stochastic slope and a multilevel component) without success. The simplest model I can come up with (shown below) takes several hours to run 500 iterations.
I’d be grateful if anyone could take a look at my code and see how I might speed things up.
I’m representing yearly seasonality using a for-loop, which was an attempt to recreate the equation \gamma_t \sim \mathcal{N}\left(-\sum_{j=1}^{s-1} \gamma_{t-j},\ \sigma_{seas}\right) shown on page 10 of these slides, which I’ve also seen elsewhere. With daily data and s = 365, each day’s effect is centered at minus the sum of the previous 364 days’:
data {
  int<lower=1> n;            // n = 2632
  int y[n];                  // number of shootings per day
}
parameters {
  vector[n] mu;              // local level (random walk)
  vector[n] y_seasonal;      // yearly seasonality
  real<lower=0> sigma_level;
  real<lower=0> sigma_yday;
}
transformed parameters {
  // yhat is just the level for now; the components I stripped out
  // would get added back in here
  vector[n] yhat = mu;
}
model {
  real draw_from;
  // each day's seasonal effect is centered at minus the sum of the
  // previous 364 days', so any 365-day window sums to roughly zero
  for (t in 365:n) {
    draw_from = 0;
    for (j in 1:364) {
      draw_from = draw_from - y_seasonal[t - j];
    }
    y_seasonal[t] ~ normal(draw_from, sigma_yday);
  }
  sigma_yday ~ inv_gamma(0.001, 0.001);
  // random-walk level
  for (t in 2:n)
    mu[t] ~ normal(mu[t - 1], sigma_level);
  for (t in 1:n)
    y[t] ~ poisson_log(yhat[t] + y_seasonal[t]);
}
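One thing I noticed while staring at this: the inner loop re-sums the previous 364 values at every t, which is O(364·n) work (and autodiff nodes) per gradient evaluation. A running sum over a sliding 364-day window should give the same target density in O(n). Here’s an untested sketch of just the model block, everything else unchanged (I’ve also vectorized the mu and y statements, which shouldn’t change the model):

model {
  // running = sum of the 364 seasonal effects preceding day t
  real running = sum(y_seasonal[1:364]);
  for (t in 365:n) {
    y_seasonal[t] ~ normal(-running, sigma_yday);
    // slide the window forward one day
    running += y_seasonal[t] - y_seasonal[t - 364];
  }
  sigma_yday ~ inv_gamma(0.001, 0.001);
  mu[2:n] ~ normal(mu[1:(n - 1)], sigma_level);
  y ~ poisson_log(yhat + y_seasonal);
}

If I have that right, it trades hundreds of thousands of subtraction nodes per evaluation for a few thousand, but I haven’t verified it’s equivalent.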
Here is how to get the data I’m working with and put it into a list that Stan can read (using R):
library(tidyverse)
# Download dataset containing all crimes observed in Baltimore since 2012
# There is one row for each incident
bpd <- read_csv("https://raw.githubusercontent.com/peterphalen/ceasefire/master/BPD_Part_1_Victim_Based_Crime_Data.csv")
# subset to shootings or homicides with a firearm
bpd <- subset(bpd, Description == "SHOOTING" |
                   (Description == "HOMICIDE" & Weapon == "FIREARM"))
bpd$CrimeDate <- as.Date(bpd$CrimeDate, format = "%m/%d/%Y")
# there are many crimes per day. collapse to daily counts
daily <- bpd %>% group_by(CrimeDate) %>% summarise(shootings = n())
# fill missing dates, because some had no shootings
full.ts <- data.frame(CrimeDate = seq(daily$CrimeDate[1],
                                      daily$CrimeDate[nrow(daily)],
                                      by = "day"))
daily <- full_join(full.ts, daily)
daily <- daily %>% mutate(shootings = replace_na(shootings, 0))
stan_data <- list(
  y = daily$shootings, # roughly Poisson distributed
  n = nrow(daily)      # n = 2632
)
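A quick sanity check on the result, in case the reshaping matters here (these assertions are just my own, nothing the model requires):

# all of these should pass silently
stopifnot(
  !any(is.na(stan_data$y)),           # no missing counts after the fill
  stan_data$n == length(stan_data$y), # lengths agree
  all(diff(daily$CrimeDate) == 1)     # consecutive days, no gaps or duplicates
)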
Running ~50 iterations of this model takes about 20 minutes on my machine:
fit <- stan(model_code = model_code, data = stan_data,
            cores = 2, chains = 2, iter = 50,
            control = list(adapt_delta = .95))
FWIW, this simple model is not my end goal. I eventually want to add a weekly seasonal component, a stochastic slope, and a marker for an intervention called Baltimore Ceasefire 365 (a rough sketch of that model is below). I’m dreading the prospect of building it up in stages if every fit takes this long.
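For concreteness, here’s where I’m headed, reusing the running-sum idea from above for both seasonal windows. All the new names (w_seasonal, delta, beta_cf, ceasefire, sigma_slope, sigma_wday) are placeholders I made up, and I haven’t run any of this:

data {
  int<lower=1> n;
  int y[n];
  vector<lower=0, upper=1>[n] ceasefire;  // 1 on Baltimore Ceasefire 365 days
}
parameters {
  vector[n] mu;                 // stochastic level
  vector[n] delta;              // stochastic slope
  vector[n] y_seasonal;         // yearly seasonality
  vector[n] w_seasonal;         // weekly seasonality
  real beta_cf;                 // intervention effect
  real<lower=0> sigma_level;
  real<lower=0> sigma_slope;
  real<lower=0> sigma_yday;
  real<lower=0> sigma_wday;
}
model {
  real run_y = sum(y_seasonal[1:364]);
  real run_w = sum(w_seasonal[1:6]);
  // yearly seasonal: any 365-day window sums to roughly zero
  for (t in 365:n) {
    y_seasonal[t] ~ normal(-run_y, sigma_yday);
    run_y += y_seasonal[t] - y_seasonal[t - 364];
  }
  // weekly seasonal: any 7-day window sums to roughly zero
  for (t in 7:n) {
    w_seasonal[t] ~ normal(-run_w, sigma_wday);
    run_w += w_seasonal[t] - w_seasonal[t - 6];
  }
  // local linear trend: level follows the slope, slope is a random walk
  delta[2:n] ~ normal(delta[1:(n - 1)], sigma_slope);
  mu[2:n] ~ normal(mu[1:(n - 1)] + delta[1:(n - 1)], sigma_level);
  y ~ poisson_log(mu + beta_cf * ceasefire + y_seasonal + w_seasonal);
}

(Priors on the initial states and scales are omitted here, same as in the simple model.)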
Anyone have any thoughts/ideas on how to speed this up? I’d be very grateful.
Thank you!