Structural time series with seasonality, sampling very slowly

Hey all,

I’m trying to fit a model to a daily time series. Yearly seasonality is strikingly visible in the data so I need to make sure that’s represented, but my attempts have yielded unacceptably slow sampling. I’ve tried dramatically simplifying the model by removing other components (e.g., a stochastic slope and a multilevel component) without success. The simplest model I can come up with (shown below) takes several hours to run 500 iterations.

I’d be grateful if anyone could take a look at my code and see how I might speed things up.


I’m representing yearly seasonality using a for-loop, which is an attempt to recreate the equation \gamma_t \sim \mathcal{N}\left(-\sum_{j=1}^{s-1} \gamma_{t-j},\ \sigma_{\text{seas}}\right) shown on page 10 of these slides, and which I’ve also seen elsewhere:

data {
  int<lower=1> n;   // n = 2632
  int y[n]; // number of shootings per day
}
parameters {
  vector[n] mu;
  vector[n] y_seasonal; // yearly seasonality
  real<lower=0> sigma_level;
  real<lower=0> sigma_yday;
}
transformed parameters {
  vector[n] yhat;
  for(t in 1:n) {
    yhat[t] = mu[t];
  }
}
model {
  real draw_from;
  
  for (t in 365:n){
    draw_from = 0;
    for (j in 1:364){
      draw_from = draw_from - y_seasonal[t - j];
    }
    y_seasonal[t] ~ normal(draw_from, sigma_yday);
  }
  
  sigma_yday ~ inv_gamma(.001,.001);

  for(t in 2:n)
    mu[t] ~ normal(mu[t-1], sigma_level);
  for(t in 1:n)
    y[t] ~ poisson_log(yhat[t] + y_seasonal[t]);
} 

Here is how to get the data I’m working with and put it into a list that Stan can read (using R):

library(tidyverse)

# Download dataset containing all crimes observed in Baltimore since 2012
# There is one row for each incident
bpd <- read_csv("https://raw.githubusercontent.com/peterphalen/ceasefire/master/BPD_Part_1_Victim_Based_Crime_Data.csv")

# subset to shootings or homicides with a firearm
bpd <- subset(bpd, Description == "SHOOTING" |
                (Description == "HOMICIDE" & Weapon == "FIREARM"))

bpd$CrimeDate <- as.Date(bpd$CrimeDate, format = "%m/%d/%Y")

# there are many crimes per day. collapse to daily counts
daily <- bpd %>% group_by(CrimeDate) %>% summarise(shootings = n())

# fill missing dates, because some had no shootings
full.ts <- data.frame(CrimeDate = seq(daily$CrimeDate[1], 
                                      daily$CrimeDate[nrow(daily)], 
                                      by="day"))
daily <- full_join(full.ts,daily)
daily <- daily %>% 
    group_by(CrimeDate) %>% 
    mutate_all(funs(ifelse(is.na(.),0,.)))

stan_data <- list(
                  y = daily$shootings, # roughly Poisson-distributed counts
                  n = nrow(daily)      # n = 2632
                  )

Running ~50 iterations of this model takes about 20 minutes on my machine:

fit <- stan(model_code = model_code, data = stan_data,
            cores = 2, chains = 2, iter = 50, control = list(adapt_delta = .95))

FWIW, this simple model is not my end goal. I want to be able to add in an additional (weekly) seasonal component, a stochastic slope, and a marker for an intervention called Baltimore Ceasefire 365. I’m dreading the prospect of fitting this model multiple times with increasing complexity.

Anyone have any thoughts/ideas on how to speed this up? I’d be very grateful.

Thank you!

Hmm. Looks like replacing this:

  real draw_from;
  for (t in 365:n){
    draw_from = 0;
    for (j in 1:364){
      draw_from = draw_from - y_seasonal[t - j];
    }
    y_seasonal[t] ~ normal(draw_from, sigma_yday);
  }

With the following, which I think is equivalent, makes things sample almost twice as fast!

  for (t in 365:n){
    y_seasonal[t] ~ normal(-sum(y_seasonal[(t-364):(t-1)]), sigma_yday);
  }

Still very slow though…

Vectorization is key in Stan. The last two for loops should be vectorized as well…I think that can be done with a bit of index juggling. If you get that done, then it will be a good bit faster.


Take a look at https://mc-stan.org/docs/2_19/stan-users-guide/parameterizing-centered-vectors.html for the seasonal term, if it needs to be zero-mean.
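
For reference, here is a minimal sketch of the sum-to-zero trick from that chapter, written for a 365-element seasonal vector (the names are placeholders):

parameters {
  vector[364] y_seasonal_free;    // only 364 free elements
}
transformed parameters {
  // the 365th element is defined as minus the sum of the others,
  // so the full vector always sums to zero
  vector[365] y_seasonal = append_row(y_seasonal_free, -sum(y_seasonal_free));
}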

It is bed time here and I don’t really understand the model, but probably the double loop (or its equivalent), quadratic in time, is unnecessary and killing the sampling.


Thank you! Interestingly, the seasonal component as written does already come out zero-sum/mean. I don’t understand why, to be honest, but that’s the consistent result.

Thank you @wds15. I’ll try to think about how to eliminate the other for-loops. I definitely need to optimize further: I just added a weekly seasonal component and I’m back up to a 12-hour run time…

This

for(t in 2:n) mu[t] ~ normal(mu[t-1], sigma_level);

Can be written as

mu[2:n] ~ normal(mu[1:(n-1)], sigma_level);

And this one

for(t in 1:n) y[t] ~ poisson_log(yhat[t] + y_seasonal[t]);

As

y ~ poisson_log(yhat + y_seasonal);

Much easier to read and a lot faster.

However, as said here already, reparameterizing the model is another very useful way to improve performance. You want to let Stan sample zero-mean, unit-normal densities, and with some shift-and-scale tricks you can get very far. Also, as far as I can tell you should avoid duplicating mu into yhat; I’m not sure why that was done, but I may have overlooked something.
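
To make the shift-and-scale idea concrete, here is a minimal sketch for the random-walk level alone (seasonality omitted, and the prior on sigma_level is just a placeholder):

data {
  int<lower=1> n;
  int y[n];
}
parameters {
  vector[n] mu_raw;               // unit-normal innovations
  real<lower=0> sigma_level;
}
transformed parameters {
  // shift/scale trick: build the random walk from standard normals,
  // so Stan samples in a well-scaled, less correlated space
  vector[n] mu = sigma_level * cumulative_sum(mu_raw);
}
model {
  mu_raw ~ normal(0, 1);
  sigma_level ~ normal(0, 0.1);   // half-normal placeholder prior
  y ~ poisson_log(mu);
}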


With control=list(max_treedepth=16) the model below runs 2000 iterations (incl. warmup) in 577 seconds and seems to give sensible results. (But it may not be quite equivalent to the original model, which I don’t fully understand.) There is an interesting jump in shootings around index 1200.

In my latest run I had quite a few divergent transitions, but I hadn’t changed adapt_delta from the default. The variance parameters (the sigmas) also show a lot of autocorrelation, so you need to run longer chains. (I don’t see obvious funnel behaviour; if there is any, you need to reparameterize by sampling standardised autocorrelated series and then multiplying by sigma.)

It may make sense to aggregate further, to a weekly or monthly level for example, if fine-grained temporal effects are not of interest. You would then have more evidence per observation, less autocorrelation in the parameters (across time points with weak evidence), and shorter arrays, so the model would be faster to estimate.

data {
  int<lower=1> n;   // n = 2632
  int y[n]; // number of shootings per day
}
transformed data {
  int<lower=1,upper=365> yday[n];
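  // cyclic index in 1..365 giving each observation its position in the yearly cycle (leap days ignored)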
  for (i in 1:n) yday[i] = i % 365 + 1;
}
parameters {
  vector[n] mu;
  vector[364] y_seasonal_short; // yearly seasonality minus last day
  real<lower=0> sigma_level;
  real<lower=0> sigma_yday;
}
transformed parameters {
  // zero-mean seasonal term
  vector[365] y_seasonal = append_row(y_seasonal_short, -sum(y_seasonal_short));
}
model {
  y_seasonal[2:365] ~ normal(y_seasonal[1:(365-1)], sigma_yday);
  y_seasonal[1] ~ normal(y_seasonal[365], sigma_yday); // circularity for seasonality
  
  mu[2:n] ~ normal(mu[1:(n-1)], sigma_level);
  y ~ poisson_log(mu + y_seasonal[yday]);
  sigma_level ~ lognormal(-3.5, 2);
  sigma_yday ~ lognormal(-3.5, 2);
} 

A lognormal prior with location -3.5 corresponds to a median of about 0.03… which seems very small. Recall that Stan’s adaptation expects things (by default) to be on roughly unit scale, so you make life easier for the sampler if you scale quantities a priori to something closer to unity. Moreover, I am not sure sigma_level and sigma_yday should vary a priori on the same scale. Shouldn’t these have different scales?

From my own experience, the default adapt_delta of 0.8 is too low for most of my models; I often use 0.95 or even higher if needed. In addition, I very often set the initial stepsize to 0.01 (or even 0.001) to avoid divergences that I would otherwise get.


You are amazing! I would aggregate weekly, but I’m trying to measure the effect of a series of weekend-long interventions. (See here < http://peterphalen.github.io/ceasefire/ > for my attempt at doing so using splines, if you’re interested.) I’ll run your solution, which seems totally reasonable. The fact that you’re noticing a jump around 2015 suggests it’s working great, since that jump is well documented. Thank you so much!


Yes @wds15, you are right that rescaling closer to unit variance would be better. I got the scales from a run with no priors; the sigmas are small because there are many time points each carrying little evidence, so the underlying latent time series changes slowly. That is also why it is so hard to sample: there is strong correlation along the dependency chain (of successive days).

It could make sense to try whitening the latent series by scaling to unit variance and by decorrelating: Sample independent ‘innovations’, and cumsum over them in the transformed parameters block.

I tried adapt_delta=0.95 after writing my reply above. That erased the divergent transitions from the chains. Thanks for the tip with stepsize.

@peopletrees, consider the whitening approach suggested above if you start doing serious inference. The current model is still slow to run, even though it is faster than the original. ;)

Oops, there is a typo in the code: y_seasonal[165] should be y_seasonal[365] obviously.
(And sorry, I messed up this reply while trying to edit it.)


The whitened model below mixes well with the default sampler settings: 1000 iterations in 168 seconds.

(This was an interesting experiment in that I have my own AR model with a Gaussian likelihood but lots of missing data. It would benefit from the same treatment. ;))

data {
  int<lower=1> n;   // n = 2632
  int y[n]; // number of shootings per day
}
transformed data {
  int<lower=1,upper=365> yday[n];
  for (i in 1:n) yday[i] = i % 365 + 1;
}
parameters {
  vector[n] mu_innovations;
  vector[365] seasonal_innovations; // yearly seasonality 
  real<lower=0> sigma_mu;
  real<lower=0> sigma_yday;
  real baseline;
}
transformed parameters {
  // seasonal term, constructed so it returns to zero at day 365
  vector[365] y_seasonal;
  vector[n] mu;
  { vector[365] seasonal_with_trend;
    real trend;
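    // the cumulative sum of unit-normal innovations is a random walk over the
    // year; subtracting the linear trend below pins it back to zero at day 365,
    // and the /100 keeps sigma_yday on a scale closer to unity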
    seasonal_with_trend = cumulative_sum(seasonal_innovations);
    trend = seasonal_with_trend[365];
    for (i in 1:365)
        y_seasonal[i] = sigma_yday/100 * (seasonal_with_trend[i] - trend * i/365.0); }

  mu = sigma_mu/100 * cumulative_sum(mu_innovations);
}
model {
  seasonal_innovations ~ normal(0, 1);
  mu_innovations ~ normal(0, 1);
  y ~ poisson_log(baseline + mu + y_seasonal[yday]);
  sigma_mu ~ lognormal(-3.5 + log(100), 2);
  sigma_yday ~ lognormal(-3.5 + log(100), 2);
} 



Wow, no divergent transitions or warnings. This is really excellent.

Interestingly, the estimated effect of Baltimore Ceasefire is almost identical between my earlier GAM/spline model and your state-space version (with added weekly seasonality and a marker for Ceasefire). Both models estimate a ~60% reduction in shootings on Ceasefire days.
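
Something along these lines (a sketch built on the whitened model above; the ceasefire indicator is a 0/1 data vector, and the weekly sum-to-zero term and the priors on the new parameters are placeholder choices rather than the exact model I ran):

data {
  int<lower=1> n;
  int y[n];
  vector[n] ceasefire;              // 0/1 indicator for Ceasefire days
}
transformed data {
  int<lower=1,upper=365> yday[n];
  int<lower=1,upper=7> wday[n];
  for (i in 1:n) yday[i] = i % 365 + 1;
  for (i in 1:n) wday[i] = i % 7 + 1;
}
parameters {
  vector[n] mu_innovations;
  vector[365] seasonal_innovations;
  vector[6] w_seasonal_free;        // weekly effects, sum-to-zero
  real<lower=0> sigma_mu;
  real<lower=0> sigma_yday;
  real baseline;
  real beta_ceasefire;              // log-rate shift on Ceasefire days
}
transformed parameters {
  vector[365] y_seasonal;
  vector[7] w_seasonal;
  vector[n] mu;
  { vector[365] seasonal_with_trend;
    real trend;
    seasonal_with_trend = cumulative_sum(seasonal_innovations);
    trend = seasonal_with_trend[365];
    for (i in 1:365)
        y_seasonal[i] = sigma_yday/100 * (seasonal_with_trend[i] - trend * i/365.0); }
  w_seasonal = append_row(w_seasonal_free, -sum(w_seasonal_free));
  mu = sigma_mu/100 * cumulative_sum(mu_innovations);
}
model {
  seasonal_innovations ~ normal(0, 1);
  mu_innovations ~ normal(0, 1);
  w_seasonal_free ~ normal(0, 1);
  beta_ceasefire ~ normal(0, 1);    // weakly informative placeholder
  y ~ poisson_log(baseline + mu + y_seasonal[yday] + w_seasonal[wday]
                  + beta_ceasefire * ceasefire);
  sigma_mu ~ lognormal(-3.5 + log(100), 2);
  sigma_yday ~ lognormal(-3.5 + log(100), 2);
}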


The last model seems to benefit from an overdispersion term, btw.
Something like https://gist.github.com/euxoa/96271ea83fdde8d6b0e073d5491ea75d

This is because the magnitude of the innovations determines the extent of the random walk but doesn’t allow pointwise excess deviations from the latent mean (and the likelihood is Poisson, which has no separate variance parameter). The model estimates a standard deviation of ≈0.45 for the extra variation, which is quite high compared to the other scales.

You could get the same effect with a negative binomial, of course, maybe more efficiently.
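
For illustration, here is a minimal sketch of what that swap could look like (only the random-walk level is included, and this is not necessarily what the linked gist does):

data {
  int<lower=1> n;
  int y[n];
}
parameters {
  vector[n] mu_innovations;
  real<lower=0> sigma_mu;
  real baseline;
  real<lower=0> inv_sqrt_phi;       // dispersion parameterized via its inverse square root
}
transformed parameters {
  vector[n] mu = sigma_mu/100 * cumulative_sum(mu_innovations);
  real phi = 1 / square(inv_sqrt_phi);
}
model {
  mu_innovations ~ normal(0, 1);
  inv_sqrt_phi ~ normal(0, 1);      // weakly informative placeholder
  sigma_mu ~ lognormal(-3.5 + log(100), 2);
  // as phi grows the negative binomial approaches the Poisson,
  // so small phi corresponds to strong overdispersion
  y ~ neg_binomial_2_log(baseline + mu, phi);
}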

The change in 2015 is huge! I wonder what the results would look like with t-distributed innovations. (Of course the subject matter is interesting as well; such a persistent effect from the tragedy.)

Thanks for pointing out that this is a state-space model. I had not realised it… :D

Regarding t-distributed innovations: I’m not sure, but I will say that I fit an earlier version of this model that included a stochastic slope in addition to the stochastic level, and it showed an even steeper upward spike in 2015 that then dropped back down a little to the “new baseline” you see from 2016 onwards.

Post-uprising, I wonder if there was a change in how the police were reporting crimes, including shootings. There would have been a perceived incentive to report more crime following the Baltimore uprising.

Thank you for the tip to include the overdispersion term.

(Sorry for reviving the topic; I just want to add something for those who may use this model in the future.)

I changed your model a little (here I have a more generic implementation of it) and, when using the Poisson likelihood, it seems to get a 1.4x speedup (I tested it only with the yearly and weekly seasonal components). I did the following:

  • Vectorized some operations – this is where most of the speedup comes from.

  • The “_seasonal” components are centered on zero: I center the seasonal trend using the mean of the innovations rather than the sum of the innovations (a rough sketch follows below). This gives a speedup, lets you estimate the uncertainty of the last seasonal parameter correctly (which you cannot do with the sum of the innovations), seems to resolve some multimodality issues when mu is small, and makes the sampler ‘complain’ less.
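
To illustrate the centering idea, the seasonal block could look something like this (a simplified sketch of the general approach, not the exact code from my implementation):

parameters {
  vector[365] seasonal_innovations;
  real<lower=0> sigma_yday;
}
transformed parameters {
  vector[365] y_seasonal;
  { vector[365] walk = cumulative_sum(seasonal_innovations);
    // subtract the mean of the walk so the seasonal vector averages to zero,
    // which leaves uncertainty on every element, including the last one
    y_seasonal = sigma_yday/100 * (walk - mean(walk)); }
}
model {
  seasonal_innovations ~ normal(0, 1);
  sigma_yday ~ lognormal(-3.5 + log(100), 2);
}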

And @scellus was right: allowing for overdispersion makes a huge difference. I tested the model on three data sets, and in all of them, despite having an additional parameter to estimate, the model runs faster and with fewer warnings when using a negative binomial. I have yet to add Student-t innovations, but that will probably help a lot with estimating the yearly seasonality.

Beyond that, I refactored the code a bit and gave an example of how you would add “sporadic covariates” (like holidays). Although it works well enough, I am not sure I did that in the best way, so if you have any suggestions, hit me up.


Is it still slow?