Fit time monotonically increases across simulation iterations

Hi everyone,
I am currently running a simulation that involves fitting a hierarchical model (as part of a multilevel regression and poststratification [MRP] protocol) to a sample drawn from a simulated population. This is the relevant workflow (a rough code sketch follows the list):

  1. Generate the population data
  2. For i in 1:100
    a. Draw a sample from the population data
    b. Fit a hierarchical model to the sample
    c. Extract estimates from model
    d. Do more non-Stan related things; delete fitted Stan model
  3. Summarize results
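
In code, the loop looks roughly like this (a minimal sketch; all function names here are placeholders, not my actual code):

# Minimal sketch of the simulation loop; function names are placeholders
pop <- generate_population()                 # step 1

results <- vector("list", 100)
for (i in 1:100) {
  samp <- draw_sample(pop)                   # step 2a
  fit <- fit_model(samp)                     # step 2b: rstan fit
  results[[i]] <- extract_estimates(fit)     # step 2c
  rm(fit)                                    # step 2d: delete fitted Stan model
}
final <- summarize_results(results)          # step 3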

I am running rstan with 4 chains, each with 2000 iterations (1900 warmup, 100 sampling).
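
Concretely, the call is along these lines (a sketch; the file name and data list are placeholders):

# Sketch of the rstan call; "model.stan" and stan_data are placeholders
fit <- rstan::stan(
  file   = "model.stan",
  data   = stan_data,
  chains = 4,
  iter   = 2000,   # total iterations per chain
  warmup = 1900    # 1900 warmup, leaving 100 sampling iterations
)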
The issue is that the time rstan needs to complete its sampling increases monotonically with i. Specifically, rstan’s runtime is about 10 seconds per chain when i = 1 and steadily increases to about 70 seconds per chain when i = 4. See the graph below, where I plotted rstan’s runtime against i (I dropped the first observation since its time didn’t record correctly).

This is surprising, as I would expect the runtime to be independent of i. It may be relevant that I am using stratified cluster sampling in this simulation, with 5 strata and 80 clusters per stratum (400 clusters total), of which around 8 clusters per stratum are sampled; rstan did not give me issues when I ran a very similar model-fitting procedure on samples drawn under a simpler sampling design in a separate experiment.
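
For illustration, the sampling step works roughly like this (a self-contained sketch, not my actual code):

# Sketch of the stratified cluster design:
# 5 strata x 80 clusters each = 400 clusters; sample 8 clusters per stratum
frame <- data.frame(cluster_id = 1:400,
                    stratum    = rep(1:5, each = 80))
sampled_ids <- unlist(lapply(split(frame$cluster_id, frame$stratum),
                             function(ids) sample(ids, 8)))
# keep the population rows belonging to the sampled clusters
# (assumes pop has a cluster_id column, as in the DGP sketch below)
samp <- pop[pop$cluster_id %in% sampled_ids, ]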

Why might this be happening? I can provide more details and/or write a reproducible example if needed. The rstan code is included at the end of this post. Thank you in advance!


The data generating process depends on a continuous X_1, a binary X_2, their interaction, and a cluster-specific effect. For cluster i and subject j:

\begin{align*}
Y_{ij} &= \alpha_0 + \alpha_1 X_{1ij} + \alpha_{X2[ij]} + \alpha_{X1*X2[ij]} X_{1ij} + \alpha_{clus[i]} + \epsilon_{ij}\\
X_{1ij} &\sim N(0, 1)\\
X_{2ij} &\sim \mathrm{Bernoulli}(\mathrm{expit}(0.6\, X_{1ij}))\\
\alpha_{clus[i]} &\sim N(0, 1), \quad i \in \{1, \ldots, N\}\\
\epsilon_{ij} &\sim N(0, 1)
\end{align*}
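
In R, this data generating process can be sketched as follows (coefficient values and cluster sizes are illustrative placeholders; with a binary X_2 I collapse the indexed effects into plain coefficients):

# Sketch of the data generating process; coefficient values are illustrative
expit <- function(x) 1 / (1 + exp(-x))

N_clus <- 400                                # clusters
n_per  <- 25                                 # subjects per cluster (illustrative)
a0 <- 1; a1 <- 0.5; a2 <- 0.5; a12 <- 0.25   # placeholder coefficients

alpha_clus <- rnorm(N_clus)                  # cluster effects ~ N(0, 1)
clus <- rep(1:N_clus, each = n_per)
x1 <- rnorm(N_clus * n_per)                  # X1 ~ N(0, 1)
x2 <- rbinom(N_clus * n_per, 1, expit(0.6 * x1))  # X2 | X1
y  <- a0 + a1 * x1 + a2 * x2 + a12 * x1 * x2 +
      alpha_clus[clus] + rnorm(N_clus * n_per)    # epsilon ~ N(0, 1)
pop <- data.frame(y, x1, x2, cluster_id = clus,
                  stratum = (clus - 1) %/% 80 + 1)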
Because my end goal is to use MRP, I use a categorical version of X_1 (X_{1,cat}) in the outcome model fitted by rstan:
\begin{align*}
\theta_u &= \beta_0 + \beta_{X1cat[u]} + \beta_{X2[u]} + \eta_{X12[u]} + \eta_{clus[u]}\\
Y_{ij} &\sim N(\theta_{u[ij]}, \sigma_y^2)\\
\beta_{var} &\sim \mathrm{Uniform}(-\infty, \infty)\\
\eta_{var} &\sim N(0, \sigma^2_{var})\\
\sigma_{var}, \sigma_y &\sim N^+(0, 1)
\end{align*}


data{
  int<lower=0> Max2;    // number of X1:X2 groups (2 x 2 = 4)
  int<lower=0> Mu;      // number of sampled clusters
  int<lower=0> J_stan;  // number of cell means
  int<lower=0> n;       // total sample size
  real y[n];            // continuous outcome
  vector<lower=0, upper=1>[J_stan] agroup;   // X1 indicator at cell level
  vector<lower=0, upper=1>[J_stan] x2group;  // X2 indicator at cell level
  int<lower=1, upper=Max2> ax2group[J_stan]; // X1:X2 index at cell level
  int<lower=1> ugroup[J_stan];               // cluster effect index at cell level
  int<lower=1, upper=J_stan> cell_label[n];  // cell index for each observation
}

parameters{
  vector[3] beta;          // global intercept, X1, X2 coefficients
  vector[Max2] ax2z;       // standardized X1:X2 interaction effects
  vector[Mu] u;            // standardized cluster effects
  real<lower=0> sigmay;    // sd of the observations
  real<lower=0> sigma_ax2; // sd of X1:X2 interaction
  real<lower=0> sigma_u;   // sd of cluster effects
}

transformed parameters{
  vector[Max2] alphaax2 = ax2z * sigma_ax2; // non-centered X1:X2 interaction
  vector[Mu] alphau = u * sigma_u;          // non-centered cluster effects
  vector[J_stan] cellmean = beta[1] + beta[2]*agroup + beta[3]*x2group
                            + alphaax2[ax2group] + alphau[ugroup];
}
 
model{
  ax2z ~ std_normal();
  u ~ std_normal();
  sigma_ax2 ~ std_normal();
  sigma_u ~ std_normal();
  sigmay ~ std_normal();
  y ~ normal(cellmean[cell_label], sigmay);
}
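
For completeness, the data list passed to rstan is assembled along these lines (a sketch; how the cell-level quantities are built is an assumption, and cell_tab, n_cells, n_clusters, and samp$cell_index are placeholder names):

# Sketch of the data list matching the data block above; names on the
# right-hand side are placeholders for however the cells are constructed
stan_data <- list(
  Max2       = 4,                   # 2 x 2 X1:X2 groups
  Mu         = n_clusters,          # number of sampled clusters
  J_stan     = n_cells,             # number of cell means
  n          = nrow(samp),
  y          = samp$y,
  agroup     = cell_tab$x1,         # 0/1 X1 indicator per cell
  x2group    = cell_tab$x2,         # 0/1 X2 indicator per cell
  ax2group   = cell_tab$ax2_index,  # 1..4 X1:X2 index per cell
  ugroup     = cell_tab$clus_index, # 1..Mu cluster index per cell
  cell_label = samp$cell_index      # cell index for each observation
)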



So you’re drawing samples of the same size from the population and fitting the same model, but the runtime keeps increasing?

There are a couple of things I can think of. If you’re on Mac or Linux, you might be running out of RAM, causing RStan to start memory swapping. Your CPU might also be thermal throttling: the first runs heat up the CPU, so your system reduces the clock speed to prevent overheating.

However, a simpler cause might just be in your model fitting: if you’re drawing an increasingly large sample at each iteration, each fit will naturally take longer. Reproducible code would be the easiest way for us to debug this.
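
A quick check would be to log the sample size and rstan’s per-chain timings at each iteration, e.g.:

# Inside the loop: confirm whether n and the per-chain time grow with i
cat("i =", i, "n =", nrow(samp), "\n")
print(rstan::get_elapsed_time(fit))   # warmup / sampling seconds per chain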


Ah, I feel foolish: the problem was indeed that I left out a crucial line of code that clears the sample indicators after every iteration, so the sample size kept growing. Thank you!
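
(In case anyone hits the same thing: the fix is just resetting the indicator at the top of each iteration. A rough sketch, where sampled, cluster_id, and draw_cluster_ids() are placeholder names, not my actual code:)

# The missing line: clear the sample indicators before each draw
for (i in 1:100) {
  pop$sampled <- FALSE                           # the line I had left out
  ids <- draw_cluster_ids(pop)                   # placeholder sampling step
  pop$sampled[pop$cluster_id %in% ids] <- TRUE
  samp <- pop[pop$sampled, ]                     # sample size now stays fixed
  # ... fit model, extract estimates, rm(fit) ...
}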

It’s always the smallest bit of code that messes everything up; been there more than once! Glad to hear that it’s all working now.
