Fit time monotonically increases across simulation iterations

Hi everyone,
I am currently running a simulation that involves fitting a hierarchical model (as part of a multilevel regression and poststratification [MRP] protocol) to a sample drawn from a simulated population. This is the relevant workflow (a rough code sketch follows the list):

  1. Generate the population data
  2. For i in 1:100
    a. Draw a sample from the population data
    b. Fit a hierarchical model to the sample
    c. Extract estimates from model
    d. Do more non-Stan related things; delete fitted Stan model
  3. Summarize results
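
In code, the loop looks roughly like this (a minimal sketch; all function names here are placeholders, not my actual code):

# Minimal sketch of the simulation loop; function names are placeholders
pop <- generate_population()                 # step 1

results <- vector("list", 100)
for (i in 1:100) {
  samp <- draw_sample(pop)                   # step 2a
  fit <- fit_model(samp)                     # step 2b: rstan fit
  results[[i]] <- extract_estimates(fit)     # step 2c
  rm(fit)                                    # step 2d: delete fitted Stan model
}
final <- summarize_results(results)          # step 3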

I am running rstan with 4 chains, each with 2000 iterations (1900 warmup, 100 sampling).
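
Concretely, the call is along these lines (a sketch; the file name and data list are placeholders):

# Sketch of the rstan call; "model.stan" and stan_data are placeholders
fit <- rstan::stan(
  file   = "model.stan",
  data   = stan_data,
  chains = 4,
  iter   = 2000,   # total iterations per chain
  warmup = 1900    # 1900 warmup, leaving 100 sampling iterations
)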
The issue is that the time rstan needs to complete its sampling increases monotonically with i. Specifically, rstan’s runtime is about 10 seconds per chain when i = 1 and steadily increases to about 70 seconds per chain when i = 4. See the graph below, where I plotted rstan’s runtime against i (I dropped the first observation since its time didn’t record correctly).

This is surprising, as I would expect the runtime to be independent of i. It may be relevant that I am using stratified cluster sampling in this simulation, with 5 strata and 80 clusters per stratum (400 clusters total), of which around 8 clusters per stratum are sampled; rstan did not give me issues when I ran a very similar model-fitting procedure on samples drawn under a simpler sampling design in a separate experiment.
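
For illustration, the sampling step works roughly like this (a self-contained sketch, not my actual code):

# Sketch of the stratified cluster design:
# 5 strata x 80 clusters each = 400 clusters; sample 8 clusters per stratum
frame <- data.frame(cluster_id = 1:400,
                    stratum    = rep(1:5, each = 80))
sampled_ids <- unlist(lapply(split(frame$cluster_id, frame$stratum),
                             function(ids) sample(ids, 8)))
# keep the population rows belonging to the sampled clusters
# (assumes pop has a cluster_id column, as in the DGP sketch below)
samp <- pop[pop$cluster_id %in% sampled_ids, ]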

Why might this be happening? I can provide more details and/or write a reproducible example if needed. The rstan code is included at the end of this post. Thank you in advance!


The data generating process depends on a continuous X_1, a binary X_2, their interaction, and a cluster-specific effect. For cluster i and subject j:

\begin{align*}
Y_{ij} &= \alpha_0 + \alpha_1 X_{1ij} + \alpha_{X2[ij]} + \alpha_{X1*X2[ij]} X_{1ij} + \alpha_{clus[i]} + \epsilon_{ij}\\
X_{1ij} &\sim N(0, 1)\\
X_{2ij} &\sim \mathrm{Bernoulli}(\mathrm{expit}(0.6\, X_{1ij}))\\
\alpha_{clus[i]} &\sim N(0, 1), \quad i \in \{1, \ldots, N\}\\
\epsilon_{ij} &\sim N(0, 1)
\end{align*}
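
In R, this data generating process can be sketched as follows (coefficient values and cluster sizes are illustrative placeholders; with a binary X_2 I collapse the indexed effects into plain coefficients):

# Sketch of the data generating process; coefficient values are illustrative
expit <- function(x) 1 / (1 + exp(-x))

N_clus <- 400                                # clusters
n_per  <- 25                                 # subjects per cluster (illustrative)
a0 <- 1; a1 <- 0.5; a2 <- 0.5; a12 <- 0.25   # placeholder coefficients

alpha_clus <- rnorm(N_clus)                  # cluster effects ~ N(0, 1)
clus <- rep(1:N_clus, each = n_per)
x1 <- rnorm(N_clus * n_per)                  # X1 ~ N(0, 1)
x2 <- rbinom(N_clus * n_per, 1, expit(0.6 * x1))  # X2 | X1
y  <- a0 + a1 * x1 + a2 * x2 + a12 * x1 * x2 +
      alpha_clus[clus] + rnorm(N_clus * n_per)    # epsilon ~ N(0, 1)
pop <- data.frame(y, x1, x2, cluster_id = clus,
                  stratum = (clus - 1) %/% 80 + 1)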
Because my end goal is to use MRP, I use a categorical version of X_1 (X_{1,cat}) in the outcome model fitted by rstan:
\begin{align*}
\theta_u &= \beta_0 + \beta_{X1cat[u]} + \beta_{X2[u]} + \eta_{X12[u]} + \eta_{clus[u]}\\
Y_{ij} &\sim N(\theta_{u[ij]}, \sigma_y^2)\\
\beta_{var} &\sim \mathrm{Uniform}(-\infty, \infty)\\
\eta_{var} &\sim N(0, \sigma^2_{var})\\
\sigma_{var}, \sigma_y &\sim N^+(0, 1)
\end{align*}


data{
  int<lower=0> Max2;    // number of X1:X2 groups (2 x 2 = 4)
  int<lower=0> Mu;      // number of sampled clusters
  int<lower=0> J_stan;  // number of cell means
  int<lower=0> n;       // total sample size
  real y[n];            // continuous outcome
  vector<lower=0, upper=1>[J_stan] agroup;   // X1 indicator at cell level
  vector<lower=0, upper=1>[J_stan] x2group;  // X2 indicator at cell level
  int<lower=1, upper=Max2> ax2group[J_stan]; // X1:X2 index at cell level
  int<lower=1> ugroup[J_stan];               // cluster effect index at cell level
  int<lower=1, upper=J_stan> cell_label[n];  // cell index for each observation
}

parameters{
  vector[3] beta;          // global intercept, X1, X2 coefficients
  vector[Max2] ax2z;       // standardized X1:X2 interaction effects
  vector[Mu] u;            // standardized cluster effects
  real<lower=0> sigmay;    // sd of the observations
  real<lower=0> sigma_ax2; // sd of X1:X2 interaction
  real<lower=0> sigma_u;   // sd of cluster effects
}

transformed parameters{
  vector[Max2] alphaax2 = ax2z * sigma_ax2; // non-centered X1:X2 interaction
  vector[Mu] alphau = u * sigma_u;          // non-centered cluster effects
  vector[J_stan] cellmean = beta[1] + beta[2]*agroup + beta[3]*x2group
                            + alphaax2[ax2group] + alphau[ugroup];
}
 
model{
  ax2z ~ std_normal();
  u ~ std_normal();
  sigma_ax2 ~ std_normal();
  sigma_u ~ std_normal();
  sigmay ~ std_normal();
  y ~ normal(cellmean[cell_label], sigmay);
}
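
For completeness, the data list passed to rstan is assembled along these lines (a sketch; how the cell-level quantities are built is an assumption, and cell_tab, n_cells, n_clusters, and samp$cell_index are placeholder names):

# Sketch of the data list matching the data block above; names on the
# right-hand side are placeholders for however the cells are constructed
stan_data <- list(
  Max2       = 4,                   # 2 x 2 X1:X2 groups
  Mu         = n_clusters,          # number of sampled clusters
  J_stan     = n_cells,             # number of cell means
  n          = nrow(samp),
  y          = samp$y,
  agroup     = cell_tab$x1,         # 0/1 X1 indicator per cell
  x2group    = cell_tab$x2,         # 0/1 X2 indicator per cell
  ax2group   = cell_tab$ax2_index,  # 1..4 X1:X2 index per cell
  ugroup     = cell_tab$clus_index, # 1..Mu cluster index per cell
  cell_label = samp$cell_index      # cell index for each observation
)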



So you’re drawing samples of the same size from the population and fitting the same model, but the runtime keeps increasing?

There are a couple of things I can think of. If you’re on Mac or Linux, you might be running out of RAM, causing RStan to start memory swapping. Your CPU might also be thermal throttling: the first runs heat up the CPU, so your system reduces the clock speed to prevent overheating.

However, a simpler cause might just be in your model fitting: if you’re drawing an increasingly large sample at each iteration, each fit will naturally take longer. Reproducible code would be the easiest way for us to debug this.
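
A quick check would be to log the sample size and rstan’s per-chain timings at each iteration, e.g.:

# Inside the loop: confirm whether n and the per-chain time grow with i
cat("i =", i, "n =", nrow(samp), "\n")
print(rstan::get_elapsed_time(fit))   # warmup / sampling seconds per chain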


Ah, I feel foolish: the problem was indeed that I left out a crucial line of code that clears the sample indicators after every iteration, so the sample size kept growing. Thank you!
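
(In case anyone hits the same thing: the fix is just resetting the indicator at the top of each iteration. A rough sketch, where sampled, cluster_id, and draw_cluster_ids() are placeholder names, not my actual code:)

# The missing line: clear the sample indicators before each draw
for (i in 1:100) {
  pop$sampled <- FALSE                           # the line I had left out
  ids <- draw_cluster_ids(pop)                   # placeholder sampling step
  pop$sampled[pop$cluster_id %in% ids] <- TRUE
  samp <- pop[pop$sampled, ]                     # sample size now stays fixed
  # ... fit model, extract estimates, rm(fit) ...
}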

It’s always the smallest bit of code that messes everything up; been there more than once! Glad to hear that it’s all working now.
