Crash when using exponential_rng() to generate predictions

Hi,
I'm working with the following model, and I'd like to use the generated quantities section to generate predictions:

data{
    int N;
    int adopted[N];
    vector[N] days_to_event;
    int color_id[N];
}
parameters{
    vector[2] a;
}
model{
    vector[N] lambda;
    vector[N] mu;
    a ~ normal( 0 , 1 );
    for ( i in 1:N ) {
        mu[i] = a[color_id[i]];
        mu[i] = exp(mu[i]);
    }
    for ( i in 1:N ) {
        lambda[i] = 1/mu[i];
    }
    for ( i in 1:N ) 
        if ( adopted[i] == 0 ) target += exponential_lccdf(days_to_event[i] | lambda[i]);
    for ( i in 1:N ) 
        if ( adopted[i] == 1 ) days_to_event[i] ~ exponential( lambda[i] );
}

I tried to add this:

generated quantities{
  real pred[N];
  for(i in 1:N){
    pred[i]=exponential_rng(lambda);
  }
}

but Stan says:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "lambda" does not exist.

So, I altered the code as follows:

data{
    int N;
    int adopted[N];
    vector[N] days_to_event;
    int color_id[N];
}
parameters{
    vector[2] a;
}
transformed parameters{
  vector[N] lambda;
  vector[N] mu;
    for ( i in 1:N ) {
        mu[i] = a[color_id[i]];
        mu[i] = exp(mu[i]);
    }
    for ( i in 1:N ) {
        lambda[i] = 1/mu[i];
    }
}
model{
    a ~ normal( 0 , 1 );
    for ( i in 1:N ) 
        if ( adopted[i] == 0 ) target += exponential_lccdf(days_to_event[i] | lambda[i]);
    for ( i in 1:N ) 
        if ( adopted[i] == 1 ) days_to_event[i] ~ exponential( lambda[i] );
}
generated quantities{
  real pred[N];
  for(i in 1:N){
    pred[i]=exponential_rng(lambda[i]);
  }
}

but when I run it, RStudio crashes. I guess I made a mistake somewhere in the code, but I don't know where.

I’m attaching the dataset and code to reproduce the analysis.

d <- AustinCats
d$adopt <- ifelse( d$out_event=="Adoption" , 1L , 0L )
dat <- list(
  N=nrow(d),
  days_to_event = as.numeric( d$days_to_event ),
  color_id = ifelse( d$color=="Black" , 1L , 2L ) ,
  adopted = d$adopt
)

AustinCats.csv (2.8 MB)


I don't notice anything in your Stan code that's necessarily wrong. The syntax looks fine after you switched to transformed parameters from the model block (variables declared in the model block are local to it, which is why lambda couldn't be found in generated quantities originally).

Are you running this with RStan or CmdStanR? My guess is the former because we haven’t seen many (or maybe any) examples of CmdStanR crashing RStudio (I ran your model with CmdStanR and it didn’t crash for me).

If you're using RStan, is it possible that you're running out of memory? Your dataset has N ≈ 22,000, and you're saving lambda, mu, and pred, each of length N, for every MCMC iteration of every chain. Running 4 chains of 2,000 iterations and saving all of that brought the combined size of the 4 CSV files I got from CmdStanR to multiple gigabytes, and that was without saving warmup iterations, which RStan does by default, so the output would be drastically larger still. Unlike CmdStanR, which writes to CSV on disk, RStan keeps all of this in memory, which is why it could be more of a problem with RStan.
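To put rough numbers on that, here's a back-of-the-envelope sketch (the exact figures depend on your iteration and chain settings; the values below are assumptions, not your actual run):

```python
# Rough size of the saved quantities, assuming:
#   N = 22,000 observations, 4 chains, 2,000 saved iterations per chain,
#   8 bytes per double, and 3 length-N quantities (lambda, mu, pred).
N = 22_000
quantities = 3          # lambda, mu, pred
chains = 4
iters = 2_000
bytes_per_double = 8

total_bytes = N * quantities * chains * iters * bytes_per_double
print(f"{total_bytes / 1e9:.1f} GB")  # → 4.2 GB
```

So even before counting warmup draws or the overhead of R's in-memory representation, you're in the multi-gigabyte range just for those three vectors.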

I'm not 100% sure that memory is what's causing the crash, but here's a version of your Stan program you can try that should be much more memory-friendly, since it avoids saving all of mu and lambda (at the expense of computing each lambda value twice, once in model and once in generated quantities):

data{
  int N;
  int adopted[N];
  vector[N] days_to_event;
  int color_id[N];
}
parameters{
  vector[2] a;
}
model{
  a ~ normal(0, 1);
  for (i in 1:N) {
    real lambda_i = inv(exp(a[color_id[i]]));
    if ( adopted[i] == 0 ) 
      target += exponential_lccdf(days_to_event[i] | lambda_i);
    else 
      days_to_event[i] ~ exponential( lambda_i );
  }
}
generated quantities{
  real pred[N];
  for(i in 1:N){
    real lambda_i = inv(exp(a[color_id[i]]));
    pred[i] = exponential_rng(lambda_i);
  }
}
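One thing worth keeping in mind when interpreting pred: Stan's exponential distribution is parameterized by a rate, so with lambda = 1/mu the mean waiting time is mu = exp(a[color_id]). A quick sanity-check sketch outside Stan (plain Python with a hypothetical draw a = 0.5; `random.expovariate` also takes a rate, matching Stan's parameterization):

```python
import math
import random

random.seed(0)

# Hypothetical posterior draw: a = 0.5, so mu = exp(0.5) and rate lambda = 1/mu.
a = 0.5
mu = math.exp(a)   # mean days-to-event implied by this draw
lam = 1.0 / mu     # rate, as passed to exponential_rng in Stan

# random.expovariate takes a rate, so this mirrors exponential_rng(lam).
draws = [random.expovariate(lam) for _ in range(200_000)]
print(sum(draws) / len(draws))  # close to mu ≈ 1.65
```

That is, each pred[i] is a simulated days-to-event whose expectation is exp(a[color_id[i]]), which is the check that the 1/mu inversion is doing what you intend.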

P.S. Big fan of your dataset!


Thanks @jonah. No crash, and it runs in about 40 seconds.

The dataset is from the rethinking package that accompanies Statistical Rethinking. This particular example didn't make it into the 2nd edition, but it's discussed at minute 31:00 here: https://www.youtube.com/watch?v=p7g-CgGCS34&list=WL&index=19
