Sampling error "bad alloc" when trying to fit a hierarchical model

Hello,

I’m trying to fit a hand-coded hierarchical regression model with random intercepts for subject and item. The model samples fine without the subject- and item-specific offsets to the intercept for “shift”, but when I run it with those terms added I eventually get a “bad alloc” error:

Chain 1: Exception: std::bad_alloc (in 'model688c6fe52ad2_5_interceptsforsubjectsanditems' at line 45)
[origin: bad_alloc]
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  Exception: std::bad_alloc (in 'model688c6fe52ad2_5_interceptsforsubjectsanditems' at line 45)"
[3] "  [origin: bad_alloc]"
[1] "error occurred during calling the sampler; sampling not done"

I’ve looked around on these forums and followed the advice I saw (setting iter to 1, closing all other open programs, etc.), but my memory still maxes out before I get any samples back. Below is the Stan code:

data { 
  int<lower = 1> N_trials;
  int<lower = 1, upper = 4> w_ans[N_trials];

  int<lower = 0, upper = 1> heavy_syll_where_stress_shifts_to[N_trials];
  real rem_freq[N_trials];
  
  int<lower = 1> N_subject;
  int<lower = 1> N_item;
  int<lower = 1, upper = N_subject> subject[N_trials];
  int<lower = 1, upper = N_item> item[N_trials];

}
parameters {
  real<lower = 0, upper = 1> know_remote;
  vector[N_subject] shift_subject_intercept_adjustor;
  vector[N_subject] shift_subject_intercept_adjustment_z;
  vector[N_item] shift_item_intercept_adjustor;
  vector[N_item] shift_item_intercept_adjustment_z;
  
  real BETA_heavy_syll_where_stress_shifts_to;
  real BETA_rem_freq;
  
  real INTERCEPT_shift_when_remote_known;
  real INTERCEPT_shift_when_remote_unknown;

} 
transformed parameters {
  simplex[4] theta[N_trials];
  for (n in 1:N_trials) {
    for (s in 1:N_subject) {
      for (i in 1:N_item) {
        real overall_intercept_when_remote_known = INTERCEPT_shift_when_remote_known
          + (shift_subject_intercept_adjustment_z[s] * shift_subject_intercept_adjustor[s])
          + (shift_item_intercept_adjustment_z[i] * shift_item_intercept_adjustor[i]);
        real overall_intercept_when_remote_unknown = INTERCEPT_shift_when_remote_unknown
          + (shift_subject_intercept_adjustment_z[s] * shift_subject_intercept_adjustor[s])
          + (shift_item_intercept_adjustment_z[i] * shift_item_intercept_adjustor[i]);

        real shift_when_remote_known = inv_logit(overall_intercept_when_remote_known
          + BETA_heavy_syll_where_stress_shifts_to * heavy_syll_where_stress_shifts_to[n]
          + BETA_rem_freq * rem_freq[n]);
        real shift_when_remote_unknown = inv_logit(overall_intercept_when_remote_unknown
          + BETA_heavy_syll_where_stress_shifts_to * heavy_syll_where_stress_shifts_to[n]);

        theta[n, 1] = know_remote * shift_when_remote_known;               // yes remote, yes shift
        theta[n, 2] = know_remote * (1 - shift_when_remote_known);         // yes remote, no shift
        theta[n, 3] = (1 - know_remote) * shift_when_remote_unknown;       // no remote, yes shift
        theta[n, 4] = (1 - know_remote) * (1 - shift_when_remote_unknown); // no remote, no shift
      }
    }
  }
}
model {
  target += beta_lpdf(know_remote | 1, 1);
  target += normal_lpdf(shift_subject_intercept_adjustment_z | 0, 1);
  target += normal_lpdf(shift_subject_intercept_adjustor | 0, 1);
  target += normal_lpdf(shift_item_intercept_adjustment_z | 0, 1);
  target += normal_lpdf(shift_item_intercept_adjustor | 0, 1);

  target += normal_lpdf(INTERCEPT_shift_when_remote_known | 0, 1);
  target += normal_lpdf(INTERCEPT_shift_when_remote_unknown | 0, 1);
  target += normal_lpdf(BETA_heavy_syll_where_stress_shifts_to | 0, 1);
  target += normal_lpdf(BETA_rem_freq | 0, 1);

  for (n in 1:N_trials)
    target += categorical_lpmf(w_ans[n] | theta[n]);
}

Any tips for specifying the model in a way that avoids this error? I have 16 GB of memory on my local machine and would really like to be able to run this model. Alternatively, if there’s another error I’m missing that is causing the issue, I’d be very glad of advice on fixing that too.

Below is the R code I’m using:

library(rstan)      # stan() comes from rstan
library(rstanarm)
library(rstantools)
library(tidyverse)  # read_csv() and pipes
simplified_data <- read_csv("dataforstanforums.csv")

n_subjects <- simplified_data %>% 
  select(BetterSubject) %>% 
  distinct() %>% 
  summarise(
    count = n()
  )

n_items <- simplified_data %>% 
  select(LocalBaseID) %>% 
  distinct() %>% 
  summarise(
    count = n()
  )

data_list_5 <- list(N_trials  = nrow(simplified_data),
                    w_ans     = simplified_data$MPT_answer,
                    N_subject = n_subjects$count,
                    N_item    = n_items$count,
                    subject   = simplified_data$BetterSubject,
                    item      = simplified_data$LocalBaseID,
                    heavy_syll_where_stress_shifts_to = simplified_data$HeavySyllWhereStressShiftsTo,
                    rem_freq  = simplified_data$rem_freq
                    #subject  = simplified_data$Subject
                    )

fit_one <- stan(file = '5_interceptsforsubjectsanditems.stan',
                data = data_list_5,
                chains = 1, warmup = 0, iter = 1,
                verbose = TRUE, cores = 4,
                control = list(adapt_delta = 0.9))

dataforstanforums.csv (96.4 KB) is the anonymized dataset I’m using.

Any help with this would be very much appreciated!

Can you try moving everything in transformed parameters to the start of the model block?

All of that gets saved in memory in R, so if N_trials were very large, I think it would be possible for this to cause out-of-memory problems.

I see about 7000 rows in that CSV, so I assume N_trials = 7000, in which case this doesn’t quite add up, but it’s the first thing to try.
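(Back-of-envelope, assuming 8-byte doubles: 7000 trials × 4 simplex entries × 8 bytes ≈ 220 KB per saved iteration, so even a few thousand saved draws is only a few hundred MB, nowhere near 16 GB. That’s why the transformed-parameters storage alone doesn’t obviously explain it.)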

Thanks very much for your quick response. I tried what you suggested, and I’m now getting an error that I suspect is trivial, but I’m not sure how to fix it:

data { 
  int<lower = 1> N_trials;
  int<lower = 1, upper = 4> w_ans[N_trials];

  int<lower = 0, upper = 1> heavy_syll_where_stress_shifts_to[N_trials];
  real rem_freq[N_trials]; // so I guess I put covariates here?
  
  int<lower = 1> N_subject;
  int<lower = 1> N_item;
  int<lower = 1, upper = N_subject> subject[N_trials];
  int<lower = 1, upper = N_item> item[N_trials];

}
parameters {
  real<lower = 0, upper = 1> know_remote;
  vector[N_subject] shift_subject_intercept_adjustor;
  vector[N_subject] shift_subject_intercept_adjustment_z;
  vector[N_item] shift_item_intercept_adjustor;
  vector[N_item] shift_item_intercept_adjustment_z;
  
  real BETA_heavy_syll_where_stress_shifts_to;
  real BETA_rem_freq;
  
  real INTERCEPT_shift_when_remote_known;
  real INTERCEPT_shift_when_remote_unknown;

  //real<lower = 0, upper = 1> be_meu;

  //real a_alpha;
  //real<lower = 0> sigma_a_alpha_subject;
  //vector[N_subject] a_alpha_subject_tilde;
  //real f_alpha;
  //real f_beta;
} 
transformed parameters {
}
model {
  simplex[4] theta[N_trials];  // <- the declaration the parser rejects
  for (n in 1:N_trials) {
    for (s in 1:N_subject) {
      for (i in 1:N_item) {
        real overall_intercept_when_remote_known = INTERCEPT_shift_when_remote_known
          + (shift_subject_intercept_adjustment_z[s] * shift_subject_intercept_adjustor[s])
          + (shift_item_intercept_adjustment_z[i] * shift_item_intercept_adjustor[i]);
        real overall_intercept_when_remote_unknown = INTERCEPT_shift_when_remote_unknown
          + (shift_subject_intercept_adjustment_z[s] * shift_subject_intercept_adjustor[s])
          + (shift_item_intercept_adjustment_z[i] * shift_item_intercept_adjustor[i]);

        real shift_when_remote_known = inv_logit(overall_intercept_when_remote_known
          + BETA_heavy_syll_where_stress_shifts_to * heavy_syll_where_stress_shifts_to[n]
          + BETA_rem_freq * rem_freq[n]);
        real shift_when_remote_unknown = inv_logit(overall_intercept_when_remote_unknown
          + BETA_heavy_syll_where_stress_shifts_to * heavy_syll_where_stress_shifts_to[n]);

        //real spirantize_general = inv_logit(f_alpha + complexity[n] * f_beta);
        theta[n, 1] = know_remote * shift_when_remote_known;               // yes remote, yes shift
        theta[n, 2] = know_remote * (1 - shift_when_remote_known);         // yes remote, no shift
        theta[n, 3] = (1 - know_remote) * shift_when_remote_unknown;       // no remote, yes shift
        theta[n, 4] = (1 - know_remote) * (1 - shift_when_remote_unknown); // no remote, no shift
      }
    }
  }
  target += beta_lpdf(know_remote | 1, 1);
  target += normal_lpdf(shift_subject_intercept_adjustment_z | 0, 1);
  target += normal_lpdf(shift_subject_intercept_adjustor | 0, 1);
  target += normal_lpdf(shift_item_intercept_adjustment_z | 0, 1);
  target += normal_lpdf(shift_item_intercept_adjustor | 0, 1);

  target += normal_lpdf(INTERCEPT_shift_when_remote_known | 0, 1);
  target += normal_lpdf(INTERCEPT_shift_when_remote_unknown | 0, 1);
  target += normal_lpdf(BETA_heavy_syll_where_stress_shifts_to | 0, 1);
  target += normal_lpdf(BETA_rem_freq | 0, 1);

  //target += beta_lpdf(shift_when_remote_known | 1, 1);
  //target += beta_lpdf(spirantize_general | 2, 2);
  //target += normal_lpdf(f_alpha | 0, 2);
  //target += normal_lpdf(f_beta | 0, 2);
  //target += normal_lpdf(a_alpha | 0, 2);
  //target += normal_lpdf(a_alpha_subject_tilde | 0, 1);

  for (n in 1:N_trials)
    target += categorical_lpmf(w_ans[n] | theta[n]);
}
//generated quantities {
//  int<lower = 1, upper = 6> pred_w_ans[N_trials];
//  for (n in 1:N_trials)
//    pred_w_ans[n] = categorical_rng(theta[n]);
//}


It says variable “simplex” does not exist, and I’m unsure where to declare it, since I didn’t have to before. Also, I only have about 7k data points, but 30 subjects and 117 items, so maybe it’s something to do with all the loops? Would you mind suggesting how to correct the above code so that it compiles and I can run it?

Thanks very much!

Hmm, simplex may be allowed only as a parameter/transformed-parameter type, not a local variable type.

Since you’re assigning to all four elements of theta[n], how about just doing:

vector[4] theta[N_trials];

That isn’t going to check that things sum to 1.0, but you could debug your code against some test data to guarantee that condition.
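Something like this, as a minimal sketch (everything else in your model block stays exactly as you have it):

model {
  // plain vector as a local variable; constrained types like simplex
  // aren't allowed for local declarations, so nothing enforces sum-to-1 here
  vector[4] theta[N_trials];
  // ... your existing loops filling theta[n, 1..4], then the priors and
  // the categorical_lpmf likelihood, unchanged ...
}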

I’ll try this and post how it went, thanks very much.

Also, just for reference, I removed the hierarchy for the by-subject random intercept, and things compiled and sampled with the structure I had originally (extremely slowly, but still…). This makes me suspect that there are unrelated memory / efficiency issues in my code; any advice on these would also be very appreciated.

Start small with simulated data. It could be an issue with the coding, or with the posterior, which could come from the prior or other modeling assumptions. There are lots of moving parts to these things.
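For example, a minimal sketch in R (the sizes and the placeholder outcome draws here are made up for illustration; ideally you’d generate w_ans from the model’s own equations with known parameter values, so you can check that sampling recovers them):

set.seed(1)
N_trials  <- 200   # keep it small so iterations are fast
N_subject <- 5
N_item    <- 10

sim_list <- list(
  N_trials  = N_trials,
  N_subject = N_subject,
  N_item    = N_item,
  subject   = sample(N_subject, N_trials, replace = TRUE),
  item      = sample(N_item, N_trials, replace = TRUE),
  heavy_syll_where_stress_shifts_to = rbinom(N_trials, 1, 0.5),
  rem_freq  = rnorm(N_trials),
  w_ans     = sample(4, N_trials, replace = TRUE)  # placeholder; swap in
)                                                  # generative-model draws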