Rstan error during sampling for large dataset

I build a simple hierarchical stan model to retrieve the probability theta of binomial distribution (code below).

My input data is a table with the information of Number of all (N) and successful (n) trials, and metadata to which group it belongs (x:0-1, t:1-225, g:1-4 and c:1-4)

N n x t g c
123 10 1 1 1 2
531 500 1 0 1 1
12 6 0 1 2 1

Running it with rstan, I get the following error during/after sampling:

SAMPLING FOR MODEL 'Modle_roadmap_20210428' NOW (CHAIN 4).                    
Error in FUN(X[[i]], ...) :                                                   
  trying to get slot "mode" from an object (class "try-error") that is not an 
S4 object                                                                     
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN     
In addition: Warning message:                                                 
In parallel::mclapply(1:chains, FUN = callFun, mc.preschedule = FALSE,  :     
  4 function calls resulted in an error                                       
Execution halted  

Occasionally it gets a full run, when I only use 1 chain and I increase the stack size limit in my terminal (I dont hit a memory or CPU limit). And it always works when I reduce the datasets (The full set are 250,000 observations/rows).

I am unsure if this datasets is just to big, or if there is a way to optimize the code to make it work (ideally with an even bigger datasets). I read the documentation on optimization and implemented the brute force vectorization approach described in chapter 21.8, however this did not effect the error, or performance (for smaller models).

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for entry which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

transformed parameters {
  vector[Nr] theta; // binomial probabilities
  
  for (rowIndex in 1:Nr) { // linear model
    theta[rowIndex] = inv_logit(a_xtgc[xtgc[rowIndex]]);
  }
}

model {
  vector[Nxtgc] a_t_ii;
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  for (Idx in 1:Nxtgc) { // brute force vectorization  approch according to stan user guide (21.8)
    a_t_ii[Idx] = a_t[tLookup[Idx]];
  }
  a_xtgc ~  normal(a_t_ii,sigma_xtgc);  

  n ~ binomial(N, theta);
}

Thanks a lot in advance.

I got a solution:

Apparently there are to many (unnecessary) intermediate variables, which then crashes the program. You can get rid of theta by substituting “binomial()” with “binomial_logit()”. And the for-loop can be also substituted with a more elegant alternative.

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for combination which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

model {
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  a_xtgc ~  normal(a_t[tLookup],sigma_xtgc);  

  n ~ binomial_logit(N, a_xtgc[xtgc]);
}
1 Like