Rstan error during sampling for large dataset

keks · April 30, 2021, 8:06am

I build a simple hierarchical stan model to retrieve the probability theta of binomial distribution (code below).

My input data is a table with the information of Number of all (N) and successful (n) trials, and metadata to which group it belongs (x:0-1, t:1-225, g:1-4 and c:1-4)

N	n	x	t	g	c
123	10	1	1	1	2
531	500	1	0	1	1
12	6	0	1	2	1

Running it with rstan, I get the following error during/after sampling:

SAMPLING FOR MODEL 'Modle_roadmap_20210428' NOW (CHAIN 4).                    
Error in FUN(X[[i]], ...) :                                                   
  trying to get slot "mode" from an object (class "try-error") that is not an 
S4 object                                                                     
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN     
In addition: Warning message:                                                 
In parallel::mclapply(1:chains, FUN = callFun, mc.preschedule = FALSE,  :     
  4 function calls resulted in an error                                       
Execution halted

Occasionally it gets a full run, when I only use 1 chain and I increase the stack size limit in my terminal (I dont hit a memory or CPU limit). And it always works when I reduce the datasets (The full set are 250,000 observations/rows).

I am unsure if this datasets is just to big, or if there is a way to optimize the code to make it work (ideally with an even bigger datasets). I read the documentation on optimization and implemented the brute force vectorization approach described in chapter 21.8, however this did not effect the error, or performance (for smaller models).

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for entry which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

transformed parameters {
  vector[Nr] theta; // binomial probabilities
  
  for (rowIndex in 1:Nr) { // linear model
    theta[rowIndex] = inv_logit(a_xtgc[xtgc[rowIndex]]);
  }
}

model {
  vector[Nxtgc] a_t_ii;
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  for (Idx in 1:Nxtgc) { // brute force vectorization  approch according to stan user guide (21.8)
    a_t_ii[Idx] = a_t[tLookup[Idx]];
  }
  a_xtgc ~  normal(a_t_ii,sigma_xtgc);  

  n ~ binomial(N, theta);
}

Thanks a lot in advance.

keks · May 5, 2021, 2:16pm

I got a solution:

Apparently there are to many (unnecessary) intermediate variables, which then crashes the program. You can get rid of theta by substituting “binomial()” with “binomial_logit()”. And the for-loop can be also substituted with a more elegant alternative.

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for combination which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

model {
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  a_xtgc ~  normal(a_t[tLookup],sigma_xtgc);  

  n ~ binomial_logit(N, a_xtgc[xtgc]);
}

Topic		Replies	Views
Error triggered by changing sample size? Modeling	11	1409	May 18, 2018
Strange error running rstan RStan	17	3592	January 3, 2018
Sampler error in rstan on Mac that doesn't reproduce on other computers General	26	2457	July 13, 2017
Stan abort, R crash, or other error when sampling relatively simple model RStan rstan	4	1586	September 26, 2019
Error calling sampler when fitting hierarchical model RStan specification , hierarchical-model	17	606	June 5, 2021

Rstan error during sampling for large dataset

Related topics