How to speed up an RStan model


#1

Hi all,
I’m new to stan.
The purpose of my analysis is to detect how neighborhood crowding influences the survival of focal trees. There are about 20,000 survival observations.
Here are some results of the model; I set iter = 500, chains = 1, and algorithm = "HMC".


                mean se_mean     sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
bet[1]          0.51    0.02   0.28     0.03     0.31     0.48     0.71     1.05   250 1.00
bet[2]          1.41    0.07   1.04    -0.53     0.76     1.36     2.06     3.81   250 1.00
bet[3]          9.08    6.39 101.02  -165.10   -59.97     1.92    69.82   239.96   250 1.00
bet[4]          2.67   12.49  80.92  -153.52   -57.21    -4.51    54.76   175.49    42 1.01
bet[5]         13.75    5.93  93.81  -156.52   -55.79     3.64    78.66   192.58   250 1.00
b[1,1]          1.15    0.04   0.69    -0.19     0.65     1.20     1.62     2.45   250 1.00
b[1,2]          7.02    3.66   9.03    -5.46     0.76     4.28    11.58    28.70     6 1.14
b[1,3]         -0.49    1.70   4.69   -10.67    -2.88     0.35     2.74     7.77     8 1.14
b[1,4]         -6.64    2.03   5.36   -19.14   -10.09    -5.39    -2.76     1.03     7 1.08

My questions:

  1. Why are the n_eff values so different, ranging from 6 to 250? I searched for the meaning of n_eff and found nothing.
  2. I have tried increasing the number of iterations and chains, but the running time is quite long. Are there any suggestions for speeding up the model?

Following is the Stan model I used.


data { #these are all data elements
    int<lower=0> nplot; #number of unique plot
    int<lower=0> nspe; #number of species
    int<lower=0> ncensus; #number of census
    int<lower=0> n; #number of survival observations
    int<lower=0> ncov; #number of covariates
    int<lower=0> plot[n]; #plot for each survival observation
    int<lower=0> spe[n]; #species of each survival observation
    int<lower=0> census[n]; #census for each survival observation
    vector[nspe] meanT;  #trait values for each species
    matrix[n,ncov] COV; #covariate values for each observation: the first column is all '1' for the intercept, the second is NC, the third is NCI, the fourth is NCIH, the fifth is log.DBH
    int<lower=0,upper=1> status[n]; #survival observations 
}

parameters {
    real bet[5]; 
    real b[nspe,ncov]; 
    real int2[nplot]; 
    real int3[ncensus]; 
    real <lower=0.00001> sigmab[ncov]; 
    real int1[ncov]; 
    real <lower=0.00001> sigmaplot; 
    real <lower=0.00001> sigmacensus; 
}

model {

sigmacensus~gamma(100,100); 
sigmaplot~gamma(100,100); 

for (i in 1:5) #was 1:4, which left bet[5] without a prior
{
    bet[i] ~ normal(0, 100);
}

for(i in 1:ncov)
{

int1[i]~normal(0,100); 
sigmab[i]~gamma(100,100);  

}

    for(j in 1:nspe)
    {
        b[j,1] ~ normal(int1[1] + bet[1]*meanT[j], sigmab[1]);
        b[j,2] ~ normal(int1[2] + bet[2]*meanT[j], sigmab[2]);
        b[j,3] ~ normal(int1[3], sigmab[3]);
        b[j,4] ~ normal(int1[4], sigmab[4]);
        b[j,5] ~ normal(int1[5], sigmab[5]); #was a duplicate b[j,4] statement; b[,5] is used in the likelihood but had no prior
    }
    
    for(i in 1:nplot)
    {
        int2[i]~normal(0,sigmaplot); 
    }
   
    for(i in 1:ncensus)
    {
        int3[i]~normal(0,sigmacensus); 
    }
    
    for (i in 1:n)
    {
        status[i] ~ bernoulli_logit(b[spe[i],1]*COV[i,1] + b[spe[i],2]*COV[i,2]
                                    + b[spe[i],3]*COV[i,3] + b[spe[i],4]*COV[i,4]
                                    + b[spe[i],5]*COV[i,5] #a '+' was missing here, which would not compile
                                    + int2[plot[i]]
                                    + int3[census[i]]);
    }

} 
 


#2

n_eff is the effective sample size. How it’s calculated is explained in the MCMC chapter of the manual; we estimate it a little more conservatively than most systems. The se_mean column is the Monte Carlo standard error of the posterior mean, which is roughly sd / sqrt(n_eff), so parameters with a small n_eff have poorly estimated means.

Your model probably hasn’t converged yet with an effective sample size of 6. Try running it for 2000 iterations.

Is your model consistent with your data? Do you really expect sigmacensus to be very close to 1?

For speed, you need to vectorize all those sampling statements and use vector multiplies. Since meanT is already declared as a vector, expressions like bet[1] * meanT work directly, e.g.,

b[ , 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
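Applied to the model above, the whole model block can be written with vectorized statements. This is a sketch, not tested code, and it assumes two declaration changes: b becomes matrix[nspe, ncov] b; and int2/int3 become vector[nplot] and vector[ncensus], so that multi-indexing and rows_dot_product apply.

```stan
model {
  sigmacensus ~ gamma(100, 100);
  sigmaplot   ~ gamma(100, 100);

  // vectorized over all elements at once
  bet    ~ normal(0, 100);
  int1   ~ normal(0, 100);
  sigmab ~ gamma(100, 100);

  // one vectorized statement per covariate column of b
  b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
  b[, 2] ~ normal(int1[2] + bet[2] * meanT, sigmab[2]);
  b[, 3] ~ normal(int1[3], sigmab[3]);
  b[, 4] ~ normal(int1[4], sigmab[4]);
  b[, 5] ~ normal(int1[5], sigmab[5]);

  int2 ~ normal(0, sigmaplot);
  int3 ~ normal(0, sigmacensus);

  // likelihood in one statement: row i of COV dotted with the
  // coefficient row for the species of observation i, plus the
  // plot and census intercepts picked out by multi-indexing
  status ~ bernoulli_logit(rows_dot_product(COV, b[spe])
                           + int2[plot]
                           + int3[census]);
}
```

The win is not just fewer statements: each vectorized statement builds one node in the autodiff expression graph instead of one per loop iteration, which is where most of the speedup comes from.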

Given the big expression for the mean in the bernoulli_logit, I’d be careful to check for identifiability. That can cause problems for convergence.

All of this is discussed at length in the manual.


#3

Hi Bob, thank you so much for your reply! I will read the manual again.