# How to speed up an RStan model

Hi all,
I’m new to stan.
The purpose of my analysis is to detect how neighbor crowding influences the survival of focal trees. I have about 20,000 survival observations.
Here are some results from the model; I set iter = 500, chains = 1, and algorithm = "HMC".

``````
mean se_mean     sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
bet[1]          0.51    0.02   0.28     0.03     0.31     0.48     0.71     1.05   250 1.00
bet[2]          1.41    0.07   1.04    -0.53     0.76     1.36     2.06     3.81   250 1.00
bet[3]          9.08    6.39 101.02  -165.10   -59.97     1.92    69.82   239.96   250 1.00
bet[4]          2.67   12.49  80.92  -153.52   -57.21    -4.51    54.76   175.49    42 1.01
bet[5]         13.75    5.93  93.81  -156.52   -55.79     3.64    78.66   192.58   250 1.00
b[1,1]          1.15    0.04   0.69    -0.19     0.65     1.20     1.62     2.45   250 1.00
b[1,2]          7.02    3.66   9.03    -5.46     0.76     4.28    11.58    28.70     6 1.14
b[1,3]         -0.49    1.70   4.69   -10.67    -2.88     0.35     2.74     7.77     8 1.14
b[1,4]         -6.64    2.03   5.36   -19.14   -10.09    -5.39    -2.76     1.03     7 1.08
``````

My questions:

1. Why are the n_eff values so different, ranging from 6 to 250? I searched for the meaning of n_eff but found nothing.
2. I have tried increasing the number of iterations and chains, but the running time is quite long. Are there any suggestions for speeding up the model?

Below is the Stan model I used.

``````
data { // these are all data elements
int<lower=0> nplot; // number of unique plots
int<lower=0> nspe; // number of species
int<lower=0> ncensus; // number of censuses
int<lower=0> n; // number of survival observations
int<lower=0> ncov; // number of covariates
int<lower=0> plot[n]; // plot for each survival observation
int<lower=0> spe[n]; // species of each survival observation
int<lower=0> census[n]; // census for each survival observation
vector[nspe] meanT; // trait value for each species
matrix[n,ncov] COV; // covariate values for each observation: column 1 is all '1' for the intercept, column 2 is NC, column 3 is NCI, column 4 is NCIH, column 5 is log.DBH
int<lower=0,upper=1> status[n]; // survival observations
}

parameters {
real bet[5];
real b[nspe,ncov];
real int2[nplot];
real int3[ncensus];
real <lower=0.00001> sigmab[ncov];
real int1[ncov];
real <lower=0.00001> sigmaplot;
real <lower=0.00001> sigmacensus;
}

model {

sigmacensus~gamma(100,100);
sigmaplot~gamma(100,100);

for(i in 1:5) // all 5 elements of bet, so bet[5] does not get an implicit flat prior
{
bet[i]~normal(0,100);
}

for(i in 1:ncov)
{

int1[i]~normal(0,100);
sigmab[i]~gamma(100,100);

}

for(j in 1:nspe)
{
b[j,1]~normal(int1[1]+bet[1]*meanT[j],sigmab[1]);
b[j,2]~normal(int1[2]+bet[2]*meanT[j],sigmab[2]);
b[j,3]~normal(int1[3],sigmab[3]);
b[j,4]~normal(int1[4],sigmab[4]);
b[j,5]~normal(int1[5],sigmab[5]);
}

for(i in 1:nplot)
{
int2[i]~normal(0,sigmaplot);
}

for(i in 1:ncensus)
{
int3[i]~normal(0,sigmacensus);
}

for(i in 1:n)
{
status[i]~bernoulli_logit(b[spe[i],1]*COV[i,1]+b[spe[i],2]*COV[i,2]+b[spe[i],3]*COV[i,3]+b[spe[i],4]*COV[i,4]+b[spe[i],5]*COV[i,5]+
int2[plot[i]]+
int3[census[i]]) ;

}

}

``````

`n_eff` is the effective sample size. How it's calculated is explained in the MCMC chapter of the manual; we estimate it a little more conservatively than most systems.
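As a quick sanity check on your output, the reported `se_mean` is just the posterior standard deviation divided by the square root of `n_eff`, which you can verify against the rows in your table:

``````
se_mean = sd / sqrt(n_eff)

bet[1]:  0.28 / sqrt(250) ≈ 0.018   (reported as 0.02)
b[1,2]:  9.03 / sqrt(6)   ≈ 3.69    (reported as 3.66)
``````

So a small `n_eff` directly inflates the Monte Carlo error of your posterior mean estimates.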

Your model probably hasn’t converged yet with an effective sample size of 6. Try running it for 2000 iterations.

Is your model consistent with your data? Do you really expect sigmacensus to be very close to 1? A gamma(100, 100) prior has mean 1 and standard deviation 0.1, so it pins the scale tightly near 1.

For speed, you need to vectorize all those sampling statements and replace the scalar loops with vector multiplies (`meanT` is already declared as a vector in your data block, so this works directly), e.g.,

``````
b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
``````
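Applying the same idea across the whole model block, here's a sketch of what the fully vectorized version could look like. It assumes `b` is re-declared in `parameters` as `matrix[nspe,ncov] b;`, `int2` as `vector[nplot]`, and `int3` as `vector[ncensus]`, which is what makes the multi-indexing below legal:

``````
model {
  sigmacensus ~ gamma(100, 100);
  sigmaplot ~ gamma(100, 100);

  bet ~ normal(0, 100);      // vectorized over the whole array
  int1 ~ normal(0, 100);
  sigmab ~ gamma(100, 100);

  // species-level coefficients: one vectorized statement per column
  b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
  b[, 2] ~ normal(int1[2] + bet[2] * meanT, sigmab[2]);
  b[, 3] ~ normal(int1[3], sigmab[3]);
  b[, 4] ~ normal(int1[4], sigmab[4]);
  b[, 5] ~ normal(int1[5], sigmab[5]);

  int2 ~ normal(0, sigmaplot);
  int3 ~ normal(0, sigmacensus);

  // single vectorized likelihood: b[spe] selects each observation's
  // species row; rows_dot_product computes the per-row linear predictor
  status ~ bernoulli_logit(rows_dot_product(b[spe], COV)
                           + int2[plot] + int3[census]);
}
``````

This collapses the n scalar `bernoulli_logit` statements into one, which is usually where most of the time goes.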

Given the big expression for the mean in the `bernoulli_logit`, I'd be careful to check for identifiability; non-identifiability can cause problems for convergence.

All of this is discussed at length in the manual.