How to speed up an RStan model


#1

Hi all,
I’m new to stan.
The purpose of my analysis is to detect how neighborhood crowding influences the survival of focal trees. There are about 20,000 survival observations.
Here are some results of the model; I set iter = 500, chains = 1, and algorithm = "HMC".


                mean se_mean     sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
bet[1]          0.51    0.02   0.28     0.03     0.31     0.48     0.71     1.05   250 1.00
bet[2]          1.41    0.07   1.04    -0.53     0.76     1.36     2.06     3.81   250 1.00
bet[3]          9.08    6.39 101.02  -165.10   -59.97     1.92    69.82   239.96   250 1.00
bet[4]          2.67   12.49  80.92  -153.52   -57.21    -4.51    54.76   175.49    42 1.01
bet[5]         13.75    5.93  93.81  -156.52   -55.79     3.64    78.66   192.58   250 1.00
b[1,1]          1.15    0.04   0.69    -0.19     0.65     1.20     1.62     2.45   250 1.00
b[1,2]          7.02    3.66   9.03    -5.46     0.76     4.28    11.58    28.70     6 1.14
b[1,3]         -0.49    1.70   4.69   -10.67    -2.88     0.35     2.74     7.77     8 1.14
b[1,4]         -6.64    2.03   5.36   -19.14   -10.09    -5.39    -2.76     1.03     7 1.08

My questions:

  1. Why are the n_eff values so different, ranging from 6 to 250? I searched for the meaning of n_eff and found nothing.
  2. I have tried increasing the number of iterations and chains, but the running time is quite long. Are there any suggestions for speeding up the model?

Following is the Stan model I used.


data { #these are all data elements
    int<lower=0> nplot; #number of unique plot
    int<lower=0> nspe; #number of species
    int<lower=0> ncensus; #number of census
    int<lower=0> n; #number of survival observations
    int<lower=0> ncov; #number of covariates
    int<lower=0> plot[n]; #plot for each survival observation
    int<lower=0> spe[n]; #species of each survival observation
    int<lower=0> census[n]; #census for each survival observation
    vector[nspe] meanT;  #trait values for each species
    matrix[n,ncov] COV; #covariate values for each observation: the first column is all '1' for the intercept, the second is NC, the third is NCI, the fourth is NCIH, the fifth is log.DBH
    int<lower=0,upper=1> status[n]; #survival observations 
}

parameters {
    real bet[5]; 
    real b[nspe,ncov]; 
    real int2[nplot]; 
    real int3[ncensus]; 
    real <lower=0.00001> sigmab[ncov]; 
    real int1[ncov]; 
    real <lower=0.00001> sigmaplot; 
    real <lower=0.00001> sigmacensus; 
}

model {

sigmacensus~gamma(100,100); 
sigmaplot~gamma(100,100); 

for (i in 1:5) #was 1:4, which left bet[5] without a prior
{
    bet[i] ~ normal(0, 100);
}

for(i in 1:ncov)
{

int1[i]~normal(0,100); 
sigmab[i]~gamma(100,100);  

}

    for(j in 1:nspe)
    {
        b[j,1] ~ normal(int1[1] + bet[1]*meanT[j], sigmab[1]);
        b[j,2] ~ normal(int1[2] + bet[2]*meanT[j], sigmab[2]);
        b[j,3] ~ normal(int1[3], sigmab[3]);
        b[j,4] ~ normal(int1[4], sigmab[4]);
        b[j,5] ~ normal(int1[5], sigmab[5]); #was a duplicate b[j,4] statement; b[,5] is used in the likelihood but had no prior
    }
    
    for(i in 1:nplot)
    {
        int2[i]~normal(0,sigmaplot); 
    }
   
    for(i in 1:ncensus)
    {
        int3[i]~normal(0,sigmacensus); 
    }
    
    for (i in 1:n)
    {
        status[i] ~ bernoulli_logit(b[spe[i],1]*COV[i,1] + b[spe[i],2]*COV[i,2]
                                    + b[spe[i],3]*COV[i,3] + b[spe[i],4]*COV[i,4]
                                    + b[spe[i],5]*COV[i,5] #a '+' was missing here, which would not compile
                                    + int2[plot[i]]
                                    + int3[census[i]]);
    }

} 
 


#2

n_eff is the effective sample size. How it’s calculated is explained in the MCMC chapter of the manual; we estimate it a little more conservatively than most systems. The se_mean column is the Monte Carlo standard error of the posterior mean, which is roughly sd / sqrt(n_eff), so parameters with a small n_eff have poorly estimated means.

Your model probably hasn’t converged yet with an effective sample size of 6. Try running it for 2000 iterations.

Is your model consistent with your data? Do you really expect sigmacensus to be very close to 1?

For speed, you need to vectorize all those sampling statements and use vector multiplies. Since meanT is already declared as a vector, expressions like bet[1] * meanT work directly, e.g.,

b[ , 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
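Applied to the model above, the whole model block can be written with vectorized statements. This is a sketch, not tested code, and it assumes two declaration changes: b becomes matrix[nspe, ncov] b; and int2/int3 become vector[nplot] and vector[ncensus], so that multi-indexing and rows_dot_product apply.

```stan
model {
  sigmacensus ~ gamma(100, 100);
  sigmaplot   ~ gamma(100, 100);

  // vectorized over all elements at once
  bet    ~ normal(0, 100);
  int1   ~ normal(0, 100);
  sigmab ~ gamma(100, 100);

  // one vectorized statement per covariate column of b
  b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
  b[, 2] ~ normal(int1[2] + bet[2] * meanT, sigmab[2]);
  b[, 3] ~ normal(int1[3], sigmab[3]);
  b[, 4] ~ normal(int1[4], sigmab[4]);
  b[, 5] ~ normal(int1[5], sigmab[5]);

  int2 ~ normal(0, sigmaplot);
  int3 ~ normal(0, sigmacensus);

  // likelihood in one statement: row i of COV dotted with the
  // coefficient row for the species of observation i, plus the
  // plot and census intercepts picked out by multi-indexing
  status ~ bernoulli_logit(rows_dot_product(COV, b[spe])
                           + int2[plot]
                           + int3[census]);
}
```

The win is not just fewer statements: each vectorized statement builds one node in the autodiff expression graph instead of one per loop iteration, which is where most of the speedup comes from.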

Given the big expression for the mean in the bernoulli_logit, I’d be careful to check for identifiability. That can cause problems for convergence.

All of this is discussed at length in the manual.


#3

Hi Bob, thank you so much for your reply! I will read the manual again.