# How to speed up an RStan model

Hi all,
I’m new to stan.
The purpose of my analysis is to detect how neighbor crowding influences the survival of focal trees. I have about 20,000 survival observations.
Here are some results from the model; I set iter = 500, chains = 1, and algorithm = "HMC".

``````
mean se_mean     sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
bet[1]          0.51    0.02   0.28     0.03     0.31     0.48     0.71     1.05   250 1.00
bet[2]          1.41    0.07   1.04    -0.53     0.76     1.36     2.06     3.81   250 1.00
bet[3]          9.08    6.39 101.02  -165.10   -59.97     1.92    69.82   239.96   250 1.00
bet[4]          2.67   12.49  80.92  -153.52   -57.21    -4.51    54.76   175.49    42 1.01
bet[5]         13.75    5.93  93.81  -156.52   -55.79     3.64    78.66   192.58   250 1.00
b[1,1]          1.15    0.04   0.69    -0.19     0.65     1.20     1.62     2.45   250 1.00
b[1,2]          7.02    3.66   9.03    -5.46     0.76     4.28    11.58    28.70     6 1.14
b[1,3]         -0.49    1.70   4.69   -10.67    -2.88     0.35     2.74     7.77     8 1.14
b[1,4]         -6.64    2.03   5.36   -19.14   -10.09    -5.39    -2.76     1.03     7 1.08
``````

My questions:

1. Why are the n_eff values so different, ranging from 6 to 250? I searched for the meaning of n_eff but found nothing.
2. I have tried increasing the number of iterations and chains, but the running time is quite long. Are there any suggestions for speeding up the model?

Below is the Stan model I used.

``````
data { // these are all data elements
int<lower=0> nplot; // number of unique plots
int<lower=0> nspe; // number of species
int<lower=0> ncensus; // number of censuses
int<lower=0> n; // number of survival observations
int<lower=0> ncov; // number of covariates
int<lower=0> plot[n]; // plot for each survival observation
int<lower=0> spe[n]; // species of each survival observation
int<lower=0> census[n]; // census for each survival observation
vector[nspe] meanT; // trait value for each species
matrix[n,ncov] COV; // covariate values for each observation: column 1 is all '1' for the intercept, column 2 is NC, column 3 is NCI, column 4 is NCIH, column 5 is log.DBH
int<lower=0,upper=1> status[n]; // survival observations
}

parameters {
real bet[5];
real b[nspe,ncov];
real int2[nplot];
real int3[ncensus];
real <lower=0.00001> sigmab[ncov];
real int1[ncov];
real <lower=0.00001> sigmaplot;
real <lower=0.00001> sigmacensus;
}

model {

sigmacensus~gamma(100,100);
sigmaplot~gamma(100,100);

for(i in 1:5) // all 5 elements of bet, so bet[5] does not get an implicit flat prior
{
bet[i]~normal(0,100);
}

for(i in 1:ncov)
{

int1[i]~normal(0,100);
sigmab[i]~gamma(100,100);

}

for(j in 1:nspe)
{
b[j,1]~normal(int1[1]+bet[1]*meanT[j],sigmab[1]);
b[j,2]~normal(int1[2]+bet[2]*meanT[j],sigmab[2]);
b[j,3]~normal(int1[3],sigmab[3]);
b[j,4]~normal(int1[4],sigmab[4]);
b[j,5]~normal(int1[5],sigmab[5]);
}

for(i in 1:nplot)
{
int2[i]~normal(0,sigmaplot);
}

for(i in 1:ncensus)
{
int3[i]~normal(0,sigmacensus);
}

for(i in 1:n)
{
status[i]~bernoulli_logit(b[spe[i],1]*COV[i,1]+b[spe[i],2]*COV[i,2]+b[spe[i],3]*COV[i,3]+b[spe[i],4]*COV[i,4]+b[spe[i],5]*COV[i,5]+
int2[plot[i]]+
int3[census[i]]) ;

}

}

``````

`n_eff` is the effective sample size. How it's calculated is explained in the MCMC chapter of the manual; we estimate it a little more conservatively than most systems.
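As a quick sanity check on your output, the reported `se_mean` is just the posterior standard deviation divided by the square root of `n_eff`, which you can verify against the rows in your table:

``````
se_mean = sd / sqrt(n_eff)

bet[1]:  0.28 / sqrt(250) ≈ 0.018   (reported as 0.02)
b[1,2]:  9.03 / sqrt(6)   ≈ 3.69    (reported as 3.66)
``````

So a small `n_eff` directly inflates the Monte Carlo error of your posterior mean estimates.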

Your model probably hasn’t converged yet with an effective sample size of 6. Try running it for 2000 iterations.

Is your model consistent with your data? Do you really expect sigmacensus to be very close to 1? A gamma(100, 100) prior has mean 1 and standard deviation 0.1, so it pins the scale tightly near 1.

For speed, you need to vectorize all those sampling statements and replace the scalar loops with vector multiplies (`meanT` is already declared as a vector in your data block, so this works directly), e.g.,

``````
b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
``````
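Applying the same idea across the whole model block, here's a sketch of what the fully vectorized version could look like. It assumes `b` is re-declared in `parameters` as `matrix[nspe,ncov] b;`, `int2` as `vector[nplot]`, and `int3` as `vector[ncensus]`, which is what makes the multi-indexing below legal:

``````
model {
  sigmacensus ~ gamma(100, 100);
  sigmaplot ~ gamma(100, 100);

  bet ~ normal(0, 100);      // vectorized over the whole array
  int1 ~ normal(0, 100);
  sigmab ~ gamma(100, 100);

  // species-level coefficients: one vectorized statement per column
  b[, 1] ~ normal(int1[1] + bet[1] * meanT, sigmab[1]);
  b[, 2] ~ normal(int1[2] + bet[2] * meanT, sigmab[2]);
  b[, 3] ~ normal(int1[3], sigmab[3]);
  b[, 4] ~ normal(int1[4], sigmab[4]);
  b[, 5] ~ normal(int1[5], sigmab[5]);

  int2 ~ normal(0, sigmaplot);
  int3 ~ normal(0, sigmacensus);

  // single vectorized likelihood: b[spe] selects each observation's
  // species row; rows_dot_product computes the per-row linear predictor
  status ~ bernoulli_logit(rows_dot_product(b[spe], COV)
                           + int2[plot] + int3[census]);
}
``````

This collapses the n scalar `bernoulli_logit` statements into one, which is usually where most of the time goes.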

Given the big expression for the mean in the `bernoulli_logit`, I'd be careful to check for identifiability; non-identifiability can cause problems for convergence.

All of this is discussed at length in the manual.