Thanks for the advice!
I have tried a considerably simplified model (the model block is below), in which I kept only ar/br and also tightened their previously flat priors.
model {
  vector[N] sigma1;
  vector[N] xi1;
  vector[N] sigma2;
  vector[N] xi2;
  vector[N] rho12;
  vector[N] alpha1;
  vector[N] alpha2;

  // priors
  rho12parameter ~ uniform(-1, 1);
  Bc ~ normal(0, 10);
  Bs ~ normal(0, 10);
  br_norm ~ normal(0, 1);
  bg ~ normal(0, 10);
  Ac ~ normal(0, 10);
  As ~ normal(0, 10);
  ar_norm ~ normal(0, 1);
  ag ~ normal(0, 100);
  Bc2 ~ normal(0, 10);
  Bs2 ~ normal(0, 10);
  bg2 ~ normal(0, 10);
  Ac2 ~ normal(0, 10);
  As2 ~ normal(0, 10);
  ag2 ~ uniform(0, 300);
  Bc3 ~ normal(0, 10);
  Bs3 ~ normal(0, 10);
  bg3 ~ normal(0, 10);
  Ac3 ~ normal(0, 10);
  As3 ~ normal(0, 10);
  ag3 ~ uniform(0, 300);

  // linear predictors for the two location parameters and the first scale
  xi1 = ag + ar[Rj] + As * Is + Ac * Ic
        + (bg + br[Rj] + Bs * Is + Bc * Ic) .* mid_year_corrected;
  xi2 = ag2 + As2 * Is + Ac2 * Ic
        + (bg2 + Bs2 * Is + Bc2 * Ic) .* mid_year_corrected;
  sigma1 = ag3 + As3 * Is + Ac3 * Ic
           + (bg3 + Bs3 * Is + Bc3 * Ic) .* mid_year_corrected;
  sigma2 = sigma2parameter * mid_year_corrected;

  // broadcast the scalar parameters to length-N vectors
  rho12  = rep_vector(rho12parameter, N);
  alpha1 = rep_vector(alpha1parameter, N);
  alpha2 = rep_vector(alpha2parameter, N);

  // user-defined bivariate distribution
  BMIheight ~ two_d_st(xi1, xi2, sigma1, sigma2, rho12, alpha1, alpha2);
}
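(ar and br themselves are built from ar_norm and br_norm, which is why only the _norm versions get the normal(0, 1) priors above; schematically, something like the non-centered block below, where R, sigma_ar and sigma_br stand in for whatever the real names and scales are:)

transformed parameters {
  // sketch only: non-centered construction of the region-level effects
  vector[R] ar = sigma_ar * ar_norm;   // region-level intercept offsets
  vector[R] br = sigma_br * br_norm;   // region-level slope offsets
}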
This simplified model seems to work well to my eye, and I ran some diagnostic checks on it in R:
> check_treedepth(stanfit)
0 of 1500 iterations saturated the maximum tree depth of 15.
> check_energy(stanfit)
E-BFMI indicated no pathological behavior.
> check_divergences(stanfit)
0 of 1500 iterations ended with a divergence.
I feel that eliminating the divergences is the real key to speeding things up, and that it would be more effective than any other speed-up strategy.
I am afraid I may still have to use the initial model, so now it is back to the original, troublesome model…
Regarding hitting very deep trees, another kind person made a similar point about correlations in the posterior, so let me ask about it here:
Sorry, I do not fully understand what correlations in the posterior really imply. Since the parameters of a posterior are generally correlated to some degree (they are essentially never independent), correlation among these parameter samples does not seem surprising in itself. Or does it mean that, from the algorithm's point of view, the sampler cannot explore some parameters ‘freely’ or ‘appropriately’ because of the ‘constraint’ imposed by their correlation with other parameters? And what is the main cause of that correlation?
The Stan User's Guide uses Neal's funnel as an example (Stan User's Guide), in which the Stan model is
parameters {
  real y;
  vector[9] x;
}
model {
  y ~ normal(0, 3);
  x ~ normal(0, exp(y/2));
}
In this case the posterior = prior × likelihood = prior (since no likelihood is defined). x and y are strongly coupled, and a single step size cannot work well in both the high-density and the low-density regions at the same time, so the sampler may fail to explore the funnel properly. Is a similar geometry the main issue in my model? I have tried to avoid writing correlated expressions between the parameters' priors, but I cannot do anything about the likelihood part.
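For the funnel itself, the User's Guide resolves this with a non-centered reparameterization, which I copy below for reference (this is the Guide's fix for the funnel, not something specific to my model):

parameters {
  real y_raw;
  vector[9] x_raw;
}
transformed parameters {
  real y = 3.0 * y_raw;            // implies y ~ normal(0, 3)
  vector[9] x = exp(y/2) * x_raw;  // implies x ~ normal(0, exp(y/2))
}
model {
  // both raw parameters get standard normal priors, so the geometry seen by
  // the sampler is an isotropic Gaussian rather than a funnel
  y_raw ~ std_normal();
  x_raw ~ std_normal();
}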
For the badly fitting initial model, here are the pairs plots of some parameters and the traceplots of the four chains for ad_ar:
The results look quite strange. Many MCMC samples pile up at the edge of the uniform(0, 300) priors (the green chain in the traceplot is not constant, but it only moves within a very small range). Any suggestions here? Thanks a lot.
Besides, tightening some of the priors does reduce the divergences, but it does not speed up sampling much because the sampler still hits the maximum tree depth very often. However, the traceplot (see below) looks better (one strange chain is still constant, but it is at least considerably better than the figure above), and perhaps I need to adjust the priors further; a sketch of the kind of tightening I mean follows. Thank you for your previous help.
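As an illustration, the toy model below replaces a hard uniform(0, 300) prior on a positive intercept with a lower bound plus a soft half-normal prior (only a sketch: the data, the scale 100 and the other parameters are placeholders, not my real model):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real<lower=0> ag2;          // previously: <lower=0, upper=300> with ag2 ~ uniform(0, 300)
  real b;
  real<lower=0> sigma;
}
model {
  ag2 ~ normal(0, 100);       // effectively half-normal because of the <lower=0> constraint
  b ~ normal(0, 10);
  sigma ~ normal(0, 10);
  y ~ normal(ag2 + b * x, sigma);
}

This way there is no hard upper boundary for the sampler to pile up against, while the prior still pulls ag2 towards plausible values.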