Hi!

This is a follow-up to an earlier Google Groups discussion on divergent transitions in hierarchical MPT models in rstan:

https://groups.google.com/g/stan-users/c/ZTiu_ij_bC8

It concerns divergent transitions in this type of model in rstan.

Please note: the model I use is for a different data set and has different MPT parameters, but it follows the structure of the case study.

The solution to the divergent transitions suggested in that thread was to set the control parameters (adapt_delta, stepsize, max_treedepth) to extreme values. This makes sampling very slow and does not always remove the divergent transitions in complex models.
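For reference, this is roughly what those control settings look like in rstan; the specific values below are illustrative examples, not the ones from the thread:

```r
# Illustrative (hypothetical) control settings: a higher adapt_delta forces
# smaller step sizes during adaptation; max_treedepth caps trajectory length.
ctrl <- list(adapt_delta = 0.999, stepsize = 0.01, max_treedepth = 15)

# The fit call would then be, e.g.:
# fit <- rstan::stan(file = "MPT_5_Stan.stan", data = stan_data, control = ctrl)
```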

I have therefore been wondering whether there are other solutions, such as 1) a change of priors, 2) a change of initial values, or 3) a reparameterization.

**Concerning 1):**

The MPT_5_Stan.R example model (https://github.com/stan-dev/example-models/tree/master/Bayesian_Cognitive_Modeling/CaseStudies/MPT) uses a cauchy(0, 2.5) prior for sigma. Since sigma is declared with a lower bound of zero (vector<lower=0>[nparams] sigma), this is effectively a half-Cauchy prior.

I noticed, when sampling from the prior predictive distribution of a Stan model that follows MPT_5_Stan.R, that the cauchy(0, 2.5) prior on sigma included the value 54 in its 95% interval, and that even the prior-predictive fit (without the likelihood and the constraint from the data) produced divergent transitions.

I eventually got rid of the divergent transitions for the prior predictive distribution by using sigma ~ exponential(1) instead. But this alone did not completely remove the divergent transitions when fitting the posterior.
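A quick base-R check of why exponential(1) is so much tighter in the tail than half-Cauchy(0, 2.5): the half-Cauchy 97.5% quantile corresponds to the full Cauchy 98.75% quantile.

```r
# Upper-tail comparison of the two candidate priors on sigma.
half_cauchy_q975 <- qcauchy(0.9875, location = 0, scale = 2.5)  # ~ 63.7
exponential_q975 <- qexp(0.975, rate = 1)                       # ~ 3.69
c(half_cauchy = half_cauchy_q975, exponential = exponential_q975)
```

So the half-Cauchy puts non-negligible prior mass on group-level standard deviations far larger than anything plausible on the probit scale, which is consistent with the prior-predictive divergences above.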

**Concerning 2):**

I replaced the initial values in the case study with the following, to ensure that the constraints of the Cholesky factor L_Omega were satisfied by the initial values. Again, this did not completely remove the divergent transitions, but it did lead to an improvement.

```r
library(Matrix)

# Build a random valid Cholesky factor of a correlation matrix, so that the
# initial value satisfies the cholesky_factor_corr constraint on L_Omega.
generate_valid_L_Omega <- function(nparams) {
  random_matrix <- matrix(runif(nparams^2), nparams, nparams)
  sym_matrix <- (random_matrix + t(random_matrix)) / 2  # symmetrize
  diag(sym_matrix) <- 1
  # Project onto the nearest positive-definite correlation matrix;
  # as.matrix() converts back from the Matrix class for rstan.
  pd_matrix <- as.matrix(nearPD(sym_matrix, corr = TRUE)$mat)
  # chol() returns the upper-triangular factor; Stan expects the lower one.
  L_Omega <- t(chol(pd_matrix))
  return(L_Omega)
}

init_set <- function(nsubjs, nparams) {
  L_Omega_init <- generate_valid_L_Omega(nparams)
  list(
    deltahat_tilde = matrix(rnorm(nsubjs * nparams), nsubjs, nparams),
    L_Omega = L_Omega_init,
    sigma = runif(nparams, 0.1, 2),
    mu_c_hat = rnorm(1), mu_n_hat = rnorm(1), mu_S_hat = rnorm(1),
    mu_I_hat = rnorm(1), mu_ca_hat = rnorm(1), mu_na_hat = rnorm(1)
  )
}
```

**Concerning 3):**

I am considering returning to the centered parameterization instead of the non-centered parameterization used in https://github.com/stan-dev/example-models/tree/master/Bayesian_Cognitive_Modeling/CaseStudies/MPT, to see whether this makes a difference for a large data set.
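For concreteness, a sketch of what the centered alternative would look like; the parameter names here follow the general pattern of the case study but are assumptions, not the exact code (mu_hat would collect the group-level means into a vector):

```stan
// Centered sketch (names are assumptions): instead of building deltahat from
// std-normal innovations (deltahat_tilde) in transformed parameters, sample
// the subject-level effects directly from the group multivariate normal.
parameters {
  matrix[nsubjs, nparams] deltahat;
}
model {
  for (i in 1:nsubjs)
    deltahat[i] ~ multi_normal_cholesky(mu_hat, diag_pre_multiply(sigma, L_Omega));
}
```

The usual heuristic is that the centered form works better when the data strongly inform the subject-level effects (many observations per subject), while the non-centered form helps in the data-sparse regime, so it seems worth trying both on a large data set.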

I attach a plot of the divergent transitions in one of the models. They do not appear to take the form of a funnel, so I was wondering how Michael Betancourt @betanalpha would classify them, and whether there are further suggestions for how to remove them?

Finally, here are the leapfrog-step counts:

```
leapfrog steps (samples_1)   counts
                         7        1
                        19        1
                        21        1
                        22        1
                        23        1
                        28        1
                        31        1
                        34        1
                        43        2
                        47        1
                        48        1
                        51        1
                        56        1
                        58        1
                        62        1
                        65        1
                        67        1
                        73        1
                        78        1
                        83        1
                        85        1
                        87        1
                        91        1
                        98        1
                        99        1
                       104        1
                       107        1
                       108        1
                       109        1
                       110        1
                       112        1
                       113        1
                       115        1
                       116        1
                       121        1
                       122        2
                       123        1
                       124        1
                       127   150804
                       133        1
                       138        1
                       142        1
                       151        1
                       172        1
                       179        1
                       185        1
                       188        1
                       191        1
                       224        1
                       225        1
                       226        1
                       235        2
                       241        2
                       253        1
                       255   149139
```
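For completeness, a count table like the one above can be produced from the sampler diagnostics; this is a sketch in which `fit` is an assumed stanfit object, and the mock vector stands in for the real output so the tabulation is runnable:

```r
# Real extraction (commented; requires a fitted stanfit object named fit):
# sp <- rstan::get_sampler_params(fit, inc_warmup = FALSE)  # one matrix per chain
# n_leapfrog <- unlist(lapply(sp, function(m) m[, "n_leapfrog__"]))

# Mock stand-in for illustration:
n_leapfrog <- c(7, 127, 127, 127, 255, 255)
print(table(n_leapfrog))
```

Incidentally, if I read the NUTS output correctly, 127 and 255 equal 2^7 - 1 and 2^8 - 1, the leapfrog counts of saturated trees of depth 7 and 8, which would mean the vast majority of iterations are hitting a treedepth limit.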