Hi all.
I’m trying to fit a computational model of a Bayesian state space model, that includes three parameters per participants - tau (corresponding with 1-process noise), gamma (corresponding with 1-observation noise) and w (corresponding with the probability for random responding). The model is quite complex, and so I attach only the parameters and model parts (the model itself is within a function).
Here is the thing - I’m using a non-centered parameterization, but am still getting some divergent transitions (~40, using 3 chains of 1000 iterations, 400 of which are warmup. Note this this already takes a few days to finish sampling, so any larger numbers would take even more time).
However, posterior predictive checks show that the model is working. And when I run the computational model for individual participants (in a non-hierarchical model) no divergences occur.
Even stranger, when examining the parallel coordinates plots (attached herewith) there is no evidence that divergences are concentrated in a specific range of any parameter. I was also unable to find any evidence for bad geometry using pair plots (I can’t attach all of them but a sample including the immediate suspects can be found in the file). Finally, I also attach the plot comparing lp and acceptance for divergent vs. non-divergent transitions (I’m not sure exactly how to interpret it, but I can’t see any clear distinction here too).
Divergences plots.pdf (806.1 KB)
Another thing I can say is that it seems that chains with a lower stepsize (stepsizes ranging from 0.137 to 0.183) also have less divergences - Is this evidence that the problem can be solved by lowering initial step size and increasing adapt_delta? More generally, my question is - how can I proceed? Are these real divergences, or are they a sort of a false alarm?
One last thing I think I should note - while trying to find init values, I often get the warning “error evaluation log probability at initial value - Exception: bernoulli_lpmf probability parameter is 1, but most be in the internval [0,1]”. This happens because in some combinations of parameters pRight can be equal to 1 or 0. I’m not sure whether this might be causing the divergences - but I would assume that if this was the case, we would see them also in non-hierachical models ran for individual participants.
Also - a more technical question - is there any way to let the sampler record the parameter values of the divergent transitions themselves (rather than only their starting point?). I though that this might help.
Any advice would be much appreciated!
Itzik,
parameters{
vector [nSubj] tau0;
vector [nSubj] gamma0;
vector [nSubj] w0;
real TAU0;
real GAMMA0;
real W0;
real<lower=0> TAU0_SD;
real<lower=0> GAMMA0_SD;
real<lower=0> W0_SD;
}
transformed parameters{
vector[nSubj] tau;
vector[nSubj] gamma;
vector[nSubj] w;
real TAU;
real GAMMA;
real W;
tau=Phi_approx(tau0*TAU0_SD+TAU0);
gamma=Phi_approx(gamma0*GAMMA0_SD+GAMMA0)/2+0.5; //the additional transformation is used to keep gamma from 0.5 to 1
w=Phi_approx(w0*W0_SD+W0);
GAMMA=Phi_approx(GAMMA0)/2+0.5; //the additional transformation is used to keep gamma from 0.5 to 1
TAU=Phi_approx(TAU0);
W=Phi_approx(W0);
}
model{
vector [nTrials] pRight[nSubj];
tau0~normal(0,1);
gamma0~normal(0,1);
w0~normal(0,1);
TAU0~normal(0,1);
GAMMA0~normal(0,1);
W0~normal(0,1);
TAU0_SD~normal(0,5);
GAMMA0_SD~normal(0,5);
W0_SD~normal(0,5);
for (s in 1:nSubj){
pRight[s]=Bmodel1(cues[s],targets[s],nTrials,gamma[s],tau[s],w[s])[,1];
for (t in 1:nTrials){
if(response[s,t]!=999){ //This is just used to ignore missing responses marked by 999
response[s,t]~bernoulli(pRight[s,t]);
}
}
}
}