# Help with all iterations ended with a divergence and high R-hat

We want to define a joint distribution through a user-defined function and then fit the model with Stan. But the run ends with the warnings "all iterations ended with a divergence" and high R-hat. We tried increasing the `adapt_delta` parameter and the number of iterations, and it seems to help (R-hat decreased, but it is still larger than 1.1), so we think the chains still have not mixed. Some experts said reparameterization should come first, but the Stan User's Guide only gives examples for Beta and Dirichlet priors and the normal distribution. For a user-defined distribution, what should we do? Could we transform the priors (probit or logit)? I would be really grateful for any advice here.
Thanks a lot!
Thanks a lot!

``````functions{
real joint_lpdf(matrix p_i, vector alpha, vector beta, real omega){
  // (body omitted in the original post)
}
}
data{
int N;
int Nt;
int Ns;

int TP[N];
int Dis[N];
int TN[N];
int NDis[N];
int Study[N];
int Test[N];
int Total[N]; //n_ijk
}
parameters{
vector<lower=0, upper=1>[Ns] MU[Nt]; //mu_jk
vector<lower=0, upper=1>[Ns] delta[Nt]; //delta_jk
vector<lower=0, upper=1>[Ns] theta; //theta_j
matrix<lower=0, upper=1>[Ns, 3] p_i[Nt]; //pi_ijk
vector[Nt] omega; //omega_k
}
transformed parameters{
vector<lower=0>[Ns] alpha[Nt]; //alpha_jk
vector<lower=0>[Ns] beta[Nt]; //beta_jk

for(k in 1:Nt){
alpha[k] = MU[k] .* (1 - theta .* delta[k]) ./ (theta .* delta[k]);
beta[k] = (1 - MU[k]) .* (1 - theta .* delta[k]) ./ (theta .* delta[k]);
}
}
model{
//Priors
for (k in 1:Nt){
MU[k] ~ uniform(0, 1);
delta[k] ~ uniform(0, 1);
}
theta ~ uniform(0, 1);
omega ~ normal(0, 5);

for (k in 1:Nt)
p_i[k] ~ joint(alpha[k], beta[k], omega[k]);

for (n in 1:N){
TP[n] ~ binomial(Dis[n], p_i[Test[n]][Study[n], 1]);
TN[n] ~ binomial(NDis[n], p_i[Test[n]][Study[n], 2]);
Dis[n] ~ binomial(Total[n], p_i[Test[n]][Study[n], 3]);
}

}
generated quantities{
vector[3*N] loglik;

for (n in 1:N)
loglik[n] = binomial_lpmf(TP[n] | Dis[n], p_i[Test[n]][Study[n], 1]);

for (n in (N+1):(2*N))
loglik[n] = binomial_lpmf(TN[n-N]| NDis[n-N], p_i[Test[n-N]][Study[n-N], 2]);

for (n in (2*N+1):(3*N))
loglik[n] = binomial_lpmf(Dis[n-2*N] | Total[n-2*N], p_i[Test[n-2*N]][Study[n-2*N], 3]);

}

``````


Sorry to see you are having problems.

You seem to not have posted the actual code for `joint_lpdf`, so it is hard to make guesses. In any case, the recommended approach is not to start with such a complex model: build a simpler model, make sure you can fit it to simulated data, and only then add complexity (e.g. think about what would be the simplest possible model involving `joint_lpdf` and start with that).
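As an illustration of the simulate-then-fit idea, here is a minimal Python sketch (the sizes and probabilities are made up for illustration, not taken from your model) that fakes binomial data from known probabilities; fitting the simple model to such data should recover `p_true`:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes, standing in for the model's data block
N = 20
p_true = rng.uniform(0.2, 0.8, size=N)   # known "true" probabilities
Dis = rng.integers(50, 200, size=N)      # denominators
TP = rng.binomial(Dis, p_true)           # simulated counts

# Fit the simple model to (TP, Dis); the posterior for each
# probability should cover the corresponding entry of p_true.
```

If the posteriors do not cover the known truth even in this controlled setting, the problem is in the code rather than the data.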

Best of luck with your model!

Thank you, Sir! We began with a simpler model, and it seems to work well. But the model becomes problematic when we add a variable in the functions block. To understand the bias, we first considered a short Markov chain, as in the eight schools example. Even for this lone chain, split R-hat is already high (the largest is 5.3), all iterations ended with a divergence, and the effective sample size per iteration is unreasonable. What should we do next? The autocorrelation plot indicates non-convergence, and there are 100% divergences. Does that mean `joint_lpdf` has problems?
I would be really grateful in case you have any advice here.
Thanks a lot!

Please share full code for both the simpler model that works and the model that doesn't work. If possible, also share the data and the R/Python/… code you use to prepare the data and run Stan. I would really need to see the source code for `joint_lpdf`, which you omitted in your first post; without that, I don't think I can provide better advice than "think hard". If all iterations are divergent, the most likely cause is a programming bug (e.g. non-initialized values, indexing mismatches, …). I also tried to enumerate some general strategies for this sort of problem at https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/

Thank you very much for your help!
simple.txt (2.5 KB) problem.txt (3.4 KB) R.csv (2.3 KB)
The simple model code and data are in "simple.R" and "R.csv"; the other R script is the problem model. In the functions block we want to add a variable ("f5"), so the mathematical formula changes. With the problem model, a "not initialized" message appeared in the viewer at first. After some hard work we figured out that the problem was "f8", by checking the vectors one by one. We then rewrote it as in the current code, and there is no longer any "log(0)" or "not finite", but sampling is still 100% divergent. Even when we change `adapt_delta` and `max_treedepth`, 100% divergent only improves to 80% divergent at best, which doesn't make much sense to us, so we think there must still be a problem with the model.

Thanks for sharing the blog post; we have also thought about what we have done. First, the code was rewritten on the basis of the simple model, so we think that part should work. The two models use the same data and parameters, so we think the parameters are not the problem. Second, looking at the posterior distribution in ShinyStan, we wondered whether it is multimodal, but when we reduced the data the plot looked unimodal and sampling was still 100% divergent. Currently we are considering how to reparameterize, or use a non-centered parameterization. We know how to reparameterize the normal, Beta, or Dirichlet distributions by hand; can the binomial distribution be reparameterized, and if so, how? We are still looking for relevant information.

We are very eager for your suggestions; we don't know whether our ideas are feasible. Thank you very much!

I tried looking into this, but I have to admit I can't follow the math. The `frank_lpdf` function is definitely the most suspicious (and complex) part of the model. I frequently make mistakes in implementing my own lpdfs, so I wouldn't rule out that the function computes something different from what you hope it computes.

There is also definitely some potential for numerical issues with

``````for (i in 1:r){
  f3[i] = omega * (1 - exp(-omega)) * exp(-omega * (beta_cdf(p_i[i,1], alpha, beta) + beta_cdf(p_i[i,2], alpha, beta)));
  f4[i] = (1 - exp(-omega)) - (1 - exp(-omega * beta_cdf(p_i[i,1], alpha, beta))) * (1 - exp(-omega * beta_cdf(p_i[i,2], alpha, beta)));
}
return (f1 + f2 + sum(log(f3 ./ (f4 .* f4))));
``````

It is usually preferable to always work on the log scale. Check the functions reference for `log_sum_exp`, `log_diff_exp`, `log1p_exp`, `beta_lcdf`, and similar functions, which should make it easier for you to compute `log_f3` and `log_f4` directly on the log scale in a more stable way, without ever calling `exp`. Then you would have

`````` return (f1 + f2 + sum(log_f3 - 2 * log_f4));
``````

(please double check that my math is correct).
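To make the log-scale idea concrete, here is a Python sketch of the same computation done naively and on the log scale. Assumptions on my part: `omega > 0` (your prior allows negative values, which would need separate handling, since `log(omega)` is then undefined), and `u`, `v` stand in for the `beta_cdf` values; `log1mexp` plays the role of Stan's `log1m_exp`:

```python
import numpy as np

def log1mexp(x):
    """log(1 - exp(-x)) for x > 0, computed stably."""
    x = np.asarray(x, dtype=float)
    return np.where(x > np.log(2.0), np.log1p(-np.exp(-x)), np.log(-np.expm1(-x)))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a > b (Stan: log_diff_exp)."""
    return a + log1mexp(a - b)

omega, u, v = 3.0, 0.3, 0.7  # u, v stand in for beta_cdf(p_i[i,1], ...), beta_cdf(p_i[i,2], ...)

# Naive evaluation, as in the posted code
f3 = omega * (1 - np.exp(-omega)) * np.exp(-omega * (u + v))
f4 = (1 - np.exp(-omega)) - (1 - np.exp(-omega * u)) * (1 - np.exp(-omega * v))
naive = np.log(f3) - 2 * np.log(f4)

# Log-scale evaluation, never exponentiating back
log_f3 = np.log(omega) + log1mexp(omega) - omega * (u + v)
log_f4 = log_diff_exp(log1mexp(omega), log1mexp(omega * u) + log1mexp(omega * v))
stable = log_f3 - 2 * log_f4
```

The two agree at moderate values of `omega`, but the log-scale version keeps working when `omega * (u + v)` is large enough that the naive `f3` underflows to zero.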

If that doesn’t help, the first thing I would try would be a model that tests just `frank_lpdf`, to make sure it is the source of the problems, i.e. something like:

``````data {
int Nt;
int Ns;
matrix<lower=0, upper=1>[Ns, 2] p_i[Nt];
}

parameters{
vector<lower=0>[Ns] alpha[Nt]; //alpha_jk
vector<lower=0>[Ns] beta[Nt]; //beta_jk
vector[Nt] omega; //omega_k
}
model{

//Priors
//is this sensible for alpha and beta?
for(k in 1:Nt) {
alpha[k] ~ lognormal(0, 1);
beta[k] ~ lognormal(0, 1);
}
omega ~ normal(0, 5);

for (k in 1:Nt)
p_i[k] ~ frank(alpha[k], beta[k], omega[k]);
}
``````

The data for this should be built by simulating from the distribution (i.e. draw `alpha` and `beta` from the lognormal and `omega` from the normal, then, according to whatever `frank` is, simulate `p_i`), and then feed this to the model. You could also extend the model to have multiple `p_i` drawn from the same `alpha`, `beta`, …
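The density in the earlier snippet has the shape of a Frank copula density, so, assuming `frank` is a Frank copula with Beta marginals (please check this against your own definition), the copula uniforms can be simulated with the conditional-inverse method. A Python sketch:

```python
import numpy as np

def frank_pair(omega, rng):
    """Draw (u, v) from a Frank copula via the conditional-inverse method
    (requires omega != 0; omega -> 0 is the independence limit)."""
    u = rng.uniform()
    w = rng.uniform()
    v = -np.log1p(w * (1.0 - np.exp(-omega))
                  / (w * (np.exp(-omega * u) - 1.0) - np.exp(-omega * u))) / omega
    return u, v

rng = np.random.default_rng(0)
Nt, Ns = 3, 10
omega = [-4.0, 0.5, 3.0]  # example dependence parameters, one per test
uv = np.array([[frank_pair(omega[k], rng) for _ in range(Ns)]
               for k in range(Nt)])  # shape (Nt, Ns, 2), entries in (0, 1)
```

Each `(u, v)` pair would then be pushed through the inverse Beta CDF with the matching `alpha[k]`, `beta[k]` (for example `scipy.stats.beta.ppf`) to obtain `p_i`.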

If this still diverges, I would proceed by:

• Treating some of `alpha`, `beta` or `omega` as known (passing it as data)
• Reducing to `Nt = 1` for simplicity.
• Further simplifying `frank`

Best of luck with your model!