Help: all iterations ended with a divergence and high R-hat

We want to define a joint distribution through a user-defined function and then fit the model with Stan. But the run ends with the warning that "all iterations ended with a divergence" and R-hat is high. We also tried increasing the adapt_delta parameter and the number of iterations, and that seems to help (R-hat decreased, but it is still larger than 1.1), so we think the chains still have not mixed. Some experts said reparameterization should come first, but the Stan User's Guide only gives examples for Beta and Dirichlet priors and the normal distribution. For a user-defined distribution, what should we do? Could we transform the priors (probit and logit)? I would be really grateful for any advice here.
Thanks a lot!

functions{
    real joint_lpdf(matrix p_i, vector alpha, vector beta, real omega){
        // (function body not included in the post)
    }
}
data{
    int N;
    int Nt;
    int Ns;

    int TP[N];
    int Dis[N];
    int TN[N];
    int NDis[N];
    int Study[N];
    int Test[N];
    int Total[N]; //n_ijk 
}
parameters{
    vector<lower=0, upper=1>[3] MU[Nt]; //mu_jk
    vector<lower=0, upper=1>[3] delta[Nt]; //delta_jk
    vector<lower=0, upper=1>[3] theta; //theta_j
    matrix<lower=0, upper=1>[Ns, 3] p_i[Nt]; //pi_ijk
    vector[Nt] omega; //omega_k
}
transformed parameters{
    vector<lower=0>[3] alpha[Nt]; //alpha_jk
    vector<lower=0>[3] beta[Nt]; //beta_jk

    for(k in 1:Nt){
        alpha[k] = MU[k] .* ((1 - theta .* delta[k]) ./ (theta .* delta[k]));
        beta[k] = (1 - MU[k]) .* ((1 - theta .* delta[k]) ./ (theta .* delta[k]));
    }
}
model{
    //Priors
    for (k in 1:Nt){
        MU[k] ~ uniform(0, 1);
        delta[k] ~ uniform(0, 1);
    }
    theta ~ uniform(0, 1);
    omega ~ normal(0, 5);

    for (k in 1:Nt)
        p_i[k] ~ joint(alpha[k], beta[k], omega[k]);

    //Likelihood
    for (n in 1:N){
        TP[n] ~ binomial(Dis[n], p_i[Test[n]][Study[n], 1]);
        TN[n] ~ binomial(NDis[n], p_i[Test[n]][Study[n], 2]);
        Dis[n] ~ binomial(Total[n], p_i[Test[n]][Study[n], 3]);
    }
}
generated quantities{
    vector[3*N] loglik;

    for (n in 1:N)
        loglik[n] = binomial_lpmf(TP[n] | Dis[n], p_i[Test[n]][Study[n], 1]);

    for (n in (N+1):(2*N))
        loglik[n] = binomial_lpmf(TN[n-N] | NDis[n-N], p_i[Test[n-N]][Study[n-N], 2]);

    for (n in (2*N+1):(3*N))
        loglik[n] = binomial_lpmf(Dis[n-2*N] | Total[n-2*N], p_i[Test[n-2*N]][Study[n-2*N], 3]);
}
"


Sorry to hear you are having problems.

You don't seem to have posted the actual code for joint_lpdf, so it is hard to make guesses. In any case, the recommended approach is not to start with such a complex model: build a simpler model, make sure you can fit it to simulated data, and only then add complexity (e.g. think about what the simplest possible model involving joint_lpdf would be and start with that).

Best of luck with your model!

Thank you! We began with a simpler model and it seems to work well, but the model becomes problematic when we add a variable in the functions block. To understand the bias, we first considered a single short Markov chain, as in the eight-schools example. For this lone chain, split R-hat is already high (the largest is 5.3), all iterations ended with a divergence, and the effective sample size per iteration is not reasonable. What should we do next? The autocorrelation plot indicates non-convergence and 100% divergences. Does that mean joint_lpdf has problems?
I would be really grateful in case you have any advice here.
Thanks a lot!

Please share the full code for both the simpler model that works and the model that doesn't work. If possible, also share the data and the R/Python/… code you use to prepare the data and run Stan. I would really need to see the source code for joint_lpdf, which you omitted in your first post; without that, I don't think I can provide better advice than "think hard". If all iterations are divergent, the most likely cause is a programming bug (e.g. non-initialized values, indexing mismatches, …). I also tried to enumerate some general strategies for this sort of problem at https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/
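
One cheap way to hunt for such bugs is to evaluate the suspect function at a few fixed inputs with print and watch for NaN or -inf in the output. A minimal sketch (the test values and dimensions are arbitrary; adjust them to match your joint_lpdf):

functions {
  // paste your joint_lpdf here
}
transformed data {
  // arbitrary but valid test inputs
  matrix[2, 3] p_test = rep_matrix(0.5, 2, 3);
  vector[3] alpha_test = rep_vector(1, 3);
  vector[3] beta_test = rep_vector(1, 3);
  // print runs once at startup, so bad values surface immediately
  print("joint_lpdf at test point = ",
        joint_lpdf(p_test | alpha_test, beta_test, 0.5));
}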

Thank you very much for your help!
simple.txt (2.5 KB) problem.txt (3.4 KB) R.csv (2.3 KB)
The simple model code and data are in "simple.R" and "R.csv"; the other R script is the problem model. In the functions block we want to add a variable ("f5"), so the mathematical formula changes. With the problem model, a non-initialization warning appeared in the viewer at first. After much work we figured out that the problem was "f8", by checking the vectors one by one. We rewrote it in the current code, and there is no longer any "log(0)" or "not finite" error, but sampling is still 100% divergent. When we change adapt_delta and max_treedepth, 100% divergent becomes 80% divergent at best, which does not make much sense to us, so we think there must still be problems with the model.

Thanks for sharing the blog post; we also thought about what we have done so far. First, the code was rewritten on the basis of the simple model, so we think it should work; and since the two models use the same data and parameters, we do not think the parameters are the problem. Second, looking at the posterior distribution in ShinyStan, we wondered whether it is multimodal, but when we reduced the data the plot looked unimodal and sampling was still 100% divergent. Currently we are considering how to reparameterize, e.g. a non-centered parameterization. We know how to reparameterize the normal, Beta, or Dirichlet distributions by hand; can the binomial distribution be reparameterized, and if so, how? We are still looking for relevant information.

We are very eager for your suggestions; we don't know whether our ideas are feasible. Thank you very much!

I tried looking into this, but I have to admit I can't follow the math. The frank_lpdf function is definitely the most suspicious (and most complex) part of the model. I frequently make mistakes when implementing my own lpdfs, so I wouldn't rule out that the function computes something different from what you hope it computes.

There is also definitely some potential for numerical issues with:

    for (i in 1:r){
      f3[i] = omega * (1 - exp(-omega))
              * exp(-omega * (beta_cdf(p_i[i,1], alpha[1], beta[1])
                              + beta_cdf(p_i[i,2], alpha[2], beta[2])));
      f4[i] = (1 - exp(-omega))
              - (1 - exp(-omega * beta_cdf(p_i[i,1], alpha[1], beta[1])))
                * (1 - exp(-omega * beta_cdf(p_i[i,2], alpha[2], beta[2])));
    }
    return (f1 + f2 + sum(log(f3 ./ (f4 .* f4))));

It is usually preferable to always work on the log scale. Check the functions reference for log_sum_exp, log_diff_exp, log1p_exp, beta_lcdf, and similar functions that should make it easier for you to compute log_f3 and log_f4 in a more stable way directly on the log scale, without ever calling exp. And then have

 return (f1 + f2 + sum(log_f3 - 2 * log_f4));

(please double check that my math is correct).
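
For concreteness, here is roughly what that could look like. This is only a sketch: it assumes omega > 0 (log(omega) and log1m_exp(-omega) are undefined otherwise, so negative omega would need a separate branch), and the Beta CDF values still enter on the natural scale because they are multiplied by omega, so beta_lcdf does not help there:

  vector[r] log_f3;
  vector[r] log_f4;
  for (i in 1:r) {
    real F1 = beta_cdf(p_i[i,1], alpha[1], beta[1]);
    real F2 = beta_cdf(p_i[i,2], alpha[2], beta[2]);
    // log f3 = log(omega) + log(1 - e^(-omega)) - omega * (F1 + F2)
    log_f3[i] = log(omega) + log1m_exp(-omega) - omega * (F1 + F2);
    // log f4 = log[(1 - e^(-omega)) - (1 - e^(-omega*F1)) * (1 - e^(-omega*F2))]
    log_f4[i] = log_diff_exp(log1m_exp(-omega),
                             log1m_exp(-omega * F1) + log1m_exp(-omega * F2));
  }
  return (f1 + f2 + sum(log_f3 - 2 * log_f4));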

If that doesn't help, the next thing I would do is write a model that tests just frank_lpdf, to make sure it really is the source of the problems, i.e. something like:

data {
  int Nt;
  int Ns;
  matrix<lower=0, upper=1>[Ns, 2] p_i[Nt];
}

parameters{
  vector<lower=0>[2] alpha[Nt]; //alpha_jk
  vector<lower=0>[2] beta[Nt]; //beta_jk
  vector[Nt] omega; //omega_k
}

model{
  //Priors
  //is this sensible for alpha and beta?
  for(k in 1:Nt) {
    alpha[k] ~ lognormal(0, 1);
    beta[k] ~ lognormal(0, 1);
  }
  omega ~ normal(0, 5);

  for (k in 1:Nt)
    p_i[k] ~ frank(alpha[k], beta[k], omega[k]);
}

The data for this should be built by simulating from the distribution: draw alpha and beta from the lognormal and omega from the normal, then, according to whatever frank is, simulate p_i, and feed this to the model. You could also extend the model to have multiple p_i drawn from the same alpha, beta, …
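
In case it helps, here is a hypothetical simulator, assuming frank is a Frank copula with Beta(alpha[j], beta[j]) marginals (which is what the f3/f4 terms above suggest) and omega > 0. It uses the standard conditional-inversion formula for the Frank copula and inv_inc_beta (the Beta quantile function, available in recent Stan versions; if yours lacks it, do that step in R). Please verify the formula against your own definition of frank:

data {
  int Ns;
  vector<lower=0>[2] alpha;
  vector<lower=0>[2] beta;
  real<lower=0> omega;
}
generated quantities {
  matrix[Ns, 2] p_i;
  for (i in 1:Ns) {
    real u1 = uniform_rng(0, 1);
    real w = uniform_rng(0, 1);
    // conditional inversion: solve C(u2 | u1) = w for u2
    real u2 = -log1p(w * (exp(-omega) - 1)
                     / (w + (1 - w) * exp(-omega * u1)))
              / omega;
    // map the uniform margins to Beta margins via the quantile function
    p_i[i, 1] = inv_inc_beta(alpha[1], beta[1], u1);
    p_i[i, 2] = inv_inc_beta(alpha[2], beta[2], u2);
  }
}

Run this with algorithm=fixed_param and feed the resulting p_i (together with the alpha, beta, and omega you chose) to the test model above.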

If this still diverges, I would proceed by:

  • Treating some of alpha, beta, or omega as known (passing them as data; see the sketch below)
  • Reducing to Nt = 1 for simplicity
  • Further simplifying frank
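
For instance, combining the first two bullets (omega fixed and passed as data, the Nt dimension dropped) would give something like:

data {
  int Ns;
  matrix<lower=0, upper=1>[Ns, 2] p_i;
  real omega; // treated as known
}
parameters{
  vector<lower=0>[2] alpha;
  vector<lower=0>[2] beta;
}
model{
  alpha ~ lognormal(0, 1);
  beta ~ lognormal(0, 1);
  p_i ~ frank(alpha, beta, omega);
}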

Best of luck with your model!