Divergent Transitions

shloksobti · July 28, 2021, 8:43pm

Hello community,
I’m trying to infer intervals (lo, hi) from data. The inferences look reasonable, but I get a large number of divergent transitions. Any help would be appreciated.

data {
  int<lower=0> N;
  real<lower=-10, upper=10> Y[N];
}

parameters {
  real<lower=-10, upper=10> center;
  real<lower=0, upper=20> width;

  real<lower=-10, upper=10> lo_gaus;
  real<lower=lo_gaus, upper=10> hi_gaus;
}

transformed parameters {
  real lo = center - width/2;
  real hi = center + width/2;
}

model {
  lo_gaus ~ normal(lo, 0.2);
  hi_gaus ~ normal(hi, 0.2);

  Y ~ uniform(lo_gaus, hi_gaus);
}

jsocolar · July 28, 2021, 9:42pm

What information do you want to infer from this model besides the fact that lo must be somewhere below the lowest observation and hi must be somewhere above the highest observation?

Edit: there’s a typo in the above paragraph. lo should say lo_gaus and hi should say hi_gaus.

If you think this model is telling you something useful, you might be able to tame the divergences by putting an upper bound on the declaration of lo_gaus equal to the minimum value of Y in the data, and a lower bound on the declaration of hi_gaus equal to the maximum value of Y in the data. This won’t change the model at all, since it already sees zero posterior probability outside of these bounds. Once you’ve done that, you can also eliminate the data Y and replace Y ~ uniform(lo_gaus, hi_gaus); with target += N*log(1/(hi_gaus - lo_gaus)). That will again be literally the same model, and it might not produce divergences. But again, I’m skeptical that this model gives you the inference you want, because its behavior is so heavily influenced by your choice of prior, which is encoded non-generatively by the sampling statements plus the constraints in your parameters block.

Edited to add: if you have a lot of data then the behavior won’t be dominated by the prior, but the likelihood will just tell you that lo_gaus is very close to the lowest data point and hi_gaus is very close to the highest data point.

Edit 2: See my final post on this thread: I think I was overly pessimistic about this model in this post due to some sloppy thinking on my part. I do still encourage formulating principled priors, however.
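
The reformulation described above might look roughly like the following. This is only a sketch, not a tested fix: y_min and y_max are names introduced here for the smallest and largest observed values of Y, passed in as data, and it assumes both lie strictly inside (-10, 10) so that the parameter bounds are not degenerate.

```stan
data {
  int<lower=1> N;                      // number of observations
  real<lower=-10, upper=10> y_min;     // assumed name: min of the original Y
  real<lower=y_min, upper=10> y_max;   // assumed name: max of the original Y
}

parameters {
  real<lower=-10, upper=10> center;
  real<lower=0, upper=20> width;

  // Bounds tightened to the region where the likelihood is nonzero.
  real<lower=-10, upper=y_min> lo_gaus;
  real<lower=y_max, upper=10> hi_gaus;
}

transformed parameters {
  real lo = center - width / 2;
  real hi = center + width / 2;
}

model {
  lo_gaus ~ normal(lo, 0.2);
  hi_gaus ~ normal(hi, 0.2);

  // Replaces Y ~ uniform(lo_gaus, hi_gaus); equal to N*log(1/(hi_gaus - lo_gaus)).
  target += -N * log(hi_gaus - lo_gaus);
}
```

Because lo_gaus is bounded above by y_min and hi_gaus is bounded below by y_max, the original constraint lower=lo_gaus on hi_gaus is implied and no longer needs to be written.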

shloksobti · July 28, 2021, 9:59pm

Thanks! I do agree with most of what you say.

One point of confusion: I don’t quite follow why lo MUST be below the data and hi above the data. Could you explain how you reached that conclusion?

jsocolar · July 28, 2021, 10:01pm

Sure. Suppose the data is -5. If lo_gaus > -5 (say lo_gaus = -4) then we have -5 ~ uniform(-4, hi_gaus). The likelihood is zero; we can’t see data of -5 if lo_gaus is -4.
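
Written out for N observations, with a and b used here as shorthand for lo_gaus and hi_gaus, the uniform likelihood is:

$$
p(Y \mid a, b) = \prod_{i=1}^{N} \frac{\mathbf{1}[a \le Y_i \le b]}{b - a}
= \begin{cases}
(b - a)^{-N} & \text{if } a \le \min_i Y_i \text{ and } \max_i Y_i \le b, \\
0 & \text{otherwise.}
\end{cases}
$$

In the example, a = -4 is above the single observation -5, so the indicator is 0 and the whole product vanishes. Inside the allowed region the log-likelihood is -N log(b - a), which is the same quantity as N*log(1/(hi_gaus - lo_gaus)).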

shloksobti · July 28, 2021, 10:03pm

Yes, lo_gaus will be below the data. But not lo. Is that right?

jsocolar · July 28, 2021, 10:04pm

Yes, sorry! Typo in my original post!

shloksobti · July 28, 2021, 10:06pm

Thanks, you’ve been really helpful!
Conceptually, do you believe that a Bayesian approach to this kind of interval estimation is overkill? One thing I want to do is include sigma parameters instead of the fixed 0.2, so as to estimate the ‘spread’ of each interval.

jsocolar · July 28, 2021, 10:17pm

I think that there is literally no information in the data about the size of the sigma parameters, apart from the fact that large sigma values will become less likely when the spread of Y pushes out near the limits of the implied prior on hi or lo.

Bayesian estimation for the interval isn’t necessarily overkill. The estimator clearly has the right broad outlines in terms of its properties: bounded to be outside the range of the data, and pulled closer to the extremes of the data as the amount of data increases. Of course it displays these properties in the right amount only when the data are truly uniform. In particular, it is going to be very sensitive to outliers, so you need to be sure that your data are uniform even in the tails before using this method to understand where the vast majority of the data lie.
But I do think that it’s worth thinking very carefully about the prior here to understand what the implied prior model for the bounds is. Like, if you retained the same max and min values for Y, but eliminated the sampling statement, so that the estimated bounds are still constrained by the range of the data but are not at all regularized to be near the limits of the data, does that look like a reasonable prior model?
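
One way to run that check might be a prior-only version of the program, sketched below. As assumptions for illustration, y_min and y_max are names introduced here for the observed minimum and maximum of Y, supplied as data and lying strictly inside (-10, 10). The uniform term is simply left out, so the draws show what the bounds and the normal statements imply on their own.

```stan
data {
  real<lower=-10, upper=10> y_min;     // assumed name: min of the original Y
  real<lower=y_min, upper=10> y_max;   // assumed name: max of the original Y
}

parameters {
  real<lower=-10, upper=10> center;
  real<lower=0, upper=20> width;

  // Still constrained by the range of the data.
  real<lower=-10, upper=y_min> lo_gaus;
  real<lower=y_max, upper=10> hi_gaus;
}

model {
  // Same normal statements as the original model; no uniform term,
  // so nothing pulls the bounds toward y_min and y_max.
  lo_gaus ~ normal(center - width / 2, 0.2);
  hi_gaus ~ normal(center + width / 2, 0.2);
}
```

Plotting the draws of lo_gaus and hi_gaus from this program against y_min and y_max is one way to judge whether the implied prior on the bounds looks reasonable.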

shloksobti · July 28, 2021, 10:20pm

Thank you! :)

jsocolar · July 28, 2021, 10:31pm

Also just to say, I think I was overly pessimistic about this model and your prior in my original post. I think that with a reasonable amount of truly uniform data, your original model (with modifications to avoid divergences) would be a fine way to estimate the bounds, particularly if you want/need to carry uncertainty in the positions of the bounds forward into downstream analysis. I was initially overlooking that the uniform distribution itself regularizes its estimated bounds to be near its extremes, and this built-in regularization is exactly what you want.
