Beta Distribution HMC sampling, Divergent transition, Bulk and Tail ESS

I’m defining a really simple Bayesian model, with ikelihood function a Beta distribution parametrized in terms of mean \mu = a /(a+b) and size of sample s = a+ b, if a and b are a standard Beta distribution parameters.


I provide the data in R

p1 = c(23.5, 23.0, 23.1, 23.1, 23.5, 23.0, 21.8, 22.5, 23.7, 24.0)/100
p2 = c(NA, NA, NA, NA, NA, 16.7, 16.9, 18.2, 19.0, 18.9)/100
p3 = c(17.94, 18.51, 18.74, 18.33, 19.57, 17.31, 17.81, 19.08, 19.66,19.73)/100
p4 = c(18.0, 17.6, 21.0, 20.8, 21.2, 19.5, 20.8, 20.8, 24.9, NA)/100


TT = 10
N = 4
p = rbind(p1, p2, p3, p4)

I have N rows and T observations for each row. There are cells which have NA however I treat them in the rstan code hopefully correctly. So, my rstan code is the following

stan_model_code = "
data {
  int<lower=1> TT;
  int<lower=1> N;
  matrix<lower=0,upper=1>[N, TT] p;
}

parameters {
  matrix<lower=0,upper=1>[N, TT] mu_t;
  matrix<lower=0>[N, TT] s;
}

transformed parameters {
  matrix<lower=0>[N,TT] a_t;
  matrix<lower=0>[N,TT] b_t;
  for(n in 1:N){
    for(t in 1:TT){
    if(p[n,t]!=0){
       a_t[n,t] = mu_t[n,t] * s[n,t];
       b_t[n,t] = (1-mu_t[n,t]) * s[n,t];
    }else{
       a_t[n,t] = 0;
       b_t[n,t] = 0;
    }
   }
  }
}  

model {
  for(n in 1:N){
    for(t in 1:TT){
      if(p[n,t]!=0){
        target += beta_lpdf(mu_t[n,t] | 1, 1);
        target += beta_lpdf(p[n,t] | a_t[n,t],b_t[n,t]);
        target += normal_lpdf(s[n,t] |0,1);
      }
    }
  }
}
"

library(rstan)
p[is.na(p)] = 0
s[is.na(s)] = 0
stan_model <- stan_model(model_code = stan_model_code)

fit <- sampling(stan_model, data = list(TT = TT, N = N, p = p), iter = 5000,
                chains = 2)

The model looks simple, however, when I run it, the results go crazy :D

Warning messages:
1: There were 5000 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 

The Rhat is NA, it is because of the IF statement that I use in the rstan code to treat the values p[n,t]==0, i.e. to exclude cases which correspond to NA. Is it clear to anyone what this piece of code has such a weird behaviour?

The model has parameters mu_t and s for every cell, even those with p=NA. These extra parameters are not harmful per se but omitting the priors in p=NA case means they have implicit improper prior.
Try moving the priors outside the if block:

model {
  for(n in 1:N){
    for(t in 1:TT){
      target += beta_lpdf(mu_t[n,t] | 1, 1);
      if(p[n,t]!=0){
        target += beta_lpdf(p[n,t] | a_t[n,t],b_t[n,t]);
      }
      target += normal_lpdf(s[n,t] |0,1);
    }
  }
}
1 Like

With a model like this, I would expect to see a strong hierarchical prior on mu and s (an example like this, done this way, is literally the first example of a hierarchical model in Gelman et al.'s textbook BDA—chapter 5).

In your case, you don’t need to do the transform back to a and b. We have the mean/concentration parameterization built-in as beta_proportion():

https://mc-stan.org/docs/functions-reference/continuous_distributions_on_0_1.html#beta-proportion-distribution

Ideally, you’d calculate the pairs of (n, t) for which p[n, t] != 0 ahead of time in transformed data and then put them in a an array of pairs nts and then use something like this:

for (nt in nts) {
  int n = nt[1];
  int t = nt[2];
  p[n, t] ~ beta_proportion(a[n, t], b[n, t]);
}

The beta(1, 1) is uniform and a no-op in Stan other than to check that all arguments satisfy required bounds, so you can drop it.

The other thing to do would be to use a zero-inflated beta to model the number of times p[n, t] == 0. Otherwise, there’s no way to turn this into a properly generative model as there’s not a model for the p being 0 anywhere.