Sanity check: Am I covering up a problem, or just tuning the parameters?

I’m trying to understand some aberrant estimates from my model. I’m working through my workflow with the simplest model on simulated data (below). I can make the problem “go away” with more tightly defined priors, but I don’t know if that’s the right move. I’d eventually like to make a hierarchical model, so I thought it worth trying to understand before adding more complexity.

I want to fit the model:

y \sim \text{Lognormal}(\mu, \sigma)

such that

E[y] = \frac{(g-\gamma)}{(r-\gamma)}

where r, g, and \gamma are (non-negative) parameters to be estimated.

By the properties of the log-normal distribution:

E[y] = \exp\left( \mu+\frac{\sigma^2}{2} \right)

Therefore,

\mu = \ln \left( \frac{g-\gamma}{r-\gamma}\right)-\frac{\sigma^2}{2}

Clearly, \mu is defined only when the argument of the logarithm is positive, which holds when g and r lie on the same side of \gamma; since all three parameters are non-negative, I take the branch r, g > \gamma and constrain both in the parameters block with <lower=gamma>. (I think this is relevant.)

The Stan code below simulates 1,000 observations in the transformed data block, and attempts to recover the seeded parameter values (\gamma=0.05, g=0.10, r=0.15, \sigma=1) in the model block.

transformed data {
  int N = 1000;

  // Seed parameter values
  real gamma_ = 0.05;
  real g_ = 0.10;
  real r_ = 0.15;
  real sigma_ = 1;
  
  // Expected value
  real y_bar = (g_-gamma_)/(r_-gamma_);

  // Draw observations from lognormal
  vector[N] y;
  for (n in 1:N) {
    y[n] = lognormal_rng(log(y_bar) - sigma_^2/2, sigma_);
  }
}

parameters {
  real<lower=0> gamma;
  real<lower=gamma> g;
  real<lower=gamma> r;
  real<lower=0> sigma;
}

transformed parameters {
  real<lower=0> mu_hat = (g-gamma)/(r-gamma); // E[y]; the lognormal location is log(mu_hat) - sigma^2/2
}

model {
  // Priors
  gamma ~ gamma(1,1);
  g ~ gamma(1,1);
  r ~ gamma(1,1);
  sigma ~ gamma(1,1);
  
  // Likelihood
  y ~ lognormal(log(mu_hat)-sigma^2/2,sigma);
}

In the above, the priors are defined such that the prior mean is 1 (Gamma(1,1) is just the Exponential(1) distribution). I would hope the posterior means converge to the seeded values. The results are not great:

4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
gamma 0.33    0.01 0.33 0.01 0.10 0.23 0.46  1.20  1005    1
g     0.99    0.02 0.57 0.18 0.57 0.89 1.32  2.37   758    1
r     1.68    0.04 1.02 0.31 0.94 1.49 2.20  4.20   715    1
sigma 1.00    0.00 0.02 0.96 0.99 1.00 1.02  1.05  1172    1

The posterior for \gamma at least moved in the right direction. The posterior for g not so much. But the posterior for r has moved further away from the true value… by a lot. That’s what startles me.

The pairs plot indicates high collinearity between g and r.

I can improve the estimates with more tightly defined priors.

model {
  // Priors
  gamma ~ gamma(1,10);
  g ~ gamma(1,10);
  r ~ gamma(1,10);
  sigma ~ gamma(1,1);

  // Likelihood as before
  // ...
}

Then the results look much better (the pairs plot looks the same), but a similar pattern emerges. The priors are now defined such that the prior mean is 0.10. The posterior for \gamma moved down in the right direction. The posterior for g didn’t really need to move. The posterior for r again increased, but at least this time towards the true value.

4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
gamma 0.03       0 0.03 0.00 0.01 0.02 0.05  0.12  1399 1.00
g     0.10       0 0.06 0.02 0.06 0.09 0.13  0.24   828 1.01
r     0.17       0 0.10 0.03 0.10 0.15 0.22  0.42   710 1.00
sigma 0.98       0 0.02 0.94 0.97 0.98 1.00  1.02  1029 1.00

Is this just a matter of “tuning” the priors? Or am I covering up a problem by ratcheting the priors until I get the results I wanted?

If I had to guess, I’d say it looks like there’s some non-identifiability between g and r going on: you can see the ridge in the pairs plot, and with E[y] = (g-\gamma)/(r-\gamma), any pair of g and r that keeps the ratio the same (for a given \gamma) will produce the same likelihood.
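Concretely, holding \gamma fixed, the likelihood is invariant to rescaling both distances from \gamma: for any c>0,

g' = \gamma + c(g-\gamma), \quad r' = \gamma + c(r-\gamma) \implies \frac{g'-\gamma}{r'-\gamma} = \frac{c(g-\gamma)}{c(r-\gamma)} = \frac{g-\gamma}{r-\gamma}

And your posterior means sit almost exactly on that ridge: (0.99-0.33)/(1.68-0.33) \approx 0.49, which is essentially the true ratio (0.10-0.05)/(0.15-0.05) = 0.5. The data pin down the ratio; the prior decides where along the ridge the posterior settles.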

The beauty of Bayesian inference is that you can get inference for non-identifiable and otherwise degenerate likelihoods by stabilizing them with cleverly chosen priors. The downside is, well, the inference depends somewhat on the choice of prior.


As @js592 mentions, this looks like classical model degeneracy: there are many combinations of your parameters that are consistent with your observations. See Michael Betancourt’s excellent case study Identity Crisis for some helpful identification and resolution strategies, and for some examples similar to yours.

Your options might include: providing more information for g, \gamma, and r via more informative priors, if you can do so based on domain expertise rather than simply using the priors as a tuning knob; bringing in additional data specific to those individual parameters, so that you can isolate each of them within a separate likelihood in your model; or accepting that you can’t learn much about the individual parameters at all, and instead combining them into a single parameter (or a reduced set of parameters).
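For the last option, here is a rough sketch of what the collapsed model might look like in your example; the name m and its prior are just placeholders for the single identified combination (g-\gamma)/(r-\gamma):

parameters {
  real<lower=0> m;     // stands in for (g - gamma) / (r - gamma)
  real<lower=0> sigma;
}

model {
  // placeholder priors; choose these from domain knowledge
  m ~ gamma(1,1);
  sigma ~ gamma(1,1);

  // same likelihood, written in terms of the identified quantity
  y ~ lognormal(log(m) - sigma^2/2, sigma);
}

This version is fully identified, at the cost of giving up on separating g, r, and \gamma.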


Thank you!! Yeah, it’s obvious now that for any value of \mu (and \sigma), g, r and \gamma are not unique. Fortunately, I can model r and g with additional data specific to those parameters to help isolate them, so I have a path forward. Let’s suppose I then revise the priors:

r \sim f_r(x_1; \theta_1)
g \sim f_g(x_2; \theta_2)

for some additional data x_1 and x_2 and additional parameters \theta_1 and \theta_2. Can I confirm that I coded the constraints correctly? In other words, do I just declare <lower=gamma> in the parameters block, as I did above?
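For concreteness, the sketch below is roughly what I have in mind. The normal likelihoods and the scale parameters tau1 and tau2 are just placeholders for whatever f_r, f_g, \theta_1, and \theta_2 turn out to be, and y is still simulated in the transformed data block as above:

data {
  int<lower=0> N1;
  int<lower=0> N2;
  vector[N1] x1; // additional data informing r
  vector[N2] x2; // additional data informing g
}

parameters {
  real<lower=0> gamma;
  real<lower=gamma> g; // same <lower=gamma> constraints as before
  real<lower=gamma> r;
  real<lower=0> sigma;
  real<lower=0> tau1;  // placeholder for theta_1
  real<lower=0> tau2;  // placeholder for theta_2
}

model {
  // placeholder likelihoods standing in for f_r and f_g
  x1 ~ normal(r, tau1);
  x2 ~ normal(g, tau2);

  // original likelihood
  y ~ lognormal(log((g-gamma)/(r-gamma)) - sigma^2/2, sigma);
}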