Issue with regression model

I have the following model:

\begin{align} d &\sim \text{gamma}(\alpha,\alpha/\mu) \\ x &= \text{gamma\_pdf}(d \mid \alpha,\alpha/\mu) \\ y &\sim \text{bernoulli\_logit}(x\beta) \end{align}

(I am omitting priors on \alpha, \mu and \beta for simplicity).

This model actually works fine. However, if I add an intercept a:

\begin{align} d &\sim \text{gamma}(\alpha,\alpha/\mu) \\ x &= \text{gamma\_pdf}(d \mid \alpha,\alpha/\mu) \\ a &\sim \mathcal{N}(0,\text{a\_sd}) \\ y &\sim \text{bernoulli\_logit}(a + x\beta) \end{align}
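For concreteness, here is a minimal Stan sketch of this second model. The data-block names (`N`, `d`, `y`) and treating `a_sd` as fixed data are my assumptions, not from the original post, and priors on `alpha`, `mu`, and `beta` are again omitted:

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] d;
  array[N] int<lower=0, upper=1> y;
  real<lower=0> a_sd;  // assumed fixed here for simplicity
}
parameters {
  real<lower=0> alpha;
  real<lower=0> mu;
  real a;
  real beta;
}
model {
  // x is the gamma density evaluated at d, not a sampled quantity
  vector[N] x;
  for (n in 1:N) {
    x[n] = exp(gamma_lpdf(d[n] | alpha, alpha / mu));
  }
  d ~ gamma(alpha, alpha / mu);
  a ~ normal(0, a_sd);
  y ~ bernoulli_logit(a + beta * x);
}
```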

Now the posterior SD of \beta no longer decreases with increasing sample size. Initially I thought this could be an identifiability issue, but I constrained a > 0 and it made no difference.

Does anyone understand why this happens and how it can be resolved?

I think your intuition is right—the problem is in identifiability.

If you’re estimating \alpha and \mu in Stan, you should constrain both to be positive. This isn’t the root cause of your problem, but allowing negative values means those proposals just get rejected, which is very inefficient and can cause initialization to fail.
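In Stan that just means declaring the constraints in the parameters block, e.g.:

```stan
parameters {
  real<lower=0> alpha;  // gamma shape must be positive
  real<lower=0> mu;     // gamma mean must be positive
}
```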

My guess is that x and \beta are identified only through the prior, because they appear only in product form. In the first example, you have at least constrained x \cdot \beta to be close to \textrm{logit}(\textrm{mean}(y)). In the second example, the intercept a can do all of that work, leaving x \cdot \beta purely non-identified and thus not informed by the data.
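One way to see the product-form problem (my illustration, not something shown in the thread): any rescaling of x, which the free gamma parameters can induce, is exactly undone by rescaling \beta,

\begin{align} a + (c\,x)\frac{\beta}{c} = a + x\beta \quad \text{for any } c > 0, \end{align}

so the Bernoulli likelihood is unchanged along this direction and the data cannot pin down the scales of x and \beta separately.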

You have to read the fine print in the Bernstein-von Mises theorem to figure out when you can expect posterior concentration. My guess is that you’re running into singular Fisher information through the non-identifiability, but I’d have to work out the details to be sure. The usual exclusions are models whose parameters grow at the same rate as the data, like latent time-series models.

Thank you @Bob_Carpenter! Do you have any suggestions for how to overcome this? I am reading up on the Bernstein-von Mises theorem, but it is slow going. I am working on simplifying the problem further (e.g., fixing \mu and \alpha), but I don’t understand why x \cdot \beta is different from any other logistic regression term. For example, why doesn’t the very same identifiability issue arise with the example on the Stan website:

data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  y ~ bernoulli_logit(alpha + beta * x);
}

I will mark this as resolved: the problem was really that I was using the gamma PDF to generate x. See this question.