Hi @jeffreypullin
I still feel fundamentally positive about this: nice work!
I do still worry that the current draft will mislead people in practice. Recall that the user's guide is going to be read, and translated into applied statistical practice, by people who don't necessarily have much background in math or probability. Perhaps I am oversensitive precisely because, a while back in my own applied work, I very nearly made a mistake analogous to the following one:
Suppose I am a Stan user and I want to fit, for the sake of a simple example, a zero-inflated Poisson model, which I write mathematically as:

Z \sim \text{Bernoulli}(p)
Y \sim \text{Poisson}(Z \lambda)
Marginalized Stan code (not optimized) is:

```stan
data {
  int N;                 // number of data points
  int<lower=0> Y[N];     // response
}
parameters {
  real<lower=0, upper=1> p;  // probability that we are *not* in the zero-inflation regime
  real<lower=0> lambda;      // Poisson parameter
}
model {
  for (i in 1:N) {
    if (Y[i] == 0) {
      target += log_sum_exp(bernoulli_lpmf(0 | p),
                            bernoulli_lpmf(1 | p) + poisson_lpmf(0 | lambda));
    } else {
      target += bernoulli_lpmf(1 | p) + poisson_lpmf(Y[i] | lambda);
    }
  }
}
```
Now suppose the fitted values of p and \lambda are p = 0.5 and \lambda = 10, and suppose we want to recover a posterior for Z[1].
If I'm a user who is uncomfortable with marginalization (and therefore I am reading through the user's guide to do this properly), my main takeaway would be that I can estimate Pr(Z = 1) (in your document this is Pr(Y = k)) by evaluating Pr(Z = 1 | p, \lambda). So what do I make of this? I'm accustomed to generative modeling, so I know how to simulate data for given values of p and \lambda, and I know that Z is independent of \lambda when I simulate the non-marginalized model from the generative process. So naturally, I take the posterior for Z[1] to be bernoulli_rng(p).
The problem is that this is wrong. In this example, it's very wrong. If Y[1] is one, then Z[1] is one with probability one; and if Y[1] is zero, then Z[1] is one with probability near zero.
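To make that concrete, here is a minimal Python sketch (my own illustration, not part of the draft) that computes the exact conditional probability for the fitted values above and compares it to the naive bernoulli_rng(p) answer:

```python
import math

# Fitted values from the example above.
p, lam = 0.5, 10.0

# Naive (wrong) answer: ignore Y[1] entirely and report Pr(Z = 1) = p.
naive = p

# Correct answer conditions on the observed Y[1] as well.
# If Y[1] > 0, only the Poisson component can produce a nonzero count,
# so Pr(Z[1] = 1 | Y[1]) = 1.
# If Y[1] == 0, Bayes' rule over the two mixture components gives:
#   Pr(Z = 1 | Y = 0) = p * Poisson(0 | lam) / ((1 - p) + p * Poisson(0 | lam))
pr_z_given_y0 = p * math.exp(-lam) / ((1 - p) + p * math.exp(-lam))

print(naive)          # 0.5
print(pr_z_given_y0)  # roughly 4.5e-5: near zero, nowhere near 0.5
```

The two answers differ by four orders of magnitude, which is exactly the trap described above.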
In my personal experience, this is the main confusion about recovering the marginalized parameters, and I think that without some kind of tweak, this draft deepens, rather than ameliorates, the confusion. What is missing is a reminder (or several) that in order to properly condition on the continuous parameters, we need to also condition on the data. And not just in the sense of conditioning the continuous parameter estimates on the data. (Edit to add: we need to notice that our likelihood itself conditions on the data via its if statement.) The conditional expectation \mathbb{E}(Z \mid p, \lambda) (as in the current draft), read in a vacuum, is simply p. What we need is \mathbb{E}(Z \mid p, \lambda, Y). I think that explaining this carefully is the main thing that a user's guide section ought to accomplish here.
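For what it's worth, here is a sketch of the computation I think the guide should walk readers through: evaluate \mathbb{E}(Z \mid p, \lambda, Y) per posterior draw, then average over draws. The draws below are made-up illustrative values, not real fitted output:

```python
import math

def pr_z_given_y(y, p, lam):
    """Pr(Z = 1 | Y = y, p, lambda) for the zero-inflated Poisson above."""
    if y > 0:
        # Only the Poisson component (Z = 1) can produce a nonzero count.
        return 1.0
    # y == 0: Bayes' rule over the two mixture components.
    return p * math.exp(-lam) / ((1 - p) + p * math.exp(-lam))

def posterior_pr_z(y, draws):
    """Posterior Pr(Z = 1 | data): average the per-draw conditional
    probabilities over posterior draws of (p, lambda)."""
    return sum(pr_z_given_y(y, p, lam) for p, lam in draws) / len(draws)

# Hypothetical posterior draws of (p, lambda); in practice these would
# come from the fitted Stan model.
draws = [(0.48, 9.5), (0.50, 10.0), (0.52, 10.5)]

print(posterior_pr_z(0, draws))  # near zero
print(posterior_pr_z(3, draws))  # exactly 1.0
```

The key point the code makes explicit is that the conditional probability depends on the observed y, mirroring the if statement in the marginalized likelihood.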
Again, I think adding this section to the user's guide will be really useful, and I feel very positive about the direction you're taking. Thank you!