Hi @jeffreypullin
I still feel fundamentally positive about this: nice work!
I do still worry that the current draft will mislead people in practice. Recall that the user's guide is going to be read, and translated into applied statistical practice, by people who don't necessarily have much background in math or probability. Perhaps I am oversensitive precisely because, a while back in my own applied work, I very nearly made a mistake analogous to the following one:
Suppose I am a Stan user and I want to fit, for the sake of a simple example, a zero-inflated Poisson model, which I write mathematically as:

Z \sim \text{Bernoulli}(p)
Y \sim \text{Poisson}(Z \lambda)
Marginalized Stan code (not optimized) is:

```stan
data {
  int N;                 // number of data points
  int<lower=0> Y[N];     // response
}
parameters {
  real<lower=0, upper=1> p;  // probability that we are *not* in the zero-inflation regime
  real<lower=0> lambda;      // Poisson parameter
}
model {
  for (i in 1:N) {
    if (Y[i] == 0) {
      target += log_sum_exp(bernoulli_lpmf(0 | p),
                            bernoulli_lpmf(1 | p) + poisson_lpmf(0 | lambda));
    } else {
      target += bernoulli_lpmf(1 | p) + poisson_lpmf(Y[i] | lambda);
    }
  }
}
```
Now suppose the fitted values of p and \lambda are p = 0.5 and \lambda = 10, and suppose we want to recover a posterior for Z[1].
If I'm a user who is uncomfortable with marginalization (and therefore I am reading through the user's guide to do this properly), my main takeaway would be that I can estimate Pr(Z = 1) (in your document this is Pr(Y = k)) by evaluating Pr(Z = 1 | p, \lambda). So what do I make of this? I'm accustomed to generative modeling, so I know how to simulate data for given values of p and \lambda, and I know that Z is independent of \lambda when I simulate the non-marginalized model from the generative process. So naturally, I take the posterior for Z[1] to be bernoulli_rng(p).
The problem is that this is wrong. In this example, it's very wrong. If Y[1] is one, then Z[1] is one with probability one; and if Y[1] is zero, then Z[1] is one with probability near zero.
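To make that concrete, here is a minimal Python sketch (my own illustration, not part of the draft) that computes the exact conditional probability for the fitted values above and compares it to the naive bernoulli_rng(p) answer:

```python
import math

# Fitted values from the example above.
p, lam = 0.5, 10.0

# Naive (wrong) answer: ignore Y[1] entirely and report Pr(Z = 1) = p.
naive = p

# Correct answer conditions on the observed Y[1] as well.
# If Y[1] > 0, only the Poisson component can produce a nonzero count,
# so Pr(Z[1] = 1 | Y[1]) = 1.
# If Y[1] == 0, Bayes' rule over the two mixture components gives:
#   Pr(Z = 1 | Y = 0) = p * Poisson(0 | lam) / ((1 - p) + p * Poisson(0 | lam))
pr_z_given_y0 = p * math.exp(-lam) / ((1 - p) + p * math.exp(-lam))

print(naive)          # 0.5
print(pr_z_given_y0)  # roughly 4.5e-5: near zero, nowhere near 0.5
```

The two answers differ by four orders of magnitude, which is exactly the trap described above.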
In my personal experience, this is the main confusion about recovering the marginalized parameters, and I think that without some kind of tweak, this draft deepens, rather than ameliorates, the confusion. What is missing is a reminder (or several) that in order to properly condition on the continuous parameters, we need to also condition on the data. And not just in the sense of conditioning the continuous parameter estimates on the data. (Edit to add: we need to notice that our likelihood itself conditions on the data via its if statement.) The conditional expectation \mathbb{E}(Z \mid p, \lambda) (as in the current draft), read in a vacuum, is simply p. What we need is \mathbb{E}(Z \mid p, \lambda, Y). I think that explaining this carefully is the main thing that a user's guide section ought to accomplish here.
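For what it's worth, here is a sketch of the computation I think the guide should walk readers through: evaluate \mathbb{E}(Z \mid p, \lambda, Y) per posterior draw, then average over draws. The draws below are made-up illustrative values, not real fitted output:

```python
import math

def pr_z_given_y(y, p, lam):
    """Pr(Z = 1 | Y = y, p, lambda) for the zero-inflated Poisson above."""
    if y > 0:
        # Only the Poisson component (Z = 1) can produce a nonzero count.
        return 1.0
    # y == 0: Bayes' rule over the two mixture components.
    return p * math.exp(-lam) / ((1 - p) + p * math.exp(-lam))

def posterior_pr_z(y, draws):
    """Posterior Pr(Z = 1 | data): average the per-draw conditional
    probabilities over posterior draws of (p, lambda)."""
    return sum(pr_z_given_y(y, p, lam) for p, lam in draws) / len(draws)

# Hypothetical posterior draws of (p, lambda); in practice these would
# come from the fitted Stan model.
draws = [(0.48, 9.5), (0.50, 10.0), (0.52, 10.5)]

print(posterior_pr_z(0, draws))  # near zero
print(posterior_pr_z(3, draws))  # exactly 1.0
```

The key point the code makes explicit is that the conditional probability depends on the observed y, mirroring the if statement in the marginalized likelihood.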
Again, I think adding this section to the user's guide will be really useful, and I feel very positive about the direction you're taking. Thank you!