Confusion about discrete parameter marginalization in manual Bayesian model averaging

EDIT: Turns out I really was confused about marginalization. I think I finally figured it out. Edited to only show what I believe is the correct solution. No more answers needed, unless you believe the current solution is wrong.

So I’ve been trying to understand Bayes factors and BMA better. Here’s a model that combines two candidate submodels, indexed by a single binary discrete parameter z with a uniform prior:

\mathbf{y} = \{y_1, \dots , y_K\} \\ z \in \{1,2\}; \quad P(z = 1) = P(z = 2) = \frac{1}{2}\\ y_i \sim N(\mu, 1) \\ \mu = \begin{cases} 0 & \mathrm{if} \quad z = 1 \\ \alpha & \mathrm{if} \quad z = 2 \end{cases} \\ \alpha \sim N(0,2)

The posterior probability P(z = 1 | \mathbf{y}) then corresponds to the BMA weight of the “null” submodel (\mu = 0) against an “intercept” submodel (\mu = \alpha).
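Since the prior odds P(z = 1) / P(z = 2) equal 1, the posterior odds coincide with the Bayes factor in favour of the null submodel:

BF_{12} = \frac{p(\mathbf{y} | z = 1)}{p(\mathbf{y} | z = 2)} = \frac{P(z = 1 | \mathbf{y})}{1 - P(z = 1 | \mathbf{y})}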

I’ve thought this would pose no problems as Andrew and others always recommends that the best approach for model selection is just building the supermodel that contains the smaller submodels.

We can’t do discrete parameters in Stan, but a direct way to reparametrize for Stan is to introduce s = P(z = 1) as an additional parameter, so we can have a Stan model:

data {
  int<lower=1> K;
  vector[K] y;
}

parameters {
  real alpha;
}

transformed parameters {
  // Log likelihood of all observations under each submodel
  // (normal_lpdf with a vector argument sums over the observations)
  real log_lik_null = normal_lpdf(y | 0, 1);
  real log_lik_intercept = normal_lpdf(y | alpha, 1);
}

model {
  // Marginalize z out: p(y | alpha) = 1/2 p(y | z = 1) + 1/2 p(y | z = 2, alpha)
  target += log_mix(0.5, log_lik_null, log_lik_intercept);
  target += normal_lpdf(alpha | 0, 2);
}

Note that we deliberately mix the full likelihoods: either all observations come from submodel 1 or all come from submodel 2.
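To spell out what the log_mix statement implies, the marginalized likelihood is

p(\mathbf{y} | \alpha) = \frac{1}{2} \prod_{i=1}^{K} N(y_i | 0, 1) + \frac{1}{2} \prod_{i=1}^{K} N(y_i | \alpha, 1)

and not the per-observation mixture \prod_{i=1}^{K} \left[ \frac{1}{2} N(y_i | 0, 1) + \frac{1}{2} N(y_i | \alpha, 1) \right], which would correspond to a separate z_i for each observation.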

I can then use Bayes' rule, and since z and \alpha are a priori independent, I can take P(z = 1 | \alpha) = P(z = 1) = \frac{1}{2}. This gives:

P(z = 1 | \mathbf{y}, \alpha) = \frac{P(z = 1 | \alpha) \, p(\mathbf{y} | z = 1, \alpha)}{P(z = 1 | \alpha) \, p(\mathbf{y} | z = 1, \alpha) + P(z = 2 | \alpha) \, p(\mathbf{y} | z = 2, \alpha)} = \\ = \frac{\frac{1}{2} \, p(\mathbf{y} | z = 1)}{\frac{1}{2} \, p(\mathbf{y} | z = 1) + \frac{1}{2} \, p(\mathbf{y} | z = 2, \alpha)}

where the second equality uses that the null likelihood p(\mathbf{y} | z = 1) does not depend on \alpha.

So to get the desired P(z = 1 | \mathbf{y}) I need to integrate \alpha out against its posterior, which I can approximate with posterior draws \alpha_m:

P(z = 1 | \mathbf{y}) = \int P(z = 1 | \mathbf{y}, \alpha) \, p(\alpha | \mathbf{y}) \, \mathrm{d} \alpha \simeq \frac{1}{M} \sum_{m=1}^{M} P(z = 1 | \mathbf{y}, \alpha = \alpha_m)

The result is in agreement with the analytical value and other ways to compute the BMA weight and Bayes factor:

library(cmdstanr)

y <- c(0.5, 0.7, -0.4, 0.1)

full_model <- cmdstan_model("2021-bayes-factors.stan")
fit_full <- full_model$sample(data = list(K = length(y), y = y),
  refresh = 0, iter_sampling = 10000)

full_samples <- posterior::as_draws_matrix(fit_full$draws())

# Per-draw submodel likelihoods p(y | z = 1) and p(y | z = 2, alpha_m)
rel_p_null <- exp(full_samples[, "log_lik_null"])
rel_p_intercept <- exp(full_samples[, "log_lik_intercept"])

# Average P(z = 1 | y, alpha_m) over the posterior draws
p_null <- mean(rel_p_null / (rel_p_null + rel_p_intercept))

p_null
# 0.7863209
p_null / (1 - p_null) # Posterior odds = Bayes factor (uniform prior on z)
# 3.679916
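For reference, here is a minimal sketch of one way to compute the analytical value, assuming the mvtnorm package for the multivariate normal density: integrating \alpha \sim N(0, 2) out of the intercept submodel gives \mathbf{y} \sim MVN(\mathbf{0}, I_K + 2^2 J_K), where J_K is the K \times K matrix of ones.

library(mvtnorm)  # assumed here, for dmvnorm()

y <- c(0.5, 0.7, -0.4, 0.1)
K <- length(y)

# Null submodel: y_i ~ iid N(0, 1), so the marginal likelihood is a product
p_y_null <- prod(dnorm(y, mean = 0, sd = 1))

# Intercept submodel: marginalizing alpha ~ N(0, sd = 2) out of
# y_i ~ N(alpha, 1) gives y ~ MVN(0, I_K + 2^2 * J_K)
Sigma <- diag(K) + 2^2 * matrix(1, K, K)
p_y_intercept <- dmvnorm(y, mean = rep(0, K), sigma = Sigma)

p_y_null / (p_y_null + p_y_intercept)  # analytical P(z = 1 | y), approx. 0.79
p_y_null / p_y_intercept               # analytical Bayes factor, approx. 3.75

Both should agree with the sampling-based estimates above to within Monte Carlo error.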

For others seeking more examples, the changepoint model from the Stan User's Guide proved useful…


OK, I think I managed to resolve the issue and gain clarity on marginalization. I edited the original post to show only what I now believe is correct. Open the edit history to see me confused about basic probability concepts :-D

The bad part is that it means I was really confused before and gave bad advice to some users on this site. So, a lesson in humility…


I think marginalisation in Stan is just confusing, unfortunately… A few months ago, while working on {rater} (which fits models with discrete parameters through marginalisation), I had a very funny meeting with my supervisors where we realized that none of us could explain marginalisation to each other!

To fix this and provide a good introduction to marginalisation, we wrote a long section about it in our write-up of rater, which we recently put out as a preprint.

Hopefully it makes things a bit clearer for anyone interested in marginalisation in Stan!

Best,
Jeffrey

p.s. Apologies for the naked self promotion :)
