Calculating model averaged estimates using loo

Hi,

I am trying to use pseudo-BMA as described in “Using Stacking to Average Bayesian Predictive Distributions” by Yao et al. (2018).

The following R code is taken directly from the “loo_model_weights” function help page:

library(rstan)
library(loo)  # provides extract_log_lik(), relative_eff(), and loo_model_weights()

# generate fake data from N(0,1).
N <- 100
y <- rnorm(N, 0, 1)

# Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma).
stan_code <- "
  data {
    int N;
    vector[N] y;
    real mu_fixed;
  }
  parameters {
    real<lower=0> sigma;
  }
  model {
    sigma ~ exponential(1);
    y ~ normal(mu_fixed, sigma);
  }
  generated quantities {
    vector[N] log_lik;
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma);
  }"

mod <- stan_model(model_code = stan_code)
fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1))
fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5))
fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6))
model_list <- list(fit1, fit2, fit3)
log_lik_list <- lapply(model_list, extract_log_lik)

# optional but recommended
r_eff_list <- lapply(model_list, function(x) {
  ll_array <- extract_log_lik(x, merge_chains = FALSE)
  relative_eff(exp(ll_array))
})

# stacking method:
wts1 <- loo_model_weights(
  log_lik_list,
  method = "stacking",
  r_eff_list = r_eff_list,
  optim_control = list(reltol=1e-10)
)
print(wts1)

From this, I obtain the weights for each model.

My question is: how can I calculate model-averaged posterior estimates, in this example a “model averaged” estimate of the sigma parameter?

(Although this entry (Implementing Model Averaging) looks similar, unfortunately I still did not understand it.)

Thank you very much for your help,

Best regards,

Burak Kürsad Günhan

Please use stacking instead of Pseudo-BMA, as it’s much better.

# stacking method:
wts1 <- loo_model_weights(
  log_lik_list,
  method = "stacking",
  r_eff_list = r_eff_list,
  optim_control = list(reltol=1e-10)
)

The code example you listed is using stacking instead of Pseudo-BMA.
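(For reference, pseudo-BMA+ weights would be requested with the same function; a minimal sketch reusing the log_lik_list and r_eff_list from above, where BB = TRUE gives the Bayesian bootstrap version:)

# pseudo-BMA+ method (shown for comparison only; stacking is recommended):
wts_pbma <- loo_model_weights(
  log_lik_list,
  method = "pseudobma",
  r_eff_list = r_eff_list,
  BB = TRUE  # Bayesian bootstrap, i.e. pseudo-BMA+
)
print(wts_pbma)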

Stacking (and pseudo-BMA) can be used to combine predictive distributions, but averaging parameters is sensible only in some special cases. It’s likely that averaging sigmas doesn’t make sense. Instead, you should sample from the combined predictive distribution and focus on the predictions.
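For example, one way to sample from the combined predictive distribution is to pick, for each draw, one model with probability equal to its weight, and then take one of that model’s posterior predictive draws. A minimal sketch, assuming each model has been refit with y_rep in generated quantities and yrep_list holds one S-by-N matrix of y_rep draws per model (these names are hypothetical, not from the code above):

w <- as.numeric(wts1)              # stacking weights
S <- nrow(yrep_list[[1]])          # posterior draws per model (assumed equal)
# for each combined draw, first pick a model according to its weight,
# then take a randomly chosen posterior predictive draw from that model
m <- sample(seq_along(yrep_list), S, replace = TRUE, prob = w)
s <- sample(S, S, replace = TRUE)
yrep_combined <- do.call(rbind, lapply(
  seq_len(S),
  function(i) yrep_list[[m[i]]][s[i], , drop = FALSE]
))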

Sorry, I confused “stacking” and “pseudo-BMA”.

I see, it is about combining predictive distributions.

That means I need to have “y_rep” in the generated quantities of my Stan model, fit the models again, and this time take a “weighted” number of iterations from each model, then combine all the “y_rep” draws at the end. This combined sample is then my posterior predictive distribution for “y_rep”, is that right? And is this what you explained here (Implementing Model Averaging), but for rstanarm models?
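In code, my understanding would be something like this minimal sketch (hypothetical names; yrep_list would hold the y_rep draws from the refitted models, and instead of re-running with different iteration counts one can equivalently subsample the existing draws in proportion to the weights):

w <- as.numeric(wts1)              # stacking weights
S <- nrow(yrep_list[[1]])          # posterior draws per model
n_per_model <- round(w * S)        # "weighted" number of draws per model
# take n_per_model[k] draws from model k's y_rep and stack them together
yrep_combined <- do.call(rbind, Map(
  function(yrep, n) yrep[sample(nrow(yrep), n), , drop = FALSE],
  yrep_list, n_per_model
))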

Thank you again.


No problem, there are many things in the paper.

Yes! That is a correct description and correct link.

Aki

Great, thank you!


Sorry to bump a rather old post, but since my question is directly linked to this comment, it seemed more relevant to stay within the same thread.

I was wondering if you could kindly point me towards a reference or a post giving more information about when averaging parameters is a sensible strategy?

For example, would it be fine to average p_{i}, where p_{i} is one of the parameters of a binomial regression y_{i} \sim Binomial(p_{i}, N_{i})? Would it be OK to average \lambda_{i} if y_{i} \sim NegBinomial(E_{i}\lambda_{i}, \theta)? I am asking this in a context where all competing models would have the same explanatory variables, but with different ways of including them in p or \lambda (say, different versions of the adjacency matrix for a spatially structured prior such as BYM2, or different ways of specifying smoothing splines).

I feel it is OK in these cases, as those parameters have a common interpretation across models, but I am afraid I am missing something based on your comment.

It’s not clear what you mean by “to average” here. If you mean the model-ensemble mixture posterior, that is fine. If you mean also averaging over the parameter posterior to get a point estimate, that does not give you the actual predictive distribution.

Thank you very much for your answer and sorry for the lack of clarity. I’ll try to be clearer this time.

Let’s focus on the binomial case: y_{i} \mid p_{i,m} \sim Binomial(p_{i,m}, N_{i}), with p_{i,m} = 1/(1+\exp(-\mu_{i,m})). All \mu_{m} would contain the same explanatory variables, but these would enter \mu_{m} in different ways. For example, \mu_{i,1} = \beta_{0} + f_{1}(x_{i}) and \mu_{i,2} = \beta_{0} + f_{2}(x_{i}), with f_{1} and f_{2} two smoothing splines specified through different bases.

I don’t really care about y_{i} itself, because it depends on N_{i}, so I focus on p_{i}. Rather than choosing one of the two structures, I’d like to get a posterior distribution for p_{i} that accounts for model-structure uncertainty. I was thinking of using the mixture of the model-specific posteriors for p_{i}, with the model weights defined using whatever weighting scheme (stacking weights?): f(p_{i} \mid y) = w_{1} f(p_{i} \mid y, m_{1}) + w_{2} f(p_{i} \mid y, m_{2}). Would that approach be OK?
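Concretely, sampling from that mixture could look like this minimal sketch, assuming p_draws_1 and p_draws_2 are S-by-I matrices of posterior draws of p_{i} from the two fits (hypothetical names, with illustrative weight values):

w <- c(0.6, 0.4)                   # illustrative stacking weights
S <- nrow(p_draws_1)               # posterior draws per model
# for each combined draw, keep model 1's draw with probability w[1],
# otherwise replace it with model 2's draw
use_m2 <- sample(1:2, S, replace = TRUE, prob = w) == 2
p_mixture <- p_draws_1
p_mixture[use_m2, ] <- p_draws_2[use_m2, ]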


Thanks for the clarification

Yes.
