Calculating model averaged estimates using loo

Hi,

I am trying to use pseudo-BMA as described in “Using Stacking to Average Bayesian Predictive Distributions” by Yao et al. (2018).

The following R code is taken directly from the “loo_model_weights” function help page:

library(rstan)
library(loo)  # provides extract_log_lik(), relative_eff(), and loo_model_weights()

# generate fake data from N(0,1).
N <- 100
y <- rnorm(N, 0, 1)

# Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma).
stan_code <- "
  data {
    int N;
    vector[N] y;
    real mu_fixed;
  }
  parameters {
    real<lower=0> sigma;
  }
  model {
    sigma ~ exponential(1);
    y ~ normal(mu_fixed, sigma);
  }
  generated quantities {
    vector[N] log_lik;
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma);
  }"

mod <- stan_model(model_code = stan_code)
fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1))
fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5))
fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6))
model_list <- list(fit1, fit2, fit3)
log_lik_list <- lapply(model_list, extract_log_lik)

# optional but recommended
r_eff_list <- lapply(model_list, function(x) {
  ll_array <- extract_log_lik(x, merge_chains = FALSE)
  relative_eff(exp(ll_array))
})

# stacking method:
wts1 <- loo_model_weights(
  log_lik_list,
  method = "stacking",
  r_eff_list = r_eff_list,
  optim_control = list(reltol=1e-10)
)
print(wts1)

From this, I obtain the weights for each model.

My question is: how can I calculate model-averaged posterior estimates, in this example a “model averaged” estimate of the sigma parameter?

(Although this entry (Implementing Model Averaging) looks similar, unfortunately I still did not understand it.)

Thank you very much for your help,

Best regards,

Burak Kürsad Günhan

Please use stacking instead of Pseudo-BMA, as it’s much better.

# stacking method:
wts1 <- loo_model_weights(
  log_lik_list,
  method = "stacking",
  r_eff_list = r_eff_list,
  optim_control = list(reltol=1e-10)
)

The code example you listed is using stacking instead of Pseudo-BMA.
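(For reference, pseudo-BMA+ weights would be requested with the same function; a minimal sketch reusing the log_lik_list and r_eff_list from above, where BB = TRUE gives the Bayesian bootstrap version:)

# pseudo-BMA+ method (shown for comparison only; stacking is recommended):
wts_pbma <- loo_model_weights(
  log_lik_list,
  method = "pseudobma",
  r_eff_list = r_eff_list,
  BB = TRUE  # Bayesian bootstrap, i.e. pseudo-BMA+
)
print(wts_pbma)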

Stacking (and pseudo-BMA) can be used to combine predictive distributions, but averaging parameters is sensible only in some special cases. It’s likely that averaging sigmas doesn’t make sense. Instead, you should sample from the combined predictive distribution and focus on the predictions.
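For example, one way to sample from the combined predictive distribution is to pick, for each draw, one model with probability equal to its weight, and then take one of that model’s posterior predictive draws. A minimal sketch, assuming each model has been refit with y_rep in generated quantities and yrep_list holds one S-by-N matrix of y_rep draws per model (these names are hypothetical, not from the code above):

w <- as.numeric(wts1)              # stacking weights
S <- nrow(yrep_list[[1]])          # posterior draws per model (assumed equal)
# for each combined draw, first pick a model according to its weight,
# then take a randomly chosen posterior predictive draw from that model
m <- sample(seq_along(yrep_list), S, replace = TRUE, prob = w)
s <- sample(S, S, replace = TRUE)
yrep_combined <- do.call(rbind, lapply(
  seq_len(S),
  function(i) yrep_list[[m[i]]][s[i], , drop = FALSE]
))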

Sorry, I confused “stacking” and “pseudo-BMA”.

I see, it is about combining predictive distributions.

That means I need to have “y_rep” in the generated quantities of my Stan model, fit the models again, and this time take a “weighted” number of iterations from each model, then combine all the “y_rep” draws at the end. This combined sample is then my posterior predictive distribution for “y_rep”, is that right? And is this what you explained here (Implementing Model Averaging), but for rstanarm models?
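In code, my understanding would be something like this minimal sketch (hypothetical names; yrep_list would hold the y_rep draws from the refitted models, and instead of re-running with different iteration counts one can equivalently subsample the existing draws in proportion to the weights):

w <- as.numeric(wts1)              # stacking weights
S <- nrow(yrep_list[[1]])          # posterior draws per model
n_per_model <- round(w * S)        # "weighted" number of draws per model
# take n_per_model[k] draws from model k's y_rep and stack them together
yrep_combined <- do.call(rbind, Map(
  function(yrep, n) yrep[sample(nrow(yrep), n), , drop = FALSE],
  yrep_list, n_per_model
))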

Thank you again.


No problem, there are many things in the paper.

Yes! That is a correct description and correct link.

Aki

Great, thank you!


Sorry to bump a rather old post, but since my question is directly linked to this comment, it seemed more relevant to stay within the same thread.

I was wondering if you could kindly point me towards a reference or a post giving more information about when averaging parameters is a sensible strategy?

For example, would it be fine to average p_{i}, where p_{i} is one of the parameters of a binomial regression y_{i} \sim Binomial(p_{i}, N_{i})? Would it be OK to average \lambda_{i} if y_{i} \sim NegBinomial(E_{i}\lambda_{i}, \theta)? I am asking this in a context where all competing models would have the same explanatory variables, but with different ways of including them in p or \lambda (say, different versions of the adjacency matrix for a spatially structured prior such as BYM2, or different ways of specifying smoothing splines).

I feel it is OK in these cases, as those parameters have a common interpretation across models, but I am afraid I am missing something based on your comment.

It’s not clear what you mean by “to average” here. If you mean the model-ensemble mixture posterior, that is fine. If you mean also averaging over the parameter posterior to get a point estimate, that does not give you the actual predictive distribution.

Thank you very much for your answer and sorry for the lack of clarity. I’ll try to be clearer this time.

Let’s focus on the binomial case: y_{i} \mid p_{i,m} \sim Binomial(p_{i,m}, N_{i}), with p_{i,m} = 1/(1+\exp(-\mu_{i,m})). All \mu_{m} would contain the same explanatory variables, but these would enter \mu_{m} in different ways. For example, \mu_{i,1} = \beta_{0} + f_{1}(x_{i}) and \mu_{i,2} = \beta_{0} + f_{2}(x_{i}), with f_{1} and f_{2} two smoothing splines specified through different bases.

I don’t really care about y_{i} itself, because it depends on N_{i}, so I focus on p_{i}. Rather than choosing one of the two structures, I’d like to get a posterior distribution for p_{i} that accounts for model-structure uncertainty. I was thinking of using the mixture of the model-specific posteriors for p_{i}, with the model weights defined using whatever weighting scheme (stacking weights?): f(p_{i} \mid y) = w_{1} f(p_{i} \mid y, m_{1}) + w_{2} f(p_{i} \mid y, m_{2}). Would that approach be OK?
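Concretely, sampling from that mixture could look like this minimal sketch, assuming p_draws_1 and p_draws_2 are S-by-I matrices of posterior draws of p_{i} from the two fits (hypothetical names, with illustrative weight values):

w <- c(0.6, 0.4)                   # illustrative stacking weights
S <- nrow(p_draws_1)               # posterior draws per model
# for each combined draw, keep model 1's draw with probability w[1],
# otherwise replace it with model 2's draw
use_m2 <- sample(1:2, S, replace = TRUE, prob = w) == 2
p_mixture <- p_draws_1
p_mixture[use_m2, ] <- p_draws_2[use_m2, ]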


Thanks for the clarification

Yes.
