Two Conceptual Questions Regarding Bayesian Model Averaging

Hi everyone,

I’m currently working on my first paper using data from European energy firms. The core research question is:

To what extent has it been profitable for energy companies to invest in renewable energy?

One of the main methods I’m using is Bayesian Model Averaging (BMA).

I’ve made some progress, but I have two conceptual questions I’d really appreciate input on:


1) Posterior Inclusion Probabilities (PIPs) with Uneven Data Coverage

After an extensive literature review, I identified 55 potential covariates. For 12 of them, however, data are available only for a subset of the sample, so including them reduces the number of observations.
This creates a trade-off: if I include these 12 variables, I have to restrict the sample considerably, which means losing a key region, Italy, from the dataset. I’m unsure which is preferable:

Under the M-closed assumption, is it generally better to prioritize a richer covariate space (with fewer observations), or to preserve as many observations and geographic diversity as possible (at the cost of excluding some variables)?

Would it be acceptable to report the PIPs of these 12 variables conditional on the sample each was estimated on, for example by showing each PIP next to its effective N in a joint visualization?
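
To make the side-by-side idea concrete, here is a minimal Python sketch on synthetic data (not my actual dataset). It rests on several assumptions: model weights use a BIC approximation with a uniform model prior instead of a specific g-prior setup, all models are enumerated (feasible only for a handful of covariates), and the names `bic_pips`, `x0` to `x3`, and `report` are hypothetical.

```python
import itertools
import numpy as np

def bic_pips(X, y):
    """PIPs from full model enumeration, with BIC-approximated weights
    and a uniform model prior (an assumption, not a g-prior BMA)."""
    n, k = X.shape
    masks, bics = [], []
    for mask in itertools.product([0, 1], repeat=k):
        cols = [j for j in range(k) if mask[j]]
        # Design matrix: intercept plus the covariates in this model
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = float(np.sum((y - Z @ beta) ** 2))
        bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
        masks.append(mask)
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))  # approximate posterior model weights
    w /= w.sum()
    return np.array(masks).T @ w            # PIP_j = total weight of models containing j

# Synthetic data: x3 is observed only for the first 200 of 300 rows
rng = np.random.default_rng(0)
n_full = 300
X = rng.normal(size=(n_full, 4))
y = 1.5 * X[:, 0] + rng.normal(size=n_full)
observed = np.ones(n_full, dtype=bool)
observed[200:] = False
X[~observed, 3] = np.nan

pip_wide = bic_pips(X[:, :3], y)              # all rows, x3 excluded
pip_rich = bic_pips(X[observed], y[observed])  # fewer rows, x3 included

names = ["x0", "x1", "x2", "x3"]
report = [(names[j], "wide sample", n_full, float(p)) for j, p in enumerate(pip_wide)]
report += [(names[j], "restricted sample", int(observed.sum()), float(p))
           for j, p in enumerate(pip_rich)]
for name, sample, n, pip in report:
    print(f"{name:>3}  {sample:<17}  N={n:<4d} PIP={pip:.3f}")
```

Each row pairs a PIP with the N of the sample it was computed on, which is the information the joint visualization would need to carry.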


2) BMA with Interaction Terms and Reduced Model Space

To include interaction terms, I constrain the BMA algorithm so that an interaction is only allowed if all of its components are also in the model (strong heredity).
This drastically reduces the share of the model space that gets visited. I always end up with 10% of models visited, even when I use the reversible-jump sampler instead of the birth-death sampler, with a large number of iterations and burn-in draws. All the other diagnostics look good: the agreement between the exact and MCMC-based posterior model probabilities (PMP Exact vs. PMP MCMC) is 98%.
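
To make the constraint concrete, here is a small Python sketch (the helper name `count_strong_heredity_models` is hypothetical) that counts how many models are admissible when an interaction may enter only together with all of its components. I am assuming, without having verified it for my software, that the reported share of visited models is measured against the unrestricted 2^K space; if so, the admissible share is an upper bound on what any sampler can visit.

```python
from itertools import combinations

def count_strong_heredity_models(n_main, interactions):
    """Count models in which every included interaction has all of its
    component main effects included. Enumerates subsets of main effects,
    so this is only practical for a small number of them."""
    total = 0
    for r in range(n_main + 1):
        for mains in combinations(range(n_main), r):
            included = set(mains)
            # Interactions whose components are all in this main-effect set
            allowed = sum(1 for parts in interactions if set(parts) <= included)
            # Each allowed interaction may be in or out independently
            total += 2 ** allowed
    return total

# Toy case: 3 main effects and all pairwise interactions
n_main = 3
interactions = list(combinations(range(n_main), 2))
admissible = count_strong_heredity_models(n_main, interactions)
unrestricted = 2 ** (n_main + len(interactions))
share = admissible / unrestricted
print(f"{admissible} of {unrestricted} models admissible ({share:.1%})")
```

With 3 main effects and all 3 pairwise interactions, 18 of the 64 unrestricted models are admissible, so even a sampler that visited every admissible model would report well under 100% of the unrestricted space.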

Is the fact that I only visit 10% of all possible models a problem?