Hi, all.

I have a question regarding model comparison and specifically some confusion regarding the relative weights from the `compare`

function in `arviz`

. I have a small subset of my data that I am using to learn more about Bayesian modeling and have fit two models: one of which has no covariates (`"null"`

) and the second of which has one I am fairly certain is important (`"host"`

).

When I compare these two models using LOO, I see that the `null`

model is ranked at the top but that the relative weights suggest the `host`

model has a higher weight. From the `az.compare`

documentation:

[weight] can be loosely interpreted as the probability of each model (among the compared model) given the data.

My current thought is that these two models are very close (see this forum post) so it’s difficult to draw conclusions. I guess my question is: given the small difference in elpd, is the difference in weights also uninformative?

## Pointwise LOO for host:

```
Computed from 4000 by 672 log-likelihood matrix
Estimate SE
elpd_loo -2791.65 57.92
p_loo 74.15 -
There has been a warning during the calculation. Please check the results.
------
Pareto k diagnostic values:
Count Pct.
(-Inf, 0.5] (good) 655 97.5%
(0.5, 0.7] (ok) 12 1.8%
(0.7, 1] (bad) 4 0.6%
(1, Inf) (very bad) 1 0.1%
```

## Pointwise LOO for null:

```
Computed from 4000 by 672 log-likelihood matrix
Estimate SE
elpd_loo -2791.56 57.55
p_loo 51.50 -
There has been a warning during the calculation. Please check the results.
------
Pareto k diagnostic values:
Count Pct.
(-Inf, 0.5] (good) 668 99.4%
(0.5, 0.7] (ok) 3 0.4%
(0.7, 1] (bad) 1 0.1%
(1, Inf) (very bad) 0 0.0%
```

## Some information about the data & model

The data is tabular (24 organisms x 28 features) with the `host`

model fitting 3 parameters per feature (`Intercept`

, `host`

, `dispersion`

) and the `null`

model fitting 2 parameters per feature (`Intercept`

, `dispersion`

).

## Stan code

```
data {
int<lower=0> N; // number of samples
int<lower=0> D; // number of dimensions
int<lower=0> p; // number of covariates
real depth[N]; // sequencing depths of microbes
matrix[N, p] x; // covariate matrix
int y[N, D]; // observed microbe abundances
real<lower=0> B_p; // stdev for Beta Normal prior
real<lower=0> phi_s; // scale for dispersion Cauchy prior
}
parameters {
// parameters required for linear regression on the species means
matrix[p, D-1] beta;
vector<lower=0>[D] reciprocal_phi;
}
transformed parameters {
matrix[N, D-1] lam;
matrix[N, D] lam_clr;
vector[N] z;
vector<lower=0>[D] phi;
for (i in 1:D){
phi[i] = 1. / reciprocal_phi[i];
}
z = to_vector(rep_array(0, N));
lam = x * beta;
lam_clr = append_col(z, lam);
}
model {
// setting priors ...
for (i in 1:D){
reciprocal_phi[i] ~ cauchy(0., phi_s);
}
for (i in 1:D-1){
for (j in 1:p){
beta[j, i] ~ normal(0., B_p); // uninformed prior
}
}
// generating counts
for (n in 1:N){
for (i in 1:D){
target += neg_binomial_2_log_lpmf(y[n, i] | depth[n] + lam_clr[n, i], phi[i]);
}
}
}
generated quantities {
matrix[N, D] y_predict;
matrix[N, D] log_lik;
for (n in 1:N){
for (i in 1:D){
y_predict[n, i] = neg_binomial_2_log_rng(depth[n] + lam_clr[n, i], phi[i]);
log_lik[n, i] = neg_binomial_2_log_lpmf(y[n, i] | depth[n] + lam_clr[n, i], phi[i]);
}
}
}
```