Effective number of parameters for a simple Weibull model

Hi everyone,

I’m trying to estimate a single parameter of the three-parameter Weibull distribution with its two other parameters fixed (known). The fit seems to be OK and the diagnostics do not show any weird behavior. However, the effective number of parameters estimated by the loo package is larger than one (1.7). How is this possible? I thought that it should be between 0 and 1. I obtain an even larger effective number of parameters for more data points (a small subset is given below). I am probably missing or misunderstanding something obvious. Thank you in advance for your help.

The loo results:

Computed from 4000 by 10 log-likelihood matrix

         Estimate   SE
elpd_loo    -63.0  7.8
p_loo         1.7  0.8
looic       126.1 15.6

Pareto k diagnostic values:
                         Count  Pct 
(-Inf, 0.5]   (good)     9     90.0%
 (0.5, 0.7]   (ok)       1     10.0%
   (0.7, 1]   (bad)      0      0.0%
   (1, Inf)   (very bad) 0      0.0%

All Pareto k estimates are ok (k < 0.7)

The R code:

library(rstan)   # stan()
library(loo)     # extract_log_lik(), loo()

Kmin        = 20
lambda      = 4

Kmat        = c(119.90595,  28.73428,  81.63255,  76.59204,  79.39576, 130.57210, 141.26851, 107.11765,  32.11837,  29.06687)
N           = length(Kmat)

data        = list(N = N,
                Kmat = Kmat,
                Kmin = Kmin,
                lambda = lambda)


stan_mod    = 'built_in_weib.stan'
# stan_mod    = 'custom_weib.stan'

fit = stan(file = stan_mod, data = data,
            iter = 2000, chains = 4)

log_lik_m   = extract_log_lik(fit)
loo_m       = loo(log_lik_m)
print(loo_m)

Stan code (built_in_weib.stan):

data {
  int<lower=1> N;       // # observations
  real Kmin;
  real lambda;
  vector[N] Kmat;       // observations
}
parameters {
  real<lower=Kmin> Km;
}
model {
  target += weibull_lpdf(Kmat - Kmin | lambda, Km - Kmin);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = weibull_lpdf(Kmat[n]-Kmin | lambda, Km-Kmin);
  }
}

The concept of “the effective number of parameters” is an asymptotic property: it holds as n -> infinity, under some regularity assumptions for the model, and when the posterior converges towards a normal distribution.

p_loo is computed as the difference between elpd_post (the log predictive density of the observed data computed with the full posterior) and elpd_loo, and p_loo can be larger than p if

  • the posterior is far from normal, which is likely for the Weibull
  • there are highly influential observations, as in your case, where one observation has khat>0.5 (hey, these diagnostics are helpful!)
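
As a rough illustration of that difference, here is a sketch in base R that does not need Stan. It is a stand-in under several assumptions, not what the loo package does: the posterior of Km is approximated on a grid (the grid range and the helper `log_mean_exp` are my own choices), and the leave-one-out part uses plain importance sampling rather than the Pareto-smoothed version, so the numbers need not match the loo output above exactly.

```r
# Data copied from the question
Kmin   = 20
lambda = 4
Kmat   = c(119.90595,  28.73428,  81.63255,  76.59204,  79.39576,
           130.57210, 141.26851, 107.11765,  32.11837,  29.06687)

# Stand-in for the Stan fit: grid approximation of the posterior of Km
# with a flat prior on Km > Kmin (the grid range is an assumption)
set.seed(1)
grid     = seq(Kmin + 1, 400, length.out = 4000)
log_post = sapply(grid, function(km)
  sum(dweibull(Kmat - Kmin, shape = lambda, scale = km - Kmin, log = TRUE)))
draws    = sample(grid, 4000, replace = TRUE, prob = exp(log_post - max(log_post)))

# S x N pointwise log-likelihood matrix, same layout as extract_log_lik(fit)
log_lik_m = sapply(Kmat, function(k)
  dweibull(k - Kmin, shape = lambda, scale = draws - Kmin, log = TRUE))

# Numerically stable log(mean(exp(x))) (hypothetical helper)
log_mean_exp = function(x) max(x) + log(mean(exp(x - max(x))))

lpd_i      = apply(log_lik_m, 2, log_mean_exp)     # "elpd_post": full posterior
elpd_loo_i = -apply(-log_lik_m, 2, log_mean_exp)   # plain importance-sampling LOO
p_loo_i    = lpd_i - elpd_loo_i                    # pointwise contributions

print(c(lpd = sum(lpd_i), elpd_loo = sum(elpd_loo_i), p_loo = sum(p_loo_i)))
print(round(p_loo_i, 2))
```

The pointwise values in `p_loo_i` show which observations contribute most to p_loo, which is one way to see the effect of the influential observations mentioned above.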

A similar thing happens with the Poisson model for the roaches data (see “Bayesian data analysis - roaches cross-validation demo”),
where p_loo=301 is larger than p=4 and larger than n=262,
and even with the negative-binomial model p_loo=6.7 > p=5, although there we see there are still a couple of highly influential observations.

So in non-asymptotic cases “the effective number of parameters” can be a misleading name, but the value in the p_loo row is still useful if you know how to interpret it.
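
For contrast, here is a small sketch of a regular case where the name does fit. Everything in it is an illustrative assumption and not taken from the question or the roaches demo: a normal model with known sigma, a flat prior on the mean, n = 200 simulated observations, and plain importance-sampling LOO instead of the Pareto-smoothed version used by loo. With one free parameter and an exactly normal posterior, p_loo should come out close to p = 1.

```r
set.seed(2)
n     = 200   # number of observations (assumed, chosen to be fairly large)
sigma = 1     # known standard deviation (assumed)
y     = rnorm(n, mean = 0, sd = sigma)

# With a flat prior on mu the posterior is exactly normal(mean(y), sigma / sqrt(n))
S  = 4000
mu = rnorm(S, mean = mean(y), sd = sigma / sqrt(n))

# S x n pointwise log-likelihood matrix
log_lik_m = sapply(y, function(yi) dnorm(yi, mean = mu, sd = sigma, log = TRUE))

# Numerically stable log(mean(exp(x))) (hypothetical helper)
log_mean_exp = function(x) max(x) + log(mean(exp(x - max(x))))

lpd_i        = apply(log_lik_m, 2, log_mean_exp)     # full-posterior lpd
elpd_loo_i   = -apply(-log_lik_m, 2, log_mean_exp)   # plain importance-sampling LOO
p_loo_normal = sum(lpd_i - elpd_loo_i)

print(p_loo_normal)
```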
