Effective number of parameters for a simple Weibull model

Hi everyone,

I’m trying to estimate a single parameter of the three-parameter Weibull distribution with its two other parameters fixed (known). The fit seems to be OK and the diagnostics do not show any weird behavior. However, the effective number of parameters estimated by the loo package is larger than one (1.7). How is this possible? I thought that it should be between 0 and 1. I obtain an even larger effective number of parameters with more data points (a small subset is given below). I am probably missing or misunderstanding something obvious. Thank you in advance for your help.

The loo results:

Computed from 4000 by 10 log-likelihood matrix

         Estimate   SE
elpd_loo    -63.0  7.8
p_loo         1.7  0.8
looic       126.1 15.6

Pareto k diagnostic values:
                         Count  Pct 
(-Inf, 0.5]   (good)     9     90.0%
 (0.5, 0.7]   (ok)       1     10.0%
   (0.7, 1]   (bad)      0      0.0%
   (1, Inf)   (very bad) 0      0.0%

All Pareto k estimates are ok (k < 0.7)

The R code:

library(rstan)
library(loo)

Kmin        = 20
lambda      = 4

Kmat        = c(119.90595,  28.73428,  81.63255,  76.59204,  79.39576, 130.57210, 141.26851, 107.11765,  32.11837,  29.06687)
N           = length(Kmat)

data        = list(N = N,
                Kmat = Kmat,
                Kmin = Kmin,
                lambda = lambda)


stan_mod    = 'built_in_weib.stan'
# stan_mod    = 'custom_weib.stan'

fit = stan(file = stan_mod, data = data,
            iter = 2000, chains = 4)

log_lik_m   = extract_log_lik(fit)
loo_m       = loo(log_lik_m)
print(loo_m)

Stan code (built_in_weib.stan):

data {
  int<lower=1> N;       // # observations
  real Kmin;
  real lambda;
  vector[N] Kmat;       // observations
}
parameters {
  real<lower=Kmin> Km;
}
model {
    target += weibull_lpdf(Kmat-Kmin | lambda, Km-Kmin);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = weibull_lpdf(Kmat[n]-Kmin | lambda, Km-Kmin);
  }
}

The concept of “the effective number of parameters” is an asymptotic property: it holds as n -> infinity, under some regularity assumptions on the model, and when the posterior converges towards a normal distribution.

p_loo is computed as the difference between elpd_post (the log predictive density evaluated with the full posterior, without cross-validation) and elpd_loo, and p_loo can be larger than p if

  • the posterior is far from normal, which is likely for the Weibull
  • there are highly influential observations, like in your case the one with khat>0.5 (hey, these diagnostics are helpful!)
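To make the definition concrete, here is a minimal sketch in base R that computes p_loo as elpd_post minus elpd_loo for the data in the question. It relies on two assumptions that are not in the original post. First, with the implicit flat prior on Km and the shape lambda known, u = (Km - Kmin)^(-lambda) has a Gamma posterior, so draws are taken directly instead of running Stan. Second, elpd_loo is obtained by exact leave-one-out refits rather than by the Pareto-smoothed importance sampling that loo uses, so the value need not match the reported 1.7 exactly. The helper names (draw_sigma, log_pred) are made up for this illustration.

```r
# Data and fixed constants copied from the original post.
Kmin   <- 20
lambda <- 4
Kmat   <- c(119.90595, 28.73428, 81.63255, 76.59204, 79.39576,
            130.57210, 141.26851, 107.11765, 32.11837, 29.06687)
y <- Kmat - Kmin            # shifted observations
N <- length(y)

set.seed(1)
n_draws <- 1e5              # number of posterior draws (arbitrary choice)

# Assumption: flat prior on sigma = Km - Kmin, known shape lambda.
# Then u = sigma^(-lambda) ~ Gamma(shape = n - 1/lambda, rate = sum(y^lambda)).
draw_sigma <- function(y_obs) {
  u <- rgamma(n_draws,
              shape = length(y_obs) - 1 / lambda,
              rate  = sum(y_obs^lambda))
  u^(-1 / lambda)
}

# Log of the predictive density of y_new, averaged over scale draws.
log_pred <- function(y_new, sigma) {
  log(mean(dweibull(y_new, shape = lambda, scale = sigma)))
}

# elpd_post: every point is predicted from the posterior given all the data.
sigma_full <- draw_sigma(y)
lpd_i <- sapply(seq_len(N), function(i) log_pred(y[i], sigma_full))

# elpd_loo: every point is predicted from the posterior given the other points.
loo_i <- sapply(seq_len(N), function(i) log_pred(y[i], draw_sigma(y[-i])))

elpd_post <- sum(lpd_i)
elpd_loo  <- sum(loo_i)
p_loo     <- elpd_post - elpd_loo

print(c(elpd_post = elpd_post, elpd_loo = elpd_loo, p_loo = p_loo))
print(round(lpd_i - loo_i, 3))   # per-observation contributions to p_loo
```

The per-observation differences printed at the end show which points contribute most to p_loo; an influential observation is one whose removal changes its own predictive density a lot.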

A similar thing happens with the Poisson model for the roaches data https://rawgit.com/avehtari/modelselection_tutorial/master/roaches.html,
where p_loo=301 is larger than p=4 and larger than n=262,
and even with the negative-binomial model p_loo=6.7 > p=5, but there we see that there are still a couple of highly influential observations.

So in non-asymptotic cases “the effective number of parameters” can be a misleading name, but the value in the p_loo row is still useful if you know how to interpret it.

Congratulations, this topic is a great discussion.

I would like to know what happens with a non-linear model (e.g. sigmoid growth models) in which the parameters are intrinsically related, so that the information contained in one parameter helps to determine the others. What would the relationship between the effective number of parameters (p_loo) and the number of model parameters (p) look like?

Can I expect smaller values for p_loo, even much smaller ones (half of p, for example)?

p_loo gets smaller if the parameters have posterior dependencies. For example, in an extreme case where the posterior is a singular line in a higher-dimensional space, p_loo is less than 1.
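A related, simpler case can be checked in closed form. The sketch below uses a hypothetical toy model that is not from this thread: y_i ~ Normal(a + b, 1) with independent Normal(0, 10^2) priors on a and b. The data inform only the sum a + b, so the two parameters are strongly dependent in the posterior, and p_loo comes out close to 1 even though p = 2. Exact leave-one-out predictive densities are used here instead of the loo package.

```r
# Hypothetical toy model (an assumption for illustration):
#   y_i ~ Normal(a + b, 1),  a ~ Normal(0, 10^2),  b ~ Normal(0, 10^2)
# Only mu = a + b enters the likelihood, so a and b are strongly
# dependent a posteriori.
n <- 20
y <- 1 + qnorm(ppoints(n))        # deterministic data with roughly unit spread
prior_prec <- 1 / (10^2 + 10^2)   # prior precision of mu = a + b

# Exact log posterior predictive density of y_new given observations y_obs
# (conjugate normal model with unit observation variance, prior mean 0).
log_pred <- function(y_new, y_obs) {
  post_prec <- prior_prec + length(y_obs)
  post_mean <- sum(y_obs) / post_prec
  dnorm(y_new, mean = post_mean, sd = sqrt(1 + 1 / post_prec), log = TRUE)
}

# Full-posterior and leave-one-out log predictive densities.
elpd_post <- sum(sapply(seq_len(n), function(i) log_pred(y[i], y)))
elpd_loo  <- sum(sapply(seq_len(n), function(i) log_pred(y[i], y[-i])))

p_loo <- elpd_post - elpd_loo
print(p_loo)   # compare with the nominal parameter count p = 2
```

This is not the singular-line case itself, only an illustration that p_loo tracks the number of directions the data actually inform rather than the nominal parameter count.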
