LOOIC in univariate models

I have the following GLM: Y_{i} \sim \operatorname{Normal}\left(\alpha + \beta_{j} \cdot X_{ij}, \sigma\right)

  • where:
    Y = quantitative response vector of size N
    X^{N \times J} = design matrix with X_{ij} = 1 or X_{ij} = -1,
    with J \approx 10,000 and N \approx 50

  • \beta_j \sim \operatorname{Normal}(\beta_{\mu}, \beta_{\sigma}) [actually non-centered]

  • and priors:
    \alpha \sim \operatorname{Normal}(0, 10)
    \beta_{\mu} \sim \operatorname{Normal}(0, 5)
    \sigma, \beta_{\sigma} \sim \operatorname{Cauchy}^{+}(0, 5)
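To make the setup concrete, here is a minimal simulation of this generative model in numpy. All sizes, true parameter values, and the seed are illustrative (not from the post), J is shrunk from ~10,000 for speed, and Y is generated under a single column j for simplicity, whereas the actual model evaluates the likelihood under every column:

```python
import numpy as np

rng = np.random.default_rng(0)

N, J = 50, 100                 # post has N ~ 50, J ~ 10,000; J reduced here
alpha = 0.5                    # illustrative true values
beta_mu, beta_sigma = 0.0, 1.0
sigma = 1.0

X = rng.choice([-1, 1], size=(N, J))             # design matrix of +/-1 entries
beta = rng.normal(beta_mu, beta_sigma, size=J)   # partially-pooled effects

# Each column j defines its own univariate regression for Y:
#   Y_i ~ Normal(alpha + beta_j * X_ij, sigma)
# Here the data are generated under column j = 0 for illustration.
Y = rng.normal(alpha + beta[0] * X[:, 0], sigma)
```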

With this model I estimate the univariate (but partially pooled) effects \beta_j and generate the log-likelihood array \text{log_lik}^{P \times N \times J}, where P = Chains \cdot Iterations, N = number of observations, and J = number of effects (columns of X).

My main question is: how do I estimate LOOIC (and interpret p_loo) in univariate models like this?

Currently, I transform the original \text{log_lik} into \tilde{\text{log_lik}}^{O \times N}, where O = Chains \cdot Iterations \cdot J, and use this as input to loo. Is there any systematic error here?
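The reshape itself is mechanical; a numpy sketch (array names, sizes, and the random fill are illustrative only). The important detail is to move the J axis next to the draws axis before collapsing, so each row of the flat matrix is one (draw, effect) pair:

```python
import numpy as np

P, N, J = 4000, 50, 100   # P = chains * iterations; J reduced from ~10,000
log_lik = np.random.default_rng(1).normal(size=(P, N, J))

# (P, N, J) -> (P, J, N) -> (P*J, N)
log_lik_flat = np.transpose(log_lik, (0, 2, 1)).reshape(P * J, N)

# Row p*J + j of the flat matrix is draw p evaluated under effect j:
assert np.allclose(log_lik_flat[2 * J + 5], log_lik[2, :, 5])
```

A plain `log_lik.reshape(P * J, N)` without the transpose would interleave observations and effects incorrectly, which is an easy mistake to make here.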

Posterior predictive checks show that the model has high predictive accuracy (based on the observed data). I can include the model and the loo estimates if needed.

I don’t understand this, as the size of log_lik should depend only on the size of Y, not on the size of X

Thank you, and I agree. To make log_lik depend only on the size of Y, I transform the original \text{log_lik}^{P \times N \times J} array into \tilde{\text{log_lik}}^{(P \cdot J) \times N} and pass that to loo.

I am just unsure about the idea of providing, as input to loo, the combined log-likelihood values for each observation estimated under J different parameters. This would yield an average (across the J parameters) out-of-sample predictive accuracy, right? Alternatively, I could compute a total elpd (adding up the contribution of each parameter). I am not sure which way is more suitable given the structure of my model.
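One way to see the difference between the two options: stacking the draws across j before the log-mean-exp treats the J effects as a mixture, giving an averaged predictive density per observation, while running the computation per j and summing gives a total. The sketch below uses the in-sample log pointwise predictive density (lpd) as a simple stand-in for elpd_loo, just to show the two aggregations; a real analysis would use PSIS-LOO (e.g. the loo package). All sizes and the random log_lik are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
P, N, J = 2000, 50, 20            # illustrative sizes
log_lik = rng.normal(size=(P, N, J))

def lpd(ll):
    """Log of the mean (over draws, axis 0) of exp(ll), summed over observations."""
    m = ll.max(axis=0)
    return (m + np.log(np.exp(ll - m).mean(axis=0))).sum()

# Option 1: stack the J sets of draws -> one mixture predictive density per observation
stacked = np.transpose(log_lik, (0, 2, 1)).reshape(P * J, N)
lpd_avg = lpd(stacked)            # "average" accuracy across the J effects

# Option 2: treat each j as its own model and add up the J lpd values
lpd_total = sum(lpd(log_lik[:, :, j]) for j in range(J))
```

Note that option 1 produces one number on the scale of a single model's elpd, whereas option 2 is roughly J times larger, so the two are not directly comparable.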

When I apply the first approach, I have difficulty interpreting p_loo: it dramatically underestimates the actual number of parameters (J), and this is not a consequence of the partial pooling.