Hi, I am trying to understand the centered vs. non-centered parameterizations of hierarchical models. I understand that the non-centered parameterization may work better because it makes the sampled parameters less correlated, avoiding issues such as Neal's funnel. However, I have trouble accepting that centering only affects the geometry of the sampled parameters without affecting the underlying loglikelihood calculations. To me, it seems that the centered parameterization "penalizes" the model correctly for the variance of the random effects, whereas the non-centered parameterization does not.
Context: I’m not a trained statistician. I mainly have hands-on experience with frequentist mixed-effects modelling, and some hands-on experience with Bayesian hierarchical modelling.
My initial question is: is this difference in loglikelihoods a known issue that is simply not considered meaningful in practice? Any suggestions for reading up on the topic? If this is a widely known issue, there is no need to go through the lengthy case example below. Otherwise, read on.
I provide a set of minimal working examples below. Although I am ultimately interested in estimating all parameters of a hierarchical model, for this minimal example it can be assumed that we only have data from one subject i, and that we only estimate the random effect value b_i for this one subject, together with the standard deviation of the random effects \omega. We assume that the measurement error \sigma and the population mean \mu are known, and specified as data.
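In math, the model for this single subject is

y_j \sim N(b, \sigma^2), \quad j = 1, \dots, N, \qquad b \sim N(\mu, \omega^2)

with \mu and \sigma fixed as data and (b, \omega) estimated. (Note that \omega enters Stan's normal() as a standard deviation.)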
Centered model:
data {
  int N;
  vector[N] y;  // was vector[10]; should match N
  real mu;
  real sigma;
}
parameters {
  real b;
  real logomega;
}
transformed parameters {
  vector[N] ipred;
  real omega = exp(logomega);
  for (i in 1:N) {
    ipred[i] = b;
  }
}
model {
  y ~ normal(ipred, sigma);
  b ~ normal(mu, omega);
  logomega ~ uniform(-10, 10);
}
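Written out, the log target this centered model defines is, up to constants (the flat prior on logomega contributes nothing inside its bounds):

\log p(b, \log\omega \mid y) = \sum_{i=1}^{N} \log N(y_i \mid b, \sigma) + \log N(b \mid \mu, \omega) + \text{const}

where the second term is the one that will matter for the comparison below.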
Non-centered model:
data {
  int N;
  vector[N] y;  // was vector[10]; should match N
  real mu;
  real sigma;
}
parameters {
  real braw;
  real logomega;
}
transformed parameters {
  vector[N] ipred;
  real omega = exp(logomega);
  real b = mu + braw * omega;
  for (i in 1:N) {
    ipred[i] = b;
  }
}
model {
  y ~ normal(ipred, sigma);
  braw ~ normal(0, 1);
  // correction for the use of braw (equivalent to target += -log(omega));
  // commented out in the "traditional" non-centered parameterization
  target += log(1 / omega);
  logomega ~ uniform(-10, 10);
}
The difference between specifying b \sim N(\mu,\omega^2) and b_{raw} \sim N(0,1) is that in the latter case the loglikelihood does not get penalized by the \log(\omega^{-1}) term. We can manually include it in the code, as specified above. I refer to this as the "corrected non-centered" model, and to the non-centered model without the correction as the "traditional non-centered" model.
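For concreteness, here is the change-of-variables algebra behind that term, written out from the normal density (with \omega as a standard deviation, matching Stan's convention):

\log N(b \mid \mu, \omega) = -\log\omega - \tfrac{1}{2}\log(2\pi) - \frac{(b-\mu)^2}{2\omega^2}

Substituting b = \mu + b_{raw}\,\omega gives (b-\mu)^2/(2\omega^2) = b_{raw}^2/2, and therefore

\log N(b \mid \mu, \omega) = \log N(b_{raw} \mid 0, 1) - \log\omega

so the centered target carries an extra \log(\omega^{-1}) relative to the traditional non-centered target; this is exactly the target += log(1/omega) line above.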
We now simulate some toy data and see where the differently parameterized models converge. We use the optimizing function for the sake of reducing variability: we just want to see whether the mode of the parameter estimates is affected by the parameterization, and the mode can be found deterministically. The hypothesis is that the centered and "corrected non-centered" models will end up with the same parameter estimates and loglikelihood values, while the "traditional non-centered" model will give differing parameter estimates.
library(tidyverse)
library(rstan)

## generate toy data: 10 observations with true mean 1; mu = 0 and sigma = 0.1 fixed as data
set.seed(1)
dat1 <- list(N = 10, y = rnorm(10, 1), mu = 0, sigma = .1)

## find the posterior mode of each model
m1 <- optimizing(stan_model("stan_centered.stan"), data = dat1)
m2 <- optimizing(stan_model("stan_noncentered_trad.stan"), data = dat1)
m3 <- optimizing(stan_model("stan_noncentered_corrected.stan"), data = dat1)

## collect b, omega, and the log target value at the mode
map(list(m1, m2, m3), ~ signif(c(.$par[c("b", "omega")], loglik = .$value), 6)) %>%
  do.call(rbind, .) %>%
  cbind(Model = c("centered", "trad noncent.", "corrected noncent."), .)
| Model | b | omega | loglik |
|---|---|---|---|
| centered | 1.13132 | 1.13129 | -274.815 |
| trad noncent. | 1.13218 | 85.3197 | -274.192 |
| corrected noncent. | 1.13132 | 1.13134 | -274.815 |
We can see that the centered and corrected non-centered models indeed give identical loglikelihood values and almost identical parameter estimates. The same cannot be said of the traditional non-centered model, which seems to over-estimate \omega: the model has no "incentive" to keep \omega small. However, the traditional non-centered model did not push \omega all the way to its maximum allowed value of e^{10} \approx 22026, likely because increasing it beyond a certain point no longer yields meaningful improvements in the loglikelihood.
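To illustrate that plateau, here is a rough sketch of my own (outside the fitted Stan models) that profiles the traditional non-centered log target over \omega while holding b fixed at the sample mean of y, which is close to its mode:

## Profile of the traditional non-centered log target over omega,
## with b held at the sample mean of y (a proxy for its mode).
## The flat prior on logomega only adds a constant inside (-10, 10).
ybar <- mean(dat1$y)
omega_grid <- exp(seq(-2, 10, length.out = 100))
logtarget <- sapply(omega_grid, function(om) {
  braw <- (ybar - dat1$mu) / om                       # implied braw for fixed b
  sum(dnorm(dat1$y, ybar, dat1$sigma, log = TRUE)) +  # likelihood term
    dnorm(braw, log = TRUE)                           # braw ~ normal(0, 1); no -log(omega) term
})
plot(log(omega_grid), logtarget, type = "l",
     xlab = "log(omega)", ylab = "log target")

As \omega grows, the implied b_{raw} shrinks to zero and the \log N(b_{raw} \mid 0, 1) term approaches the constant -\tfrac{1}{2}\log(2\pi), so the curve flattens rather than decreasing; every \omega on the plateau looks essentially equally good to the optimizer.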
Summary: we established that the loglikelihood specification differs between the centered parameterization and the traditionally used non-centered parameterization, and that a correction can be applied to the non-centered parameterization to make its loglikelihood specification identical to that of the centered one.
Speculation: I have heard that, for hierarchical models, the non-centered parameterization works well when there is not much data. I wonder whether the good performance of the (traditional, non-penalized) non-centered parameterization on small datasets is partly because the model can do "whatever it wants" without being properly penalized for the variance of the random effects.
Repeating the initial question: Is this difference in loglikelihood specifications between the parameterizations a known issue? Is it meaningful in practice? Any suggestions for reading up on the topic?