# Is there a need to have a population-level intercept when you already have random intercepts?

In my random-intercept model, I’m debating getting rid of the population-level intercept.

So instead of having something like:

```stan
data {
  int<lower=1> N;                   // number of data points
  real Y[N];                        // dependent variable
  int<lower=1> Ng;                  // number of groups
  int<lower=1, upper=Ng> grpID[N];  // group lookup
  real x1[N];                       // predictor
}
parameters {
  // Population
  real beta_0;                      // intercept
  real beta_1;
  real<lower=0> sigma;
  // Group NCP
  real grp_mu_raw[Ng];
  real grp_mu_bar;
  real<lower=0> grp_mu_sigma;
}
transformed parameters {
  real grp_mu[Ng];                  // intercept for each group
  for (i in 1:Ng) {
    grp_mu[i] = grp_mu_bar + grp_mu_raw[i] * grp_mu_sigma;
  }
}
model {
  // priors
  beta_0 ~ normal(3, 3);
  beta_1 ~ normal(0, 1);
  sigma ~ normal(0, 3);
  grp_mu_bar ~ normal(0, 1);
  grp_mu_raw ~ normal(0, 1);
  grp_mu_sigma ~ normal(0, 1);
  // likelihood
  real mu[N];
  for (i in 1:N) {
    mu[i] = beta_0 + grp_mu[grpID[i]] + beta_1 * x1[i];
  }
  Y ~ normal(mu, sigma);
}
```

changing it to:

```stan
data {
  int<lower=1> N;                   // number of data points
  real Y[N];                        // dependent variable
  int<lower=1> Ng;                  // number of groups
  int<lower=1, upper=Ng> grpID[N];  // group lookup
  real x1[N];                       // predictor
}
parameters {
  // Population
  real beta_1;
  real<lower=0> sigma;
  // Group NCP
  real grp_mu_raw[Ng];
  real grp_mu_bar;
  real<lower=0> grp_mu_sigma;
}
transformed parameters {
  real grp_mu[Ng];                  // intercept for each group
  for (i in 1:Ng) {
    grp_mu[i] = grp_mu_bar + grp_mu_raw[i] * grp_mu_sigma;
  }
}
model {
  // priors
  beta_1 ~ normal(0, 1);
  sigma ~ normal(0, 3);
  grp_mu_bar ~ normal(3, 3);
  grp_mu_raw ~ normal(0, 1);
  grp_mu_sigma ~ normal(0, 1);
  // likelihood
  real mu[N];
  for (i in 1:N) {
    mu[i] = grp_mu[grpID[i]] + beta_1 * x1[i];
  }
  Y ~ normal(mu, sigma);
}
```

Here the population-level intercept has been dropped, and the prior on `grp_mu_bar` has been changed from normal(0, 1) to normal(3, 3) so that it takes over the role that `beta_0` and its prior used to play.

Are there trade-offs to the different approaches? Or is one strictly better than the other?

I’ve tried running both models, and the latter initially seems to converge better, perhaps because it removes a redundant parameter, `beta_0`, which is strongly correlated with `grp_mu_bar` in the posterior.


If you make what formerly was the population intercept into the mean of the group-specific intercepts, then the posterior distribution of the other parameters is not affected. But you are in the territory of centered vs. non-centered parameterizations, which can make a substantial difference to the efficiency of NUTS. Removing the population intercept altogether is basically equivalent to a centered parameterization with a point-mass prior on zero.
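For concreteness, here is a sketch of what that centred parameterization of the group intercepts would look like (same posterior as the non-centred model above, but a different geometry for NUTS to explore):

```stan
// Centred parameterization (sketch): sample each group intercept
// directly from its hierarchical prior, instead of scaling and
// shifting a standardized raw vector.
parameters {
  real grp_mu_bar;
  real<lower=0> grp_mu_sigma;
  real grp_mu[Ng];                          // group intercepts, sampled directly
}
model {
  grp_mu ~ normal(grp_mu_bar, grp_mu_sigma);  // hierarchical prior
  // ... remaining priors and likelihood as before
}
```

Which of the two parameterizations samples more efficiently typically depends on how strongly the data inform the group-level parameters.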


Thanks Ben, in such a model can I therefore do away with `grp_mu_bar`? It feels like any value it could take would be ‘taken care of’ by the population-level intercept. I don’t see how dropping it would adversely affect NUTS; if anything, I feel like I’m helping by getting rid of a superfluous parameter.
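That is, keeping `beta_0` and centring the raw group effects at zero, something like this sketch:

```stan
// Sketch of what I mean: keep beta_0 as the population-level
// intercept and drop grp_mu_bar, so the group effects are
// deviations centred at zero.
transformed parameters {
  real grp_mu[Ng];
  for (i in 1:Ng) {
    grp_mu[i] = grp_mu_raw[i] * grp_mu_sigma;
  }
}
```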

As you can probably tell, I’m still not clear on the whys and whens of non-centred parametrisation.

Have you looked at this paper? It might help with understanding those whys and whens. I don’t think it covers the plot twist that the non-centred parametrisation’s performance deteriorates with lots of data, but others here on the forum can point you to good sources for that.


Thanks for the tip, hadn’t seen it before and will give it a read.