Random Intercepts Explain Almost All Variance

Hi all,

I am trying to build a model to explain variance in y in country-year panel data.

However, y varies very little across the 9 years covered by the dataset.

Therefore, random intercepts for the countries (iso3c) and years (year) explain virtually all the variance (R2 = 0.94).

In my regressions, is it OK to drop the random intercepts for countries, so that the rest of the model (the other independent variables) explains the variance, rather than only the random country intercepts?

I share my formula, code, and data below. I would appreciate any suggestions on how to move forward.

df_replicate.csv (53.0 KB)

# y is left censored
# iso3c is country codes
# year is years


library(brms)
library(readr)

df_replicate <- read_csv("df_replicate.csv")

brms_formula <- bf(y | cens(cens) ~ (1 | iso3c) + (1 | year))

mod <- brm(brms_formula,
           data = df_replicate,
           prior = c(
             prior(normal(0, 1), class = Intercept),
             prior(exponential(1), class = sigma),
             prior(exponential(1), class = sd)
           ),
           cores = 4,
           chains = 4,
           seed = 231024,
           warmup = 2000,
           iter = 8000,
           control = list(adapt_delta = 0.99,
                          max_treedepth = 15))



Here are the results:


Family: gaussian
Links: mu = identity; sigma = identity
Formula: y | cens(cens) ~ (1 | iso3c) + (1 | year)
Data: df_replicate (Number of observations: 1602)
Draws: 4 chains, each with iter = 8000; warmup = 2000; thin = 1;
total post-warmup draws = 24000

Group-Level Effects:
~iso3c (Number of levels: 178)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.58 0.03 0.52 0.65 1.01 532 1582

~year (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.07 0.02 0.04 0.12 1.00 3616 6245

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.03 0.05 -0.12 0.06 1.02 169 521

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.10 0.00 0.10 0.11 1.00 9786 14244


    Estimate   Est.Error      Q2.5     Q97.5
R2 0.9355031 0.008382389 0.9162093 0.9485666

In general, I would say no, you don’t want to drop the country-level variation. If you introduce predictors that can explain the variation currently captured by the random intercepts, the apparent between-country variation will decline, but the total explanatory strength of the model will remain approximately the same. You also need to account for the fact that observations from one country are not independent of each other (your model assumes conditionally independent residuals). The overall principle is that you should use the most complex random-effects structure justifiable by the design (https://doi.org/10.1016%2Fj.jml.2012.11.001).
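As a sketch of what that might look like, keeping both sets of intercepts while adding covariates (x1 and x2 here are hypothetical placeholders for whatever predictors you have in mind):

```r
# Hypothetical covariates x1 and x2 stand in for your actual predictors;
# the country and year intercepts stay in the model.
brms_formula <- bf(y | cens(cens) ~ x1 + x2 + (1 | iso3c) + (1 | year))
```

If the covariates have explanatory power, you should see sd(Intercept) for iso3c shrink relative to the intercept-only fit.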

Note that the R^{2} may not be very interpretable for censored responses.


What do you mean by left censored? In your data, 17 very small countries (from Antigua & Barbuda to Vanuatu) have a flat y = -1.26 for all years, which suggests that these observations were originally missing and were filled in with a default value (before all the y’s were standardized somehow). Or perhaps y is a measure of total economic size and the tiny nations didn’t register on the scale. In either case, why not exclude them from the analysis?
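If you do want to exclude them, a simple base-R sketch (the rule here, dropping any country whose y never changes, is my assumption about how to identify the flat series):

```r
# Hypothetical helper: drop groups whose outcome never varies
# (the flat, presumably filled-in series noted above).
drop_constant_groups <- function(df, group, outcome) {
  is_constant <- tapply(df[[outcome]], df[[group]],
                        function(v) length(unique(v)) == 1)
  flat <- names(is_constant)[is_constant]
  df[!df[[group]] %in% flat, , drop = FALSE]
}

# e.g. df_kept <- drop_constant_groups(df_replicate, "iso3c", "y")
```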

Instead of ignoring the country information (which is not a good idea, as explained by @AWoodward), have you considered grouping the countries into a hierarchy, e.g. by continent? Though there might be a more natural grouping given the nature of your outcome.
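A sketch of such a nested structure, assuming a `continent` column has been added to the data (e.g. via the countrycode package):

```r
# Sketch only: `continent` is not in the shared data and would need
# to be constructed. Countries are nested within continents.
brms_formula <- bf(y | cens(cens) ~
                     (1 | continent) + (1 | continent:iso3c) + (1 | year))
```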

A plot to illustrate the cens label and the overall variability in y:


This is a general comment meant to spark further discussion. Strictly speaking I’m not seeing why observations within country are correlated in a fundamental sense. Isn’t it really the case that country has an effect and ignoring that effect will result in a less well-fitting model? Can’t these “correlations” be handled alternatively by modeling country as fixed effects?


There are 162 countries excluding the 17 left-censored ones. Here is a line plot by country & continent. On average, African countries “perform worse” than the rest.

Another pattern is that many countries start with a dip, and from 2014 onward y remains more or less stable. The Bayes R-squared is high because this pattern (plus a country-specific intercept) describes the majority of countries well. It wouldn’t make much difference whether the country effect is fixed or random, because both versions can represent the fact that the initial y in 2012 varies by country. Neither version can capture that a minority of countries never recover from the 2013 decline, or recover very slowly. Perhaps the additional variables can explain that part.


I think that this point cuts to the heart of an apparent confusion in the original post, and is worth elaborating on. This can be made intuitive:

Consider a regular Gaussian linear regression. We can think of this regression as containing a random term, namely the residual. Without any predictors at all, this random term always “explains” all of the variation. But when we introduce predictors with explanatory power, the model much prefers to attribute this variation to the predictors and not to the residual. Why is this? Because if we can narrow the standard deviation of the residual term while still fitting the data well, we get a higher likelihood: a tall, skinny normal distribution places more probability density on points near its center than a short, wide one does.
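A quick base-R illustration with simulated data (not the poster’s dataset):

```r
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.5)

# Without predictors, the residual term absorbs all the variation:
fit0 <- lm(y ~ 1)
# With an informative predictor, the residual SD shrinks toward the
# true noise level (0.5 here), and the likelihood goes up:
fit1 <- lm(y ~ x)

sigma(fit0)  # roughly sd(y), around 2
sigma(fit1)  # close to 0.5
```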

The same thing is happening in the random effects model. Suppose we have a model like y ~ x + (1 | A) where there is just one value of x associated with each level of A. Then this model has the form:
\mu_j = a + bx_j + \epsilon_j
y_i = \mu_j + \mathcal{E}_i

where j refers to the level of A corresponding to observation i, and both \epsilon and \mathcal{E} are Gaussian. Notice that the first line has precisely the form of a linear regression. Again, the model “wants” to attribute all the variation it can to a + bx_j so that it can minimize the standard deviation of \epsilon.

The upshot is that even if (1|country) explained literally all of the variation, if there are explanatory covariates with predictive power, the model very much would prefer to attribute the variation to them, for the same reason that a linear regression prefers to attribute variation to covariate effects rather than the residual term.


My own limited understanding is that there’s no right or wrong answer; it rather depends on the number of clusters (countries) and the goals of the modeling — if the countries are few and their effects are of interest per se, then the fixed-effects approach makes sense. But if there are so many countries that giving each of them a “full” parameter results in an untenable signal-to-noise ratio, then it makes more sense to model them non-independently of each other.


Good thoughts. Again staying away from the correlation explanation, as you did: you can handle the instability problem with fixed country effects by not treating them as exchangeable, but instead putting skeptical priors on the country differences. This amounts to fixing the prior variance in advance, whereas random effects estimate that variance from the data.
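A sketch of that fixed-effects-with-skeptical-priors version in brms (the prior SD of 0.5 is an illustrative choice, not a recommendation):

```r
# Sketch: country as fixed effects (one coefficient per iso3c level),
# with a common skeptical prior on those coefficients. The prior SD
# of 0.5 is chosen for illustration only.
mod_fixed <- brm(bf(y | cens(cens) ~ 0 + iso3c + (1 | year)),
                 data = df_replicate,
                 prior = prior(normal(0, 0.5), class = b))
```

Unlike the random-intercept model, here the amount of shrinkage is set by the analyst through the prior rather than estimated as a group-level standard deviation.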