The radon case study and hierarchical multi-level models when individual and group-level covariates are correlated

Forgive me if this has been discussed before, but I’m a little confused. The radon case study says:

Note that the model has both indicator variables for each county, plus a county-level covariate. In classical regression, this would result in collinearity. In a multilevel model, the partial pooling of the intercepts towards the expected value of the group-level linear model avoids this.

I was trying to understand this statement and, searching around on the topic, came across a white paper called Fitting Multilevel Models When Predictors and Group Effects Correlate. It suggests that this collinearity actually is a problem, and that one way to deal with it is to include the group average of each individual-level predictor as another group-level covariate. The radon case study goes on to illustrate how to do this, but at this point in the case study the topic hasn’t been introduced yet, and the text suggests that the model’s hierarchical nature deals with it automatically.

Which source is correct? I suspect the case study should just have that line removed or a forward-reference inserted…
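
For reference, a sketch of the two models being compared, in notation along the lines of Gelman and Hill. The symbols are assumptions for illustration: x_i is the individual-level predictor, u_j the county-level covariate, and j[i] the county of house i. The model at this point in the case study is

$$
y_i \sim \mathrm{N}(\alpha_{j[i]} + \beta x_i,\ \sigma_y^2), \qquad \alpha_j \sim \mathrm{N}(\gamma_0 + \gamma_1 u_j,\ \sigma_\alpha^2),
$$

and the version with the group average added as a further group-level covariate is

$$
\alpha_j \sim \mathrm{N}(\gamma_0 + \gamma_1 u_j + \gamma_2 \bar{x}_j,\ \sigma_\alpha^2), \qquad \bar{x}_j = \frac{1}{n_j} \sum_{i:\, j[i] = j} x_i .
$$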

Just from memory.

  • In the radon case, we have a group-level covariate and group-level indicators. Without partial pooling of the indicators, the group-level covariate contains no information beyond what the indicators already carry, and your regression is not identified (i.e., a perfect multicollinearity problem). This is an estimation problem.

  • In the white paper, we deal with individual-level predictors. If the average of the individual-level predictor differs across groups, the beta of that predictor is a combination of between-group and within-group effects (see Andrew’s work on income, state, and voting). This is an interpretation problem.
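
A minimal sketch of the first point, using plain NumPy rather than a multilevel model. The county count, the houses per county, and the covariate `u_county` are made up for illustration. It only shows that, without pooling, a county-level covariate is an exact linear combination of the county indicators, so the classical design matrix is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 5 counties with 4 houses each.
n_counties = 5
houses_per_county = 4
county = np.repeat(np.arange(n_counties), houses_per_county)  # county index per house

# One indicator column per county (no separate intercept column).
indicators = np.eye(n_counties)[county]

# Hypothetical county-level covariate, constant within each county.
u_county = rng.normal(size=n_counties)
u = u_county[county]

# Classical design matrices: indicators alone, and indicators plus the covariate.
X_indicators = indicators
X_both = np.column_stack([indicators, u])

rank_indicators = np.linalg.matrix_rank(X_indicators)
rank_both = np.linalg.matrix_rank(X_both)

# The covariate column is reproduced exactly by weighting the indicator columns.
covariate_is_redundant = np.allclose(indicators @ u_county, u)

print("columns:", X_both.shape[1], "rank:", rank_both)
print("covariate is a combination of indicators:", covariate_is_redundant)
```

Adding the covariate column adds a column but no rank, which is the collinearity the case study refers to. In the multilevel version the county intercepts are not free parameters but are drawn toward the group-level regression line, which is what makes the coefficient on the county-level covariate estimable.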

This is another issue where you get a soft solution with the hierarchical model. That’s why we usually code things up the way Chris coded them in the radon case study (following Andrew and Phil’s original model).

The Gelman and Hill book’s the thing to read. You could kill two birds with one stone by reading the draft of the second edition and giving them feedback. This model is covered in one of its sections (as is the similar red-state, blue-state model for binary outcomes).

I appreciate this is coming quite a while after the previous posts, but this thread felt like a more natural home than starting a new one.

In the Stan case study on radon it’s mentioned that:

…having predictors at multiple levels can reveal correlation between individual-level variables and group residuals.

This appears to be following Gelman 2006 where it’s stated that:

However, a complication arises if we consider the possibility of correlation between the individual-level predictor, x, and the county-level error…

My question is two-fold:

  1. For this specific example, aside from fitting the model and observing that the coefficient for the average number of basements in a county is non-zero, how do we diagnose this as a problem (or is it even a problem)?
  2. How do we do this in general?

For question 1 above, if I plot the county-level residuals against either (i) whether a house measurement was taken in the basement or (ii) \bar{x}_j (the average number of basements in a county), then I don’t see any obvious correlation.

This is puzzling me - any help greatly appreciated.
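
One way to make the concern concrete is a small simulation. This is a sketch under stated assumptions, not a diagnosis of the radon data. All numbers (`beta_within`, `gamma_context`, the county sizes) are hypothetical, and it uses ordinary least squares instead of a multilevel fit. The county effect is built to correlate with the county's share of basement measurements. The code then compares the pooled slope, the within-county slope, and the slope obtained after adding \bar{x}_j as a covariate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 200 counties with 20 houses each.
n_counties, n_per = 200, 20
county = np.repeat(np.arange(n_counties), n_per)

# County-specific probability that a measurement is taken in the basement.
p_basement = rng.uniform(0.1, 0.9, size=n_counties)
x = rng.binomial(1, p_basement[county]).astype(float)

# Assumed true values: the county effect depends on p_basement, so the
# individual-level predictor x is correlated with the county-level error.
beta_within = -0.7
gamma_context = 1.5
county_effect = gamma_context * (p_basement - 0.5) + rng.normal(0, 0.3, n_counties)
y = county_effect[county] + beta_within * x + rng.normal(0, 0.5, size=x.size)

# County averages of x and y, expanded back to one value per house.
counts = np.bincount(county)
x_bar = (np.bincount(county, weights=x) / counts)[county]
y_bar = (np.bincount(county, weights=y) / counts)[county]


def ols(columns, response):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones_like(response)] + columns)
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef


# Pooled slope: ignores counties, mixes within and between variation.
slope_pooled = ols([x], y)[1]

# Slope on x after adding the county average as a covariate.
coef_ctx = ols([x, x_bar], y)
slope_with_mean, coef_x_bar = coef_ctx[1], coef_ctx[2]

# Within-county slope: regress county-centered y on county-centered x.
x_centered = x - x_bar
slope_within = (x_centered @ (y - y_bar)) / (x_centered @ x_centered)

print("pooled slope:          ", round(slope_pooled, 3))
print("within-county slope:   ", round(slope_within, 3))
print("slope with x_bar added:", round(slope_with_mean, 3))
print("coefficient on x_bar:  ", round(coef_x_bar, 3))
```

In this setup the slope with \bar{x}_j included coincides with the within-county slope, while the pooled slope is pulled away from `beta_within`. A noticeable gap between a within-group estimate and one that also uses between-group variation is one possible sign that the predictor and the group effects are correlated. Whether that applies to the radon data is not something this sketch can show.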