One of the advantages of multilevel modeling is, of course, prediction for new groups. For classical regression with group indicators (fixed effects), prediction for a new group is ill-defined. As a result, I would expect leave-one-out cross-validation for such a classical model, with groups of size 1, to also be ill-defined when the held-out observation is the only member of its group.

However, if I fit such a model with `stan_glm()` and then call `loo()`, I get an answer (after following the suggestion to set a `k_threshold`). So my question is: what is `loo()` approximating in this case?

Sorry, short on time, so a short answer: in all cases it is approximating leave-one-out cross-validation. This should make sense for most linear models. Could you elaborate on why you don’t think it would make sense for your model? Maybe @jonah has time to explain in more detail.

Thanks @martinmodrak, let me clarify with an example. Consider the `mtcars` dataset:

|                | mpg  | carb |
|----------------|------|------|
| Mazda RX4      | 21.0 | 4    |
| Mazda RX4 Wag  | 21.0 | 4    |
| …              | …    | …    |
| Maserati Bora  | 15.0 | 8    |
| Volvo 142E     | 21.4 | 2    |

We can fit the unpooled varying-intercept (fixed-effects) model predicting `mpg` from `carb` with this formula:

```r
mpg ~ factor(carb) - 1
```
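For concreteness, here is a minimal sketch of the fit I mean (assuming `rstanarm` and `loo` are installed; the `seed` and `refresh` arguments are just for reproducibility and quiet output):

```r
library(rstanarm)

# Unpooled model: one intercept per carb level, no global intercept
fit <- stan_glm(mpg ~ factor(carb) - 1, data = mtcars,
                refresh = 0, seed = 123)

# PSIS-LOO; k_threshold triggers refitting for observations
# whose Pareto-k diagnostic exceeds the threshold
loo(fit, k_threshold = 0.7)
```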

Now suppose we were doing leave-one-out cross-validation and the held-out observation was “Maserati Bora”, the only car in the dataset with `carb = 8`. We’d have an issue:

```
Error in model.frame.default(Terms, newdata, xlev = object$xlevels) :
  factor factor(carb) has new level 8
```

This is the issue with leave-one-out cross-validation in this setting. But `loo(fit, k_threshold = 0.7)` works. So my question is: how should I think about the loo approximation in this setting?
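For reference, the error above can be reproduced in base R with a plain `lm()` fit (a sketch; the training set drops the single `carb = 8` row):

```r
# Hold out the only carb = 8 observation
train <- mtcars[rownames(mtcars) != "Maserati Bora", ]
test  <- mtcars[rownames(mtcars) == "Maserati Bora", ]

fit_lm <- lm(mpg ~ factor(carb) - 1, data = train)

# Fails: factor(carb) has new level 8 in the held-out row
pred <- try(predict(fit_lm, newdata = test))
```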

Thanks for clarifying, and sorry for taking so long to get back to you. The first thing to note is that you *can* predict for new levels in a multilevel model: since you fit the standard deviation of the intercepts, you just draw a new intercept from this fitted distribution.
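A sketch of this with `rstanarm` (assuming `carb` is treated as a grouping factor; the level `carb = 12` is a made-up unseen level for illustration):

```r
library(rstanarm)

# Partial pooling: varying intercepts for carb, with a fitted sd
fit_ml <- stan_glmer(mpg ~ 1 + (1 | carb), data = mtcars,
                     refresh = 0, seed = 123)

# Predicting for an unseen level: rstanarm draws a new intercept
# for that level from the fitted distribution of intercepts
newdata <- data.frame(carb = 12)
pp <- posterior_predict(fit_ml, newdata = newdata)
```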

Now, you can’t predict for new levels of fixed effects, but I think what loo does will be similar. I would guess that loo assumes level 8 is still in the model (there is the appropriate number of coefficients among the parameters) but that you observe no data for it, so its coefficient will essentially be drawn from its prior. This is the same thing that would happen if you coded your model manually in Stan and provided no data for one fixed predictor.

I would, however, ask @avehtari to check my reasoning, if he’s not busy.

Thanks @martinmodrak, your suggestion that the coefficient for the new level’s indicator variable is simply drawn from its prior sounds right! The default prior for the coefficient of a centered binary variable in `stan_glm()` is currently \text{N}(0, 2.5^2\cdot \text{var}(y)). Since the indicator was effectively zero in the training data in this case, perhaps `loo()` takes zero to be the center… is that right @avehtari?