Regression model with indicators for groups of size 1: what does loo() approximate?

potash · February 2, 2020, 7:45pm

One advantage of multilevel modeling is, of course, the ability to predict for new groups. For classical regression with group indicators (fixed effects), prediction for a new group is ill-defined. As a result, I would expect leave-one-out cross-validation for such a classical model with groups of size 1 to be ill-defined as well whenever the held-out “one” is the only member of its group.

However, if I fit such a model with stan_glm() and then call loo(), I get an answer (after following the suggestion to set k_threshold). So my question is: what is loo() approximating in this case?

martinmodrak · February 6, 2020, 10:35am

Sorry, I’m short on time, so here is a short answer: in all cases it is approximating leave-one-out cross-validation. This should make sense for most linear models. Could you elaborate on why you don’t think it makes sense for your model? Maybe @jonah has time to explain in more detail.

potash · February 6, 2020, 6:45pm

Thanks @martinmodrak, let me clarify with an example. Consider the mtcars dataset:

                mpg  carb
Mazda RX4      21.0     4
Mazda RX4 Wag  21.0     4
…                 …     …
Maserati Bora  15.0     8
Volvo 142E     21.4     2

We can fit the unpooled varying-intercept (fixed-effects) model predicting mpg from carb with this formula:

mpg ~ factor(carb) - 1
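For reference, a minimal base-R sketch (an illustration added here, not part of the original post) of the design matrix this formula produces: one 0/1 indicator column per carb level and no intercept. The column for carb = 8 contains a single 1, as does the column for carb = 6.

```r
# Design matrix for mpg ~ factor(carb) - 1: one 0/1 indicator column per carb level
X <- model.matrix(mpg ~ factor(carb) - 1, data = mtcars)
colnames(X)

# Number of cars in each carb group; the carb = 6 and carb = 8 groups have size 1
colSums(X)
```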

Now suppose we were doing leave-one-out cross-validation and the held-out observation was “Maserati Bora”, which is the only car in the dataset with carb = 8. We’d have a problem:

library(rstanarm)
library(dplyr)

# Hold out the only car with carb = 8
train <- mtcars %>% filter(carb != 8)
test <- mtcars %>% filter(carb == 8)

fit <- stan_glm(mpg ~ factor(carb) - 1, data = train)
posterior_predict(fit, newdata = test)

This appropriately gives the error:

Error in model.frame.default(Terms, newdata, xlev = object$xlevels) : 
  factor factor(carb) has new level 8

This is the problem with leave-one-out in this setting. Yet loo(fit, k_threshold = 0.7) works when fit is estimated on the full dataset. So my question is: how should I think about the loo approximation in this setting?
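A minimal sketch of the call being discussed, assuming rstanarm and the full mtcars data (the name fit_full is illustrative). With k_threshold set, loo() refits the model once for each observation whose Pareto k exceeds the threshold, leaving that observation out, instead of relying on the PSIS approximation for it. The thread reports that this call succeeds even though carb = 8 has only one observation.

```r
library(rstanarm)

# Fit the unpooled model to the full data, including the single carb = 8 car
fit_full <- stan_glm(mpg ~ factor(carb) - 1, data = mtcars, refresh = 0)

# PSIS-LOO; observations with Pareto k > 0.7 are refit exactly without them
loo_full <- loo(fit_full, k_threshold = 0.7)
print(loo_full)
```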

martinmodrak · February 12, 2020, 12:36pm

Thanks for clarifying, and sorry for taking so long to get back to you. The first thing to note is that you can predict for new levels in a multilevel model: since you estimate the sd of the intercepts, you just draw a new intercept from this fitted distribution.
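As a sketch of that point, assuming rstanarm (the names fit_ml and yrep are illustrative): if carb is treated as a grouping factor in a varying-intercept model, posterior_predict() accepts the unseen carb = 8 level and draws its intercept from the estimated population distribution of intercepts, instead of raising the error shown above.

```r
library(rstanarm)
library(dplyr)

train <- mtcars %>% filter(carb != 8)
test <- mtcars %>% filter(carb == 8)

# Partially pooled model: carb is a grouping factor with a fitted intercept sd
fit_ml <- stan_glmer(mpg ~ 1 + (1 | carb), data = train, refresh = 0)

# carb = 8 is a new level; its intercept is drawn from the fitted
# distribution of group intercepts
yrep <- posterior_predict(fit_ml, newdata = test)
dim(yrep)  # posterior draws x 1 held-out car
```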

You can’t predict for new levels of fixed effects, but I think what loo does is similar. I would guess that loo assumes level 8 is still in the model (there will be the appropriate number of coefficients among the parameters) but that you don’t observe any data for it, so its coefficient will essentially be drawn from its prior. This is the same thing that would happen if you coded your model manually in Stan and provided no data for one level of the fixed-effect predictor.
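Martin’s reasoning can be written out for this model. The notation here is an addition: it assumes the Gaussian likelihood stan_glm uses by default, indicators $x_{ik} \in \{0, 1\}$, coefficient $\beta_8$ for carb = 8, and a prior that is independent across coefficients. Leaving out the only observation $i$ with $x_{i8} = 1$ removes every likelihood term involving $\beta_8$:

```latex
\begin{aligned}
p(\beta, \sigma \mid y_{-i}) &\propto p(\sigma) \prod_k p(\beta_k)
  \prod_{j \ne i} \mathrm{N}\Big(y_j \,\Big|\, \textstyle\sum_{k \ne 8} x_{jk}\beta_k,\ \sigma^2\Big)
  && \text{since } x_{j8} = 0 \text{ for all } j \ne i, \\
p(\beta_8 \mid y_{-i}) &= p(\beta_8), \\
p(y_i \mid y_{-i}) &= \int \mathrm{N}(y_i \mid \beta_8, \sigma^2)\, p(\beta_8)\, p(\sigma \mid y_{-i})\, d\beta_8\, d\sigma
  && \text{since } x_{i8} = 1,\ x_{ik} = 0 \text{ for } k \ne 8.
\end{aligned}
```

So the leave-one-out predictive density for the Maserati Bora uses the prior for its group coefficient and the posterior for $\sigma$ from the remaining cars.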

I would, however, ask @avehtari to check my reasoning if he’s not busy.

potash · February 12, 2020, 8:29pm

Thanks @martinmodrak, your suggestion that the coefficient for the new level’s indicator variable is simply drawn from its prior sounds right! The default prior for the coefficient of a centered binary variable in stan_glm() is currently $\mathrm{N}(0, 2.5^2 \cdot \mathrm{var}(y))$. Since the indicator is identically zero in the training data in this case, perhaps loo() takes zero to be the center… is that right, @avehtari?
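A quick numerical check, written as a minimal base-R sketch rather than a stan_glm fit. It assumes a conjugate Gaussian model with the residual sd fixed at sd(y) and independent N(0, (2.5 · sd(y))²) priors on the indicator coefficients, mirroring the scale quoted above. After dropping the Maserati Bora, the carb = 8 coefficient’s posterior mean stays at the prior location 0 and its posterior sd equals the prior sd.

```r
# Conjugate sketch (assumptions: known residual sd, independent normal priors)
X <- model.matrix(mpg ~ factor(carb) - 1, data = mtcars)  # keeps a column for carb = 8
y <- mtcars$mpg
held_out <- which(mtcars$carb == 8)          # the Maserati Bora
X_train <- X[-held_out, , drop = FALSE]      # carb = 8 column is now all zeros
y_train <- y[-held_out]

sigma <- sd(y)                               # residual sd treated as known
prior_sd <- 2.5 * sd(y)                      # prior scale quoted in the thread
prior_prec <- diag(1 / prior_sd^2, ncol(X))  # prior means are all 0

# Posterior precision and mean for a Gaussian likelihood with a Gaussian prior
post_prec <- prior_prec + crossprod(X_train) / sigma^2
post_cov <- solve(post_prec)
post_mean <- post_cov %*% crossprod(X_train, y_train) / sigma^2

j <- which(colnames(X) == "factor(carb)8")
c(post_mean = post_mean[j], post_sd = sqrt(post_cov[j, j]), prior_sd = prior_sd)
```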

avehtari · February 13, 2020, 11:11am

Yes: if the likelihood contribution for some parameter is removed, then sampling uses just the prior for that parameter.
