I’m comparing several statistical and machine-learning techniques and want to evaluate them all on cross-validated predictive performance, with the cross-validation folds defined by the levels of a variable in the data (site).
One of the models is a Bayesian “ridge” regression model with a hierarchical Gaussian prior on the predictor slopes. I want to cross-validate it via the kfold.brmsfit function. However, I get the following error when I run it:
Error: Group 'SITE_ID_L' is not a valid grouping factor. Valid groups are: ''
I don’t really understand what that means. I’ve made sure the variable is in the data when fitting the model; however, since it’s not a predictor, the fitted model object doesn’t contain the variable in its “data” element. Could that be the issue, or is it something completely different?
Pinging @paul.buerkner who is most likely to be able to answer but has probably missed this (or happens to be busy this week).
Hey, sorry I missed this post. Can you provide a minimal reproducible example for the problem?
Thank you Paul, here’s a reprex:
library(brms)
library(tidyverse)

df <- map(1:20, ~ rnorm(200, 0, 1)) %>%
  set_names(paste0("V", 1:20)) %>%
  as_tibble() %>%
  mutate(site = sample(letters[1:5], 200, replace = TRUE))

fit1 <- brm(V1 ~ . - site, data = df)
kfold(fit1, K = 5, folds = "grouped", group = "site")
You can only group by variables that are part of the model. This is why brms complains.
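For example, one way to make site a valid grouping factor would be to refit with a group-level (varying) intercept for site. This is a sketch only, assuming the df object from the reprex above; note that adding the group-level term changes the model itself, so it is not just bookkeeping:

```r
# Sketch: refit with site as a grouping factor (varying intercept).
# This is NOT equivalent to the original model; the group-level term
# is added so that kfold() can recognise 'site' as a valid group.
fit2 <- brm(V1 ~ . - site + (1 | site), data = df)
kfold(fit2, K = 5, folds = "grouped", group = "site")
```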
Right, that’s what I thought might be the problem. So does the variable always need to be included as a predictor? Or is there some way of getting around it?
The variable needs to be included in the model for group to be used. However, you can build your folds manually and pass them via the folds argument. That way, you can specify any partitioning you want.
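A sketch of the manual approach, assuming the df and fit1 objects from the reprex above: loo::kfold_split_grouped() turns a grouping variable into an integer vector of per-observation fold indices, which kfold() accepts directly via its folds argument.

```r
library(loo)

# Assign each observation an integer fold id, keeping all rows
# from the same site in the same fold.
fold_ids <- kfold_split_grouped(K = 5, x = df$site)

# Pass the indices directly; 'site' does not need to be in the model.
kfold(fit1, folds = fold_ids)
```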