Cross-validation for cumulative model in brms

I have response categories 1, 2, 3, 4 and a list of predictors. I want to use cross-validation to test my model fit. I'm doing K-fold cross-validation rather than LOO because I need to leave out all instances belonging to the same participant. This is the first time I'm using cross-validation with this type of model, so I would really appreciate some advice on whether I'm doing the right thing.

Here is the model I've used. I did not include a hierarchical term: each participant has the same 4 data points (one of each of the 4 categories), and for each predictor I have removed the within-person average (because the averages were very different across people). I did this because I'm not interested in the regression weights (e.g. whether they are significant), but only in whether the overall model has predictive power in the test set.

my.formula = as.formula('Categs ~ 1 + a + b + c + d')  # to give the gist
fit = brm(formula = my.formula,
          data = df,
          prior = my.priors,
          iter = 4000, control = list(adapt_delta = 0.9),
          family = cumulative(link = 'probit'))
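
The within-person centering I mention above was done roughly like this (a sketch; the column name Participant and the predictors a, b, c, d are just placeholders to give the gist):

library(dplyr)

# subtract each participant's own mean from each predictor,
# so that only within-person variation remains
df = df %>%
  group_by(Participant) %>%
  mutate(across(c(a, b, c, d), ~ .x - mean(.x))) %>%
  ungroup()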

And then I tried to implement k-fold cross-validation like this

kf = kfold(fit, folds = df$FoldID)

where df$FoldID specifies, for each row, which fold it belongs to (I've generated that manually to make sure I've respected the grouping by participant). [For now I just have two participants, but of course will have more…]
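
For reference, the loo package can generate such grouped folds automatically (a sketch, assuming a Participant column in df; with only two participants K = 2 is the only option, but with more you could use e.g. K = 10):

library(loo)

# all rows belonging to the same participant end up in the same fold
df$FoldID = kfold_split_grouped(K = 2, x = df$Participant)
kf = kfold(fit, folds = df$FoldID)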

I get a result like this:

Based on 2-fold cross-validation

           Estimate  SE
elpd_kfold     -6.6 0.2
p_kfold         0.0 0.1
kfoldic        13.2 0.5

I am now not quite sure how to interpret this error. How is the error computed for 4 ordered categories? When I tried running predictive_error, it said “Error: Predictive errors are not defined for ordinal or categorical models.” But then I'm not sure what the k-fold cross-validation is computing?

I’m thinking I’ll compute the same for an empty model (intercept only) and then compare the models?
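
Concretely, I'm thinking of something like this (a sketch; I believe loo_compare accepts kfold objects as long as both models use the same data and folds):

# intercept-only reference model; my.priors is dropped here because it
# presumably targets the slope coefficients, which this model doesn't have
fit0 = brm(Categs ~ 1,
           data = df,
           iter = 4000, control = list(adapt_delta = 0.9),
           family = cumulative(link = 'probit'))

kf0 = kfold(fit0, folds = df$FoldID)
loo_compare(kf, kf0)  # difference in elpd_kfold between the two models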

What do you mean by error? The result you posted shows the expected log predictive density and standard error of that estimate. Maybe you want to use some other utility or cost function than log score?

Can you show the code?

Do you have a question here?

You may also find some useful material in the loo package documentation.

k-fold needs to be able to compute the log-likelihood of the held-out fold. For some likelihoods, including Gaussian models, the log-likelihood is closely related to the predictive error. For other likelihoods this relationship is not so straightforward. And for some likelihoods, as with this ordinal model, the predictive error isn't even well defined… but the log-likelihood still is.
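
If you want to inspect these held-out log predictive densities observation by observation, they should be stored in the pointwise element of the kfold object (a sketch; the exact column name may differ between versions):

head(kf$pointwise)            # one row per observation
kf$pointwise[, "elpd_kfold"]  # log predictive density of each held-out observation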

Ah, that's very helpful. Based on having done 2-class predictions before, I thought the right metric would be (balanced) accuracy, and I wasn't sure how that would translate to more than two ordered categories.
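
(If I did still want an accuracy-type metric, my understanding is that I could rerun kfold with save_fits = TRUE and compute it from the held-out predictions via kfold_predict; the balanced-accuracy part below is my own sketch and assumes the response is coded as integers 1 to 4:)

kf = kfold(fit, folds = df$FoldID, save_fits = TRUE)
pred = kfold_predict(kf)  # list with observed responses y and held-out draws yrep

# modal predicted category for each observation
y_hat = apply(pred$yrep, 2, function(draws) {
  as.integer(names(which.max(table(draws))))
})

# balanced accuracy = mean of per-category recall
recall = tapply(y_hat == pred$y, pred$y, mean)
mean(recall)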

Do you know whether the log-likelihood takes the ordering into account? E.g. if the true category is 1 (out of 1 < 2 < 3 < 4) and there is a lot of predictive density (if that's the word) at 2 vs. at 4, does that affect the log-likelihood? (i.e. a worse log-likelihood for a lot of density at 4 than at 2, because it's 'more wrong')

The log-likelihood is affected only by the predictive mass over the observed response (with a continuous response this would be a predictive density rather than a predictive mass), and it is precisely the logarithm of this predictive mass.
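
As an illustration (a sketch of the idea, not the exact brms internals), you can compute this predictive mass directly from the fitted category probabilities. Note that only the probability assigned to the observed category enters, so it makes no difference whether the remaining mass sits at 2 or at 4:

# draws x observations x categories array of P(category | predictors)
p = posterior_epred(fit)

# pointwise log predictive mass of the observed categories
# (in-sample here; kfold does the analogous computation on the held-out folds)
obs = as.integer(df$Categs)
lpd = sapply(seq_along(obs), function(i) log(mean(p[, i, obs[i]])))
sum(lpd)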
