I have response categories 1, 2, 3, 4 and a list of predictors, and I want to use cross-validation to test my model fit. I'm doing k-fold cross-validation rather than LOO because I need to leave out all instances belonging to the same participant. This is my first time using cross-validation with this type of model, so I would really appreciate advice on whether I'm doing the right thing.
Here is the model I've used. I did not include a hierarchical term: each participant contributes the same 4 data points (one of each of the 4 categories), and for each predictor I subtracted each participant's own mean (because the means differed a lot across people). I did this because I'm not interested in the regression weights (e.g. whether they are significant), only in whether the overall model has predictive power in the test set.
my.formula = as.formula('Categs ~ 1 + a + b + c + d')  # to give the gist

fit = brm(formula = my.formula,
          data = df,
          prior = my.priors,
          iter = 4000,
          control = list(adapt_delta = 0.9),
          family = cumulative(link = 'probit'))
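For reference, the within-person centering I mentioned above looks roughly like this (just a sketch; Participant is a placeholder name for my participant ID column):

library(dplyr)

# Subtract each participant's own mean from every predictor (person-mean centering).
df = df %>%
  group_by(Participant) %>%
  mutate(across(c(a, b, c, d), ~ .x - mean(.x))) %>%
  ungroup()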
And then I tried to implement k-fold cross-validation like this
kf = kfold(fit, folds = df$FoldID)
where df$FoldID indicates, for each row, which fold it belongs to (I generated it manually to make sure the grouping by participant is respected). [For now I have just two participants, but of course there will be more...]
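As an aside, once I have more participants I was planning to let loo generate the grouped fold IDs instead of hand-coding them; a sketch, again assuming a Participant column:

library(loo)

# Assign all rows of a participant to the same fold; K should not exceed the
# number of participants.
df$FoldID = kfold_split_grouped(K = 10, x = df$Participant)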
With the manually generated folds I get a result like this:
Based on 2-fold cross-validation
Estimate SE
elpd_kfold -6.6 0.2
p_kfold 0.0 0.1
kfoldic 13.2 0.5
I am now not quite sure how to interpret this output. How is the error computed for the 4 ordered categories? When I tried running predictive_error, it said "Error: Predictive errors are not defined for ordinal or categorical models." So then I'm not sure what the k-fold cross-validation is actually computing.
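From what I can tell from the docs, elpd_kfold is not an error in the residual sense but the sum of the log predictive densities of the held-out observations, i.e. the log probability the model assigned to each observed category. If I wanted something more interpretable, like held-out classification accuracy, I think I could refit with save_fits = TRUE and extract predictions for the left-out rows, something along these lines (untested sketch):

# Re-run k-fold CV, keeping the per-fold fits so the held-out rows can be predicted.
kf = kfold(fit, folds = df$FoldID, save_fits = TRUE)

# Posterior predictive draws of the response category for every held-out observation.
pred = kfold_predict(kf)

# Modal predicted category per observation, compared against the observed category.
mode_cat = apply(pred$yrep, 2, function(x) as.integer(names(which.max(table(x)))))
mean(mode_cat == as.integer(pred$y))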
My plan is to compute the same quantities for an empty (intercept-only) model and then compare the two models. Does that make sense?
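Concretely, I was imagining something like this for the comparison (sketch; the null model reuses the same data and folds, and I would drop or adapt my.priors since it refers to the predictors):

# Intercept-only (null) model on the same data.
fit0 = brm(Categs ~ 1,
           data = df,
           iter = 4000,
           family = cumulative(link = 'probit'))

kf0 = kfold(fit0, folds = df$FoldID)

# A clearly positive elpd_diff in favour of the full model would indicate
# better out-of-sample predictive performance.
loo_compare(kf, kf0)

Would that be a reasonable way to test whether the predictors add predictive power?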