Hello,
I have used the kfold function in brms, and I am not sure whether it just splits the dataset into 5 folds and fits the model to each fold,
or whether it works like k-fold CV in machine learning, which splits the data into training and testing sets? Thanks!
When using the kfold method in brms or rstanarm with K=5, it splits the data into 5 sets and uses each of them as the test data once. That is, it will fit the model 5 times, each time leaving out one of the 5 sets and then evaluating how well it can predict the left-out set using the model fit to the rest of the data.
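In brms that corresponds to something like the following (a minimal sketch, assuming fit is an existing brmsfit object):

kf <- kfold(fit, K = 5, save_fits = TRUE)  # refits the model 5 times, each time holding out one fold
kf  # prints the estimated expected log predictive density (elpd_kfold)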
Hope that helps!
Thank you Jonah.
Can I ask another question? If I use kfold_predict, when I select the method 'predicted' does it use the holdout data, and when I use the method 'fitted' does it use the training data? Am I right? Thanks
I think the distinction between predicted and fitted is different than that. Let's take linear regression for example. In that case the fitted values would be alpha + X * beta (ignoring sigma), whereas predicted would draw from normal(alpha + X * beta, sigma). In both cases kfold_predict is presumably building up a set of combined predictions/fitted values by taking predictions/fitted values from each of the K models fit. @paul.buerkner Is that right?
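To make that concrete, for a Gaussian brms model the two roughly correspond to something like this (a minimal sketch, assuming fit is an existing brmsfit object):

mu_draws <- fitted(fit, summary = FALSE)   # draws of alpha + X * beta
y_draws  <- predict(fit, summary = FALSE)  # draws from normal(alpha + X * beta, sigma)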
But this is still being done in terms of the log-predictive density, not actual predicted values like one might do in a typical machine learning setting?
Hello, I have checked the source code of the function kfold_predict:
function (x, method = c("predict", "fitted"), resp = NULL, ...)
{
    if (!inherits(x, "kfold")) {
        stop2("'x' must be a 'kfold' object.")
    }
    if (!all(c("fits", "data") %in% names(x))) {
        stop2("Slots 'fits' and 'data' are required. ", "Please run kfold with 'save_fits = TRUE'.")
    }
    method <- get(match.arg(method), mode = "function")
    resp <- validate_resp(resp, x$fits[[1, "fit"]], multiple = FALSE)
    all_predicted <- as.character(sort(unlist(x$fits[, "predicted"])))
    npredicted <- length(all_predicted)
    nsamples <- nsamples(x$fits[[1, "fit"]])
    y <- rep(NA, npredicted)
    yrep <- matrix(NA, nrow = nsamples, ncol = npredicted)
    names(y) <- colnames(yrep) <- all_predicted
    for (k in seq_rows(x$fits)) {
        fit_k <- x$fits[[k, "fit"]]
        predicted_k <- x$fits[[k, "predicted"]]
        obs_names <- as.character(predicted_k)
        newdata <- x$data[predicted_k, , drop = FALSE]
        y[obs_names] <- get_y(fit_k, resp, newdata = newdata, ...)
        yrep[, obs_names] <- method(fit_k, newdata = newdata,
            resp = resp, allow_new_levels = TRUE, summary = FALSE, ...)
    }
    nlist(y, yrep)
}
It seems that all predictions are based on the held-out (testing) data,
and the method argument has no relationship with which dataset is used. Thanks
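In other words, usage would be roughly (a minimal sketch, again assuming fit is an existing brmsfit object):

kf <- kfold(fit, K = 5, save_fits = TRUE)      # save_fits = TRUE is required for kfold_predict
pred <- kfold_predict(kf, method = "predict")  # predictions for the held-out observations
str(pred$y)     # the held-out outcome values
str(pred$yrep)  # matrix of posterior draws by held-out observations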
Using log density (or square loss) is a proper scoring rule. That’s a good thing if you care about probabilistic prediction.
0/1 loss is improper. What you often see in ML is systems trained on log loss (penalized MLE or MAP) and then evaluated on 0/1 loss, sometimes sweeping the decision threshold to give you an AUC.
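As a toy illustration (not brms-specific, just a sketch), 0/1 loss cannot distinguish two classifiers that make the same hard decisions, while log loss rewards the sharper, well-calibrated probabilities:

y <- c(1, 0, 1, 1, 0)                  # observed binary outcomes
p_sharp <- c(0.9, 0.1, 0.8, 0.7, 0.2)  # confident, well-calibrated probabilities
p_blunt <- c(0.6, 0.4, 0.6, 0.6, 0.4)  # vaguer probabilities, same hard decisions

log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))
zero_one <- function(y, p) mean((p > 0.5) != y)

log_loss(y, p_sharp)  # smaller (better) than for p_blunt
log_loss(y, p_blunt)
zero_one(y, p_sharp)  # 0 for both: 0/1 loss can't tell them apart
zero_one(y, p_blunt)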