Hi
I’ve successfully fitted two brms models in which I model canopy area as a function of different non-linear functions (an example is shown below). Each model has converged, with few (if any) divergent transitions, and every parameter has good effective sample sizes.
However, when I try to run grouped K-fold cross-validation I get NaNs (see below), and I’m not sure why.
```
packageVersion("brms")
[1] ‘2.18.0’
packageVersion("loo")
[1] ‘2.5.1’
packageVersion("cmdstanr")
[1] ‘0.5.3’
```
An example of the model I’m fitting looks like this:

```r
library(brms)  # needed for the unqualified bf(), prior() and lognormal below

out <- brms::brm(
  bf(
    # logistic growth curve on the log scale (lognormal family)
    Canopy ~ log(Asym / (1 + exp(-beta * (Growth_years - Tmax)))),
    beta ~ 1 + (1 | Scientific) + street_tree,
    Tmax ~ 1 + (1 | Scientific) + street_tree,
    Asym ~ 1 + (1 | Scientific) + street_tree,
    nl = TRUE
  ),
  prior =
    prior(normal(200, 100), lb = 0.001, nlpar = "Asym") +
    prior(normal(0.01, 100), lb = 0.001, nlpar = "Tmax") +
    prior(normal(0, 1), lb = 0, nlpar = "beta"),
  control = list(adapt_delta = 0.99, max_treedepth = 15),
  family = lognormal,
  backend = "cmdstanr",
  threads = 2,  # threads per chain (within-chain parallelisation)
  init = 0,
  data = eucalyptus_growth_dat,
  chains = 3,
  cores = 3
)
```
Here, `Canopy` is the observed canopy area of an individual tree, `street_tree` is a binary variable, and `Growth_years` is the estimated age of the individual.
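For reference, the non-linear predictor is a logistic growth curve on the log scale: with the lognormal family, exp(mu) is the median canopy area, which rises towards `Asym` as `Growth_years` increases (for `beta > 0`). A minimal R sketch of that curve is below; `logistic_mu()` is a hypothetical helper written for illustration, not a brms function. Note that in R `log()` of a negative number returns `NaN`, so the expression is only finite when `Asym` is positive.

```r
# Hypothetical helper (not part of brms): the log-scale mean `mu` implied by
# the non-linear formula above; exp(mu) is the median of the lognormal.
logistic_mu <- function(growth_years, Asym, beta, Tmax) {
  log(Asym / (1 + exp(-beta * (growth_years - Tmax))))
}

# At Growth_years == Tmax the median canopy area is Asym / 2
logistic_mu(growth_years = 10, Asym = 200, beta = 0.5, Tmax = 10)  # log(100)

# A negative Asym means log() of a negative number: NaN (with a warning)
suppressWarnings(logistic_mu(growth_years = 5, Asym = -20, beta = 0.5, Tmax = 10))
```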
As stated above, the model seems to fit with minimal issues (apart from taking ages to finish sampling; there are over 40,000 rows of data). The parameter estimates also look biologically plausible.
What I want to do is compare the predictive performance of the different model variants using grouped K-fold cross-validation, since I’m interested in how well each model predicts withheld random-effect groups (levels of `Scientific`). I specify this as follows:
```r
brms::kfold(x = out,
            K = 5,
            folds = "grouped",
            group = "Scientific")
```
Which, if I’m correct, is equivalent to something like:
```r
brms::kfold(x = eucalypt_negexp_model,
            K = 5,
            folds = loo::kfold_split_grouped(K = 5, x = eucalyptus_growth_dat$Scientific))
```
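As I understand it, grouped folds keep every row of a given `Scientific` level in the same fold and balance the number of levels per fold, not the number of rows, so fold sizes measured in rows can be very uneven when some levels have far more observations than others. The base-R sketch below illustrates that idea; `grouped_folds()` is a hypothetical helper and not loo's actual implementation, and the `species` vector is made-up toy data.

```r
# Hypothetical sketch of grouped fold assignment (not loo's implementation):
# shuffle a balanced set of fold labels across the group levels, then let
# every observation inherit the fold of its group.
grouped_folds <- function(groups, K) {
  groups <- as.character(groups)
  lev <- unique(groups)
  stopifnot(K >= 2, K <= length(lev))
  # Each fold label appears (almost) equally often across levels
  fold_of_level <- sample(rep_len(seq_len(K), length(lev)))
  names(fold_of_level) <- lev
  unname(fold_of_level[groups])
}

set.seed(1)
# Toy grouping factor: 20 levels with very different numbers of rows
species <- rep(paste0("sp", 1:20), times = c(5000, 2000, rep(100, 18)))
folds <- grouped_folds(species, K = 5)
table(folds)  # 4 levels per fold, but very different row counts
```

Because the assignment is random, the two `kfold()` calls above should only produce identical folds if the RNG seed is set to the same value before each call.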
However, when I run this type of cross-validation, I get the following output, with no errors or warnings:
```
Based on 5-fold cross-validation

           Estimate SE
elpd_kfold      NaN NA
p_kfold         NaN NA
kfoldic         NaN NA
```
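This NaN/NA pattern is what you would expect if even one pointwise value is `NaN`. As far as I know, loo-style summaries report the Estimate as the column sum of the pointwise values and the SE as `sqrt(n * var(pointwise))`; in R a sum involving `NaN` is `NaN`, and `var()` of a vector containing `NaN` is missing. A small sketch of that arithmetic, using rows 1 and 2 of the pointwise output further down plus one `NaN` row; `summarise_pointwise()` is a hypothetical helper, not the loo or brms code.

```r
# Hypothetical helper: summarise a pointwise matrix the way loo-style
# summaries do (Estimate = column sum, SE = sqrt(n * column variance)).
summarise_pointwise <- function(pw) {
  n <- nrow(pw)
  cbind(Estimate = colSums(pw), SE = sqrt(n * apply(pw, 2, var)))
}

# Rows 1-2 of the pointwise output, plus one NaN row like rows 47-51
pw <- rbind(
  c(elpd_kfold = -4.539832, p_kfold = 0.3795695, kfoldic = 9.079664),
  c(elpd_kfold = -4.605903, p_kfold = 0.3846158, kfoldic = 9.211806),
  c(elpd_kfold = NaN,       p_kfold = NaN,       kfoldic = NaN)
)
summarise_pointwise(pw)         # every Estimate and SE is missing
summarise_pointwise(pw[1:2, ])  # finite once the NaN row is dropped
```

The pointwise `kfoldic` column is simply `-2 * elpd_kfold`, so it inherits the same NaNs.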
I’m not sure what is causing this, though it might be related to the very uneven fold sizes. For example, the number of observations per fold was:

```
    1     2     3     4     5
 3084  2504 14119   986 20907
```
I’ve looked at each of these folds, and there appears to be reasonable variability in the other predictors (e.g. `Growth_years` and `street_tree`). I’ve even managed to fit the model to each fold subset individually without running into convergence issues. My guess is that the unevenness of the folds causes a problem somewhere under the hood, but when I try to replicate similarly uneven folds with mock datasets I don’t run into this issue.

Unfortunately I can’t share the data. Any tips or advice?
Oh, and in case you’re interested, here is part of the pointwise output:
```
      elpd_kfold       p_kfold   kfoldic
 [1,]  -4.539832  3.795695e-01  9.079664
 [2,]  -4.605903  3.846158e-01  9.211806
 [3,]  -5.111670  2.462674e-01 10.223340
 [4,]  -5.059898  3.184598e-01 10.119795
 [5,]  -5.220611  2.450830e-01 10.441223
 [6,]  -5.311285  2.091175e-01 10.622569
 [7,]  -4.737715  4.543159e-01  9.475429
 [8,]  -4.828914  4.463188e-01  9.657829
 [9,]  -4.673482  4.691005e-01  9.346964
[10,]  -5.298323  2.176782e-01 10.596645
[11,]  -4.792010  5.030404e-01  9.584020
[12,]  -4.881061  4.830266e-01  9.762122
[13,]  -4.788471  5.238636e-01  9.576942
[14,]  -4.787594  5.329466e-01  9.575187
[15,]  -4.811155  5.378390e-01  9.622310
[16,]  -4.788929  5.541308e-01  9.577859
[17,]  -9.743180 -4.613748e+00 19.486359
[18,] -10.342949 -4.544968e+00 20.685898
[19,]  -4.615625 -3.565669e-01  9.231250
[20,]  -3.992905  3.770351e-01  7.985810
[21,]  -5.231787  2.939448e-01 10.463575
[22,]  -5.240638  1.869140e-01 10.481277
[23,]  -5.210137  2.735241e-01 10.420273
[24,]  -4.907122  5.067121e-01  9.814244
[25,]  -5.121841  3.715039e-01 10.243681
[26,]  -4.253101  4.649373e-01  8.506202
[27,]  -4.337621  5.543479e-01  8.675243
[28,]  -4.392189  4.400904e-01  8.784379
[29,]  -4.517743  5.398474e-01  9.035485
[30,]  -3.475455  2.589708e-01  6.950910
[31,]  -4.020786  5.061724e-02  8.041572
[32,]  -4.666528  3.760590e-01  9.333056
[33,]  -4.961022  3.242280e-01  9.922045
[34,]  -4.521691  3.749092e-01  9.043382
[35,]  -4.541400  3.485298e-01  9.082800
[36,]  -4.621729  4.402935e-01  9.243458
[37,]  -5.096600  3.269840e-01 10.193199
[38,]  -4.766069  4.856513e-01  9.532138
[39,]  -5.971669 -5.074343e-01 11.943338
[40,]  -5.367172  7.109470e-02 10.734345
[41,]  -4.992893  3.610960e-01  9.985787
[42,]  -4.616530  1.087240e-01  9.233060
[43,]  -5.919764 -1.681431e-01 11.839528
[44,]  -6.410071 -4.884728e-01 12.820142
[45,]  -6.070530 -2.773045e-01 12.141059
[46,]  -6.957099 -8.632811e-01 13.914197
[47,]        NaN           NaN       NaN
[48,]        NaN           NaN       NaN
[49,]        NaN           NaN       NaN
[50,]        NaN           NaN       NaN
[51,]        NaN           NaN       NaN
```
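To check whether the NaN rows all come from one held-out fold, one option is to build the folds yourself (as in the `kfold_split_grouped()` call above), keep that vector, and cross-tabulate it against the NaN rows. A minimal sketch, assuming the rows of `$pointwise` follow the row order of the data; `nan_by_fold()` is a hypothetical helper, and the fold labels in the toy example are made up (the elpd values are taken from the output above).

```r
# Hypothetical helper: count NaN pointwise elpd values in each fold
nan_by_fold <- function(elpd, folds) {
  stopifnot(length(elpd) == length(folds))
  # Keep all fold levels so folds without NaNs show up as zero
  folds <- factor(folds, levels = sort(unique(folds)))
  table(fold = folds[is.nan(elpd)])
}

# Toy vectors: elpd values from the output above, fold labels made up
elpd  <- c(-4.539832, -4.605903, NaN, NaN, -5.111670)
folds <- c(1, 2, 3, 3, 1)
nan_by_fold(elpd, folds)  # fold 1: 0, fold 2: 0, fold 3: 2
```

With the real model this would be something like `folds <- loo::kfold_split_grouped(K = 5, x = eucalyptus_growth_dat$Scientific)`, then `kf <- brms::kfold(out, K = 5, folds = folds)` and `nan_by_fold(kf$pointwise[, "elpd_kfold"], folds)`.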