Debugging (trace) with loo functions

mtenan · March 5, 2019, 9:53pm

Hello,
When working with stanreg objects in R, and specifically looking at functions in loo, is there a reason why debugging functions such as ‘trace’ shouldn’t work as wrappers?

I’ve been trying to figure out why kfold() is generating a series of very general errors (e.g. ‘eval(predvars, data, env)’), but it’s shooting in the dark without knowing what part of the kfold() function is throwing the actual error. I’m working on a reproducible version of my issue (my data is confidential and has PII), but right now I’m just trying to figure out why I could generate errors but not use traceback to determine what part of kfold has the error. Thanks!

bgoodri · March 5, 2019, 11:29pm

Can you post the output of traceback() after the error happens?

mtenan · March 6, 2019, 12:37am

Thanks for the quick response. Apologies because I was clearly having brain flatulance and having trouble with trace() and then misusing traceback().

Below is the code which is throwing “Error in eval(predvars, data, env) : object ‘EDPI’ not found” and the associated traceback().

fit1a <- stan_gamm4(ASESaffected_total_scr ~ te(pfnc_wt, pnit_wt) + s(ases_days_since) +
GENDER + ageyr ,
family = gaussian(), data = as.data.frame(complete_ASESdat_wt), iter= 4000, adapt_delta = 0.99,
random = ~(1 |EDPI))

dat <- fit1a[[‘data’]]
folds <- kfold_split_stratified(K=10, x = dat$EDPI)
kfold(fit1a, K=10, folds = folds)

traceback()

19: eval(predvars, data, env)
18: eval(predvars, data, env)
17: model.frame.default(data = data, subset = subset, weights = weights,
na.action = na.action, drop.unused.levels = TRUE, formula = ASESaffected_total_scr ~
…
16: stats::model.frame(data = data, subset = subset, weights = weights,
na.action = na.action, drop.unused.levels = TRUE, formula = ASESaffected_total_scr ~
…
15: eval(mf, parent.frame())
14: eval(mf, parent.frame())
13: lme4::lFormula(formula = as.formula(form), data = data, subset = subset,
weights = weights, na.action = na.action, control = make_glmerControl())
12: eval(mc, parent.frame())
11: eval(mc, parent.frame())
10: lme4::glFormula(as.formula(form), data, family = gaussian, subset,
weights, na.action, control = make_glmerControl())
9: stan_gamm4(formula = ASESaffected_total_scr ~ te(pfnc_wt, pnit_wt) +
s(ases_days_since) + GENDER + ageyr, random = ~(1 | EDPI),
…
8: eval(fit_k_call)
7: eval(fit_k_call)
6: eval(expr, pf)
5: eval(expr, pf)
4: withVisible(eval(expr, pf))
3: evalVis(expr)
2: capture.output(fit_k <- eval(fit_k_call))
1: kfold(fit1a, K = 10, folds = folds)

Am I mis-specifying somewhere?

bgoodri · March 6, 2019, 4:06am

Can you re-create your problem with any public example? I was able to do

data(Orthodont, package = "nlme4")
test <- stan_gamm4(distance ~ s(age, k = 2), data = Orthodont) 
library(loo)
folds <- kfold_split_stratified(K = 2, Orthodont$Sex)
test2 <- kfold(test, K = 2, folds = folds)

and it seemed to work apart from warnings about divergent transitions.

I think the problem may be that you are stratifying by the variable used to allow the intercept to vary by, in which case there is no variation on it in the stratified samples. If that is the problem, you could call

fit1a <- stan_gamm4(ASESaffected_total_scr ~ te(pfnc_wt, pnit_wt) + s(ases_days_since) + GENDER + ageyr ,
family = gaussian(), data = as.data.frame(complete_ASESdat_wt), iter= 4000, adapt_delta = 0.99)

without the random term and then call kfold_split_stratified with K equal to the number of levels of EDPI, followed by kfold.

mtenan · March 6, 2019, 11:29am

I really can’t alter the structure of the model itself as you suggest, for theoretical reasons it should be nested with at least a two-level model. This is actually a simplified model before I work with higher-order models. I’m also assessing these stan models with the ‘standard’ gamm4 models I’ve already built.

I’m not re-creating the exact error, but I suspect they may be related. Here is a worked example:

N <- 300 #Number of students per school
sigma <- 200

x1 <- runif(N, 10, 40)
x2 <- runif(N, 25, 55)
x3 <- runif(N, 40, 70)
x4 <- runif(N, 55, 85)
x5 <- runif(N, 70, 100)

y1 <- 20 + 0 * x1 + rnorm(N, 0, sigma)
y2 <- 40 + 10 * x2 + rnorm(N, 0, sigma)
y3 <- 60 + 20 * x3 + rnorm(N, 0, sigma)
y4 <- 80 + 30 * x4 + rnorm(N, 0, sigma)
y5 <- 100 + 40 * x5 + rnorm(N, 0, sigma)

ID <- rep(LETTERS[1:20], each = N)

test <- data.frame(SES = c(x1, x2, x3, x4, x5),
Score = c(y1, y2, y3, y4, y5), ID = ID)

HLM0 <- stan_gamm4(Score ~ s(SES), family = gaussian(), random = ~(1 | ID), data = test)

strat.test <- kfold_split_stratified(K=10, x= test$ID)
kfold(HLM0, K=10, folds = strat.test)

I’ve been unable to find a worked example of a multilevel model which uses kfold(), which would help with my doing it correctly. I can always write my own code to do cross-validation, I just assumed the built in loo functions would be more efficient.

bgoodri · March 6, 2019, 3:10pm

If you K-fold by the ID variable, then it is not going to have any variation within each fold to estimate the ~(1 | ID) terms (or their variance) with. The kfold_split_stratified variable is more intended to be used on a variable constructed to ensure that each of the folds has a sufficiently large number of observations on all levels of ID.

Topic		Replies	Views
Error using kfold on stan_gamm4 object Modeling loo	6	463	May 13, 2020
Help with a very simple example of k-fold cross-valiation with loo Modeling loo	2	829	April 5, 2018
Loo moment matching not working General rstan , loo , r	9	1185	February 2, 2021
Loo throwing error when computing ELPDs for problematic observations rstanarm loo	6	1031	May 16, 2018
Errors when calculating PSIS-LOO CV in loo version 2.2.0 Modeling fitting-issues , loo	7	1080	January 2, 2020

Debugging (trace) with loo functions

Related Topics