I fitted a shifted-lognormal model to response times. Adding WAIC and LOO to this model works, albeit with one (or more) data points with a high Pareto k.
However, refitting the model without the problematic data points (which reloo does) leads to NaN/NA estimates for LOOIC etc.:
m.int <- add_criterion(m.int, c("waic", "loo"), reloo = TRUE)
Computed from 15000 by 6512 log-likelihood matrix

         Estimate SE
elpd_loo      NaN NA
p_loo         NaN NA
looic         NaN NA
Checking the pointwise values showed that one particular data point was NA, and it turned out to be the fastest RT. (This was also the data point that had a very high Pareto k before reloo.)
Perhaps this has something to do with the ndt parameter of the shifted lognormal?
How can I obtain a pointwise value for the model without this datapoint, and thus a LOOIC for the model?
- Operating System: Linux
- brms Version: 2.10.0
You can use the newdata argument to provide the data you want to get predictions for.
You mean completely excluding this data point when computing loo?
m.int <- add_criterion(m.int, c("waic", "loo"), reloo = TRUE, newdata = mydf[-which.min(mydf$rt), ])
You asked how to exclude this data point. This is how you do it. If you can provide a minimal reproducible example I can also check if there is a bug in the code that leads to the NA value.
Thank you. This reproduces the NA for me; I tried it several times:
library(brms)

rtdata <- data.frame(rt = c(rlnorm(4999, 5.85, 0.4) + 350, 300))
summary(m1 <- brm(rt ~ 1, rtdata, family = shifted_lognormal()))
m1 <- add_criterion(m1, "loo", reloo = TRUE)
Thanks! Will take a look tomorrow.
The problem is that the shifted lognormal has a hard lower boundary that depends on the data, namely the shift parameter. If you refit the model and a data point falls below that data-defined lower boundary (or below some posterior sample of the shift parameter, to be precise), this will produce an NA. To be honest, I am not entirely sure right now what to do about that.
Could you make it a user-defined value?
The shift is a parameter (ndt) which can either be fixed or estimated, and whose prior you can set manually. If you force it to be below a certain value via a prior (or by fixing it), prediction should work just fine.
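A minimal sketch of the fixed-shift route, in R: instead of letting brms estimate ndt (whose upper bound is tied to the minimum observed RT), subtract a constant shift from the data yourself and fit a plain lognormal. The shift of 250 ms is a hypothetical value chosen below the fastest RT here; pick one appropriate for your data.

```r
library(brms)

# Hypothetical fixed shift (ms), chosen below the fastest observed RT.
shift <- 250
stopifnot(min(rtdata$rt) > shift)

# A shifted lognormal with a fixed shift is just a lognormal on rt - shift,
# so the data-dependent boundary on ndt never enters the model.
rtdata$rt_shifted <- rtdata$rt - shift
m_fixed <- brm(rt_shifted ~ 1, data = rtdata, family = lognormal())
m_fixed <- add_criterion(m_fixed, "loo")
```

With the shift fixed in advance, a leave-one-out refit can never place a held-out data point below the boundary, so reloo should no longer produce NAs.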
Thanks. The solutions I'm seeing at the moment (given my current knowledge) are either (a) removing this one data point (the fastest RT) altogether, or (b) fitting the models on all the data but computing the loos without this data point. Removing the data point leads to an estimated ndt that is quite a bit larger (from 300 to 350 ms), better pp_checks, and no problems computing loo. So I'm leaning towards option (a).
Thanks for the help, Paul.