LOO NA estimates

jverissimo · December 10, 2019, 8:08pm

I fitted a shifted lognormal model to response times. Adding WAIC and LOO to this model works, albeit with one (or more) data points showing a high Pareto k.
However, refitting the model without the problematic data points (via reloo) leads to NaN/NA estimates for LOOIC, etc.

m.int <- add_criterion(m.int, c("waic", "loo"), reloo = TRUE)
loo(m.int)

Computed from 15000 by 6512 log-likelihood matrix
          Estimate SE
elpd_loo       NaN NA
p_loo          NaN NA
looic          NaN NA

Checking the pointwise values showed that one particular data point was NA, and it turned out to be the fastest RT. (This was also the data point that had a very high Pareto k without reloo.)

> which(is.na(loo(m.int)$pointwise[, "looic"]))
[1] 2791
> which.min(mydf$rt)
[1] 2791

Perhaps this has something to do with the ndt parameter of the shifted lognormal?
How can I obtain a pointwise value for the model without this datapoint, and thus a LOOIC for the model?

Operating System: Linux
brms Version: 2.10.0

paul.buerkner · December 10, 2019, 8:12pm

You can use the newdata argument to provide the data you want to get predictions for.

jverissimo · December 10, 2019, 8:26pm

Thanks, Paul.
You mean completely excluding this data point when computing loo?

m.int <- add_criterion(m.int, c("waic", "loo"), reloo = TRUE, newdata = mydf[-which.min(mydf$rt), ])

paul.buerkner · December 10, 2019, 8:28pm

You asked how to exclude this data point. This is how you do it. If you can provide a minimal reproducible example I can also check if there is a bug in the code that leads to the NA value.

jverissimo · December 10, 2019, 9:31pm

Thank you. This gives me the NA, tried it several times:

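library(brms)
# 4999 simulated RTs with a 350 ms shift, plus one fast observation at 300 ms
rtdata <- data.frame(rt = c(rlnorm(4999, 5.85, 0.4) + 350, 300))
summary(m1 <- brm(rt ~ 1, rtdata, family = shifted_lognormal()))
m1 <- add_criterion(m1, "loo", reloo = TRUE)
loo(m1)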

paul.buerkner · December 10, 2019, 9:35pm

Thanks! Will take a look tomorrow.

paul.buerkner · December 11, 2019, 7:35am

The problem is that the shifted lognormal has a hard lower boundary that depends on the data, namely the shift parameter. If you refit the model and the new data point lies below that data-defined lower boundary (or, to be precise, below some posterior sample of the shift parameter), it will cause an NA. To be honest, I am not entirely sure what to do about that right now.
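A minimal base R sketch of this mechanism may help. It assumes the shifted lognormal density is simply a lognormal density evaluated at `y - shift`; the helper names and the "posterior draws" below are made up for illustration and are not taken from brms or loo, whose internal code path may differ.

```r
# Log density of a shifted lognormal: a lognormal evaluated at (y - shift).
# Hypothetical helper for illustration only.
shifted_lnorm_lpdf <- function(y, shift, meanlog, sdlog) {
  dlnorm(y - shift, meanlog = meanlog, sdlog = sdlog, log = TRUE)
}

# Numerically stabilised log of the mean of exp(x).
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Made-up draws of the shift from a refit without the fastest RT:
# every draw lies above the held-out observation of 300 ms.
shift_draws <- c(340, 345, 349)
held_out_rt <- 300

ll <- shifted_lnorm_lpdf(held_out_rt, shift_draws, meanlog = 5.85, sdlog = 0.4)
ll                # -Inf for every draw, because y - shift <= 0
log_mean_exp(ll)  # NaN, because -Inf - (-Inf) is NaN
```

With a shift below the held-out value (say 250), the same helper returns a finite log density, which is why bounding the shift away from the data avoids the problem.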

avehtari · December 11, 2019, 2:03pm

Could you make it a user-defined value?

paul.buerkner · December 11, 2019, 2:10pm

The shift is a parameter (ndt), which can either be fixed or estimated and whose prior you can set manually. If you force it to be below a certain value, using a prior or by fixing it, prediction should work just fine.
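As a rough sketch of the "fixed" variant, assuming that a distributional parameter can be set to a constant inside `bf()` and reusing the simulated data from the reproducible example earlier in the thread: the value 250 is an arbitrary choice below the fastest RT, not a recommendation, and the object name `m_fixed` is hypothetical.

```r
library(brms)

# Simulated data as in the reproducible example: shift of 350 ms plus one RT at 300 ms
rtdata <- data.frame(rt = c(rlnorm(4999, 5.85, 0.4) + 350, 300))

# Assumption: ndt is fixed to a constant (250 ms, arbitrary) rather than estimated,
# so no held-out observation can fall below the shift when the model is refit
m_fixed <- brm(bf(rt ~ 1, ndt = 250), data = rtdata,
               family = shifted_lognormal())
m_fixed <- add_criterion(m_fixed, "loo", reloo = TRUE)
loo(m_fixed)
```

Whether a fixed shift is sensible depends on the data; a value that is too small can distort the fit for the bulk of the RTs, so the pp_checks are worth comparing.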

jverissimo · December 11, 2019, 5:27pm

Thanks. The solutions I’m seeing atm (given my current knowledge) are either a) to remove this one data point altogether (the fastest RT), or b) to fit models on all the data, but compute loos without this data point. Removing the data point leads to an estimated ndt that is quite a bit larger (from 300 to 350 ms), better pp_checks, and no problems computing loo. So I’m leaning towards option a).
Thanks for the help, Paul.
