Error using Loo with moment matching

loo works fine without moment matching, but when I run it with moment matching, I get the following error.

Error in object@.MISC$stan_fit_instance$log_prob(upars, adjust_transform, :
Exception: lognormal_lpdf: Random variable[551] is -3.05044, but must be nonnegative! (in 'anon_model', line 78, column 4 to column 50)
Error: Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model?

Data and code for a reprex are below.

I have seen similar errors reported on this forum, so I tried running moment matching in the same R session in which the model was built (Loo moment matching not working), and I was also using the development versions of loo and brms (Loo moment_match problem).

I just wondered if anything obvious stands out? I am not particularly experienced with brms, loo, and Stan, so I expect it could be something basic.

Any advice would be appreciated.

Regards,
Rich

This example uses only approx. 20% of the full dataset, and it models only intercepts and their variation across participants and items, but hopefully it is sufficient to demonstrate the problem. A sample of the dataset is attached.

Formula & Priors

formula = bf(afc_rtms ~ 1 +
               (1 | pID) +
               (1 | item_left) +
               (1 | item_right),
             ndt ~ (1 | pID))

priors = c(
  set_prior('normal(6.68, 0.5)', class = 'Intercept'),
  set_prior('normal(5.70, 0.5)', class = 'Intercept', dpar = 'ndt'),
  set_prior('normal(0, 0.1)', class = 'sd'),
  set_prior('normal(0, 0.1)', class = 'sd', dpar = 'ndt'),
  set_prior('normal(0, 0.1)', class = 'sigma')
)

Model and loo

b0 <- brm(formula = formula,
          data = data_sample, family = shifted_lognormal(),
          prior = priors,
          iter = 4000, warmup = 1000, cores = 4, chains = 4,
          control = list(adapt_delta = 0.99, max_treedepth = 15),
          save_pars = save_pars(all = TRUE),
          seed = 123,
          init_r = 0.1,
          file = "models/b0")
summary(b0)

loo0 <- loo(b0) 
This works.

loo0m <- loo(b0, moment_match = TRUE)
This gives me the above error.

[data_sample.csv|attachment](upload://lLfCMVWQ9CTPYgzC84sQUq2C1F2.csv) (84.6 KB) 
  • Operating System: Mac OS 10.15.7
  • brms Version: 2.15.0 / Development version

I don’t seem to be able to download your data.

Hi Britta,

That’s odd. Sorry about that. I hope it works this time. data_sample.csv (84.6 KB)

Thanks for the quick reply

Rich

Yes, this works, thanks!

I encountered a similar problem in my own data. I think I have a conceptual idea of why it happens, but I cannot offer a solution.

The problem is that the shifted_lognormal distribution is very sensitive to the lowest data point.

[I was too impatient to let your model run for all 4000 iterations, so I let it run for only 500; I don’t think this changes the argument I am making].

Let us look at the problematic observations a little more closely and add a column to our data with the influence_pareto_k values from loo0.

pw <- as.data.frame(loo0$pointwise)

data_sample$influence <- pw$influence_pareto_k

Now plot:

library(ggplot2)

ggplot(data_sample, aes(x = pID, y = afc_rtms, color = influence)) +
  geom_jitter(height = 0, width = 0.2, alpha = 0.9)

On the x-axis is pID. I don’t know what that stands for, but each pID gets its own fitted ndt. On the y-axis is the variable your model tries to predict. The color shows the Pareto k influence; the lighter the shade, the worse the approximation.

So we can see that all the problematic observations are data points that lie well below the rest of their group.

loo is trying to approximate leave-one-out cross-validation: leave one data point out, fit the model again, and then see how probable the refitted model thinks the left-out data point is.
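
To make that concrete, here is a minimal sketch of one such exact refit (this is essentially what reloo does for each problematic point; the row index and arguments are just for illustration, using loo0 and data_sample from above):

i <- which.max(loo0$diagnostics$pareto_k)          # the worst-k observation
# Refit without row i, then ask the refitted model how likely row i is:
fit_minus_i <- update(b0, newdata = data_sample[-i, ], recompile = FALSE)
# allow_new_levels in case a grouping level only occurred in row i;
# for the boundary points discussed next, this log density can be -Inf
# (or the evaluation can error out), which is the problem at hand:
ll_i <- log_lik(fit_minus_i, newdata = data_sample[i, ], allow_new_levels = TRUE)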

With a shifted lognormal model, all data below ndt are impossible. So when we leave out a light blue data point that is far lower than the rest of its group, the ndt in the refitted model can end up higher than the left-out data point. When the new model then tries to evaluate the probability of the left-out point, there is an error, because the point lies outside the range where the probability distribution is defined.
So, in this case, I think the ‘correct’ answer for loo would be -Inf, because the predictive density of that point is 0, and log(0) is -Inf. I am aware that this answer is not very helpful.
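
A toy numeric illustration of that boundary (the numbers are made up, roughly on the scale your priors imply, not taken from your model):

ndt <- 300   # hypothetical non-decision time for some participant
y   <- 250   # hypothetical left-out observation below that ndt
# The shifted lognormal density is defined only for y > ndt; otherwise it is 0.
if (y > ndt) dlnorm(y - ndt, meanlog = 6.68, sdlog = 0.5, log = TRUE) else -Inf
#> [1] -Inf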

Does this explanation make sense?

As I said, I don’t have a good solution for this.

Thanks so much. That does make sense, even to a beginner like me. I really appreciate the time that you have taken to outline things so clearly. Shame there is no solution though!

Given the lack of solution with moment matching, would you default to using reloo (where possible) and kfold if reloo is not feasible?

And a slightly unrelated point to the initial post: if the number of influential data points is only a relatively small fraction of the dataset, can you report that the fraction is low and leave them as they are? Or is that considered poor practice, even if transparent? I guess I’m thinking of a situation where there are only a small number of potentially problematic data points, but kfold cross-validation takes 30 hours or longer per model, so it could get rather time-consuming (depending on the number of models). Or would you recommend just getting on with kfold anyway, as there is no sensible alternative? I hope this makes sense. Sorry if it doesn’t.
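
(For reference, the fraction I mean could be computed from the loo object like this, assuming the usual 0.7 threshold for Pareto k from the loo documentation:)

k <- loo0$diagnostics$pareto_k
mean(k > 0.7)   # fraction of potentially problematic observations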

Just FYI - ‘pID’ is participant ID. So each pID is a different participant in an experimental dataset.

Thanks again
Rich

It depends. If someone’s life depends on this, it’s better to do everything as carefully as possible. If the difference between models is very big anyway and no one’s life depends on this, then you may argue that there is only a small probability that the error in the elpd_loo estimate would be big.

Thanks so much, that’s really helpful.

One follow-up question if I may (apologies in advance for the basic nature of it): if there are more than 10 potentially influential data points (even if that is a small fraction of the overall dataset), the recommendation is to use kfold (as I understand it). Is that because the default in kfold is 10 folds, so the model is fit 10 times, which would be preferable to refitting the model 10+ times with reloo, once per potentially influential data point?

I’m just trying to understand the decision space for loo, reloo and kfold.

Thanks again.

Regards,
Rich

Computation time for running MCMC for one fold in kfold-CV is about the same as running MCMC for one fold in leave-one-out CV. So if computation time is an expensive resource, pick the approach with fewer MCMC runs.
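
In code terms, a rough sketch of that comparison (assuming loo0 from above and the usual 0.7 threshold for Pareto k):

n_bad <- sum(loo0$diagnostics$pareto_k > 0.7)  # MCMC runs reloo would need
K <- 10                                        # MCMC runs kfold would need
if (n_bad <= K) loo(b0, reloo = TRUE) else kfold(b0, K = K)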

Thank you. Very helpful, as always.
Rich
