Add_criterion with moment_match=TRUE failing even when save_pars(all=TRUE) was set during model fit

joels · April 4, 2023, 5:03pm

I fit an ordinal regression model with brms and ran loo using add_criterion. This returned 132 observations with pareto k values greater than 0.7 (out of 217,883 observations). I had fit the model with save_pars(all=TRUE), so I re-ran loo with moment matching, but got the following error: Error in if (quantities_i$ki < ki) { : missing value where TRUE/FALSE needed.

I don’t know what this means and haven’t found a discussion of this error. I’ve included my code and the warnings and errors below, along with session information. I can’t provide a reproducible example, as the data set is not shareable and too large in any case, but please let me know if there is additional information that would be helpful to have.

I don’t think multiple refits with reloo will be a realistic option, as this model took about 2.7 hours to fit, it is much simpler than the models I actually want to fit, and the data set I actually want to use is larger. Is there a way I can get moment matching to work?

library(brms)
library(cmdstanr)

# Model formula
form = course.grade ~ instruction.mode + incoming.gpa + 
 (1|instructor.id) + (1|student.id)

# Set priors
n.cat = 13
threshold.priors = qlogis(cumsum(rep(1/n.cat, n.cat)))[1:(n.cat-1)] %>% 
                            round(., 3)
threshold.priors = map_df(seq_along(threshold.priors), 
                          ~prior_string(paste0("normal(",threshold.priors[.x],", 1)"), 
                                        class="Intercept", coef=.x))

priors = c(prior(normal(0,0.5), class="b"),
           prior(normal(0,1), class="b", coef="incoming.gpa"),
           threshold.priors)

# Fit model
m = brm(data = mdat, file="models/learning-mode/two-terms/tt1",
        family = cumulative(link="logit", threshold="flexible"),
        formula = form, prior = priors,
        chains = 3, cores = 3, threads=threading(3),
        backend="cmdstanr", sample_prior=TRUE, save_pars=save_pars(all=TRUE))

m = add_criterion(m, criterion="loo", moment_match=FALSE)

Automatically saving the model object in ‘models/learning-mode/two-terms/tt1.rds’
Warning messages:
1: The global prior ‘student_t(3, 0, 2.5)’ of class ‘Intercept’ will not be used in the model as all related coefficients have individual priors already. If you did not set those priors yourself, then maybe brms has assigned default priors. See ?set_prior and ?get_prior for more details.
2: Found 132 observations with a pareto_k > 0.7 in model ‘model’. It is recommended to set ‘moment_match = TRUE’ in order to perform moment matching for problematic observations.

loo(m)

Computed from 3000 by 217883 log-likelihood matrix

          Estimate    SE
elpd_loo -373018.3 497.2
p_loo      27621.2  72.0
looic     746036.6 994.3
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count  Pct.    Min. n_eff
(-Inf, 0.5]   (good)     215836 99.1%   186       
 (0.5, 0.7]   (ok)         1915  0.9%   68        
   (0.7, 1]   (bad)         130  0.1%   88        
   (1, Inf)   (very bad)      2  0.0%   15        
See help('pareto-k-diagnostic') for details.

# Given the warnings above, redo loo with moment_match=TRUE
m = add_criterion(m, criterion="loo", moment_match=TRUE, overwrite=TRUE)

Recompiling the model with ‘rstan’
Recompilation done
Error in if (quantities_i$ki < ki) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
The global prior ‘student_t(3, 0, 2.5)’ of class ‘Intercept’ will not be used in the model as all related coefficients have individual priors already. If you did not set those priors yourself, then maybe brms has assigned default priors. See ?set_prior and ?get_prior for more details.
Error: Moment matching failed. Perhaps you did not set ‘save_pars = save_pars(all = TRUE)’ when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

Session Info

Macbook Pro M1 Max, Ventura 13.2.1
R 4.2.2
loo 2.6.0

m$version
$brms
[1] ‘2.19.0’

$rstan
[1] ‘2.26.13’

$stanHeaders
[1] ‘2.26.13’

$cmdstanr
[1] ‘0.5.2’

$cmdstan
[1] ‘2.31.0’

joels · May 13, 2023, 7:19pm

Just bumping this to see if anyone has a suggestion on how to resolve my question. I’m finding that loo with moment matching fails with ordinal models that have random effects, even though I’ve set save_pars(all=TRUE). I also filed an issue with a reproducible example on the loo github repo, but haven’t gotten any responses.

matt-hyde-s · June 20, 2023, 10:54pm

Hi joels,

I was wondering if you resolved this issue. I’m also using brms for an ordinal regression and when I try to set a random intercept based on respondent, moment_match = TRUE fails and I can’t get it to work.

joels · June 21, 2023, 2:27am

I haven’t resolved this issue. @avehtari do you know if this is a bug in loo, or if there’s a workaround for this error? I filed an issue at the loo repo that includes a reproducible example.

avehtari · June 21, 2023, 3:04pm

I don’t know, but pinging @jonah if he would have time to check this

jonah · June 21, 2023, 9:44pm

This is coming from https://github.com/stan-dev/loo/blob/2fec3c006e52dca504f27e7870d121706272a567/R/loo_moment_matching.R#L292, which @topipa is most familiar with. I just tagged him over at the GitHub issue.

jonah · July 3, 2023, 5:02pm

This may be fixed now on the master branch of the loo repo after @topipa’s PR #224

matt-hyde-s · July 5, 2023, 2:25am

Thanks Jonah. I reinstalled loo from github and I tried to moment match two ways, as add_criterion and by setting the moment_match in the loo() function. Both of them abort the current R session.

Here’s an example code:

bay.jag1.5 ← brm(
formula = jag_pop ~ 1 + (1|Name) + past_harm_jags + habitat_protection + Protecting_hunting + RNSC,
data = bayes_data,
family = cumulative(“probit”),
chains = 4,
iter = 10000,
prior(normal(0, 5),
class = Intercept),
init = “0”,
)
summary(bay.jag1.5)
m5loo ← loo(bay.jag1.5, moment_match = TRUE)

I’ve also tried:
add_criterion(bay.jag1.5, criterion = “loo”, moment_match = TRUE)

and got the same result.

joels · July 5, 2023, 5:22pm

Matt, I had a similar problem and posted it at the github issue I opened. After reinstalling R, stanHeaders 2.26 and rstan 2.26 (as described here), based on @topipa’s suggestion, I was able to get loo with moment matching to complete without error. However, the loo results were exactly the same with and without moment matching and took 393 times as long to run (2.73 hours vs. 25 seconds on my Mac M2 Max). I’m not sure if this is expected, so I posted a follow-up query on github.

Can you try reinstalling rstan-related packages and let us know (1) if you’re able to get moment matching to work, and (2) if so, whether including moment matching changes your loo results?

matt-hyde-s · July 5, 2023, 9:03pm

Hi Joel,

I uninstalled and reinstalled all rstan related packages according to the instructions and reinstalled. R still aborts when running with the moment match. This only happens when running the moment_match on two models, and succeeds on three others. I suspect that the two models that don’t run are too complex for my small dataset. I think I’m going to use k-fold-CV like in this tutorial: Roaches cross-validation demo

mtenan · March 13, 2024, 3:37pm

I’ll add that I’m also having this issue, in the case of a multivariate, multilevel Beta regression model. I’ve replicated the issue with all updated (CRAN) packages on both Windows and Mac. To make things even more peculiar, it sometimes errors and sometimes not, even with consecutive clean R sessions on the same computer.

I understand, in looking at the Github issues for ‘loo’ that sometimes there are problems troubleshooting this problem because Devs can’t replicate them, which seems unsurprising since 10 minutes apart on clean sessions I can’t always replicate the error! Is there some sort of stochastic element that would throw the error even when “add_criterion(…, criterion=“loo”, moment_match=TRUE, ‘save_pars = save_pars(all = TRUE))” is used that would cause this inconsistent behavior?

avehtari · March 13, 2024, 4:58pm

Can you try installing loo from github with

remotes::install_github("stan-dev/loo")

and report if that fixes your problem?

Last week I made a PR that has been merged, which might fix your problem

The problem I fixed arises if the log posterior density for any of the transformed draws happens to be -Inf. The behavior then depends on the posterior draws, and rerunning with different seed may work without problems.

mtenan · March 14, 2024, 3:36pm

@avehtari I can work on a reproducible example, but I still get an error in some cases. Let me know if this is related to your fix or something else, please. I can make a reprex as necessary. I’ve confirmed this errors on both my Mac and PC with the Dev ‘loo’ update.

sart_mean_bf_dist <- bf(sartmean1 ~ Environment+Workload + (1|part_id),
                        phi ~ Environment+Workload + (1|part_id)) +   Beta()
sart_eo_bf_dist <- bf(sart_eox ~ Environment+Workload + (1|part_id),
                      phi ~ Environment+Workload + (1|part_id)) +  Beta()
sart_ec_bf_dist <- bf(sart_ecx ~ Environment+Workload + (1|part_id),
                      phi ~ Environment+Workload + (1|part_id)) +  Beta()

sart_fit1_dist <- brm(
  sart_mean_bf_dist + sart_eo_bf_dist + sart_ec_bf_dist + set_rescor(FALSE),
  data = virtra_dat, chains = 4, cores = 4, iter = 10000, warmup = 8000,
  prior = c(prior(normal(0, 3), class=b, resp= sartmean1), prior(gamma(1.5, .7), class=b, resp= sartmean1, dpar= phi, lb=0), 
            prior(normal(0, .5), class=sd, resp= sartmean1), prior(normal(0, .5), class=sd, resp= sartmean1, dpar= phi),
            prior(normal(0, 3), class=b, resp= sarteox), prior(gamma(1.5, .7), class=b, resp= sarteox, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= sarteox), prior(normal(0, .5), class=sd, resp= sarteox, dpar=phi),
            prior(normal(0, 3), class=b, resp= sartecx), prior(gamma(1.5, .7), class=b, resp= sartecx, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= sartecx), prior(normal(0, .5), class=sd, resp= sartecx, dpar=phi)),
            
  control = list(adapt_delta = 0.99), save_pars=save_pars(all=TRUE), seed = 711, 
)

sart_fit1_dist <- add_criterion(sart_fit1_dist, "loo", moment_match = TRUE, reloo=TRUE)

It then returns the following error:

Error : Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'string', line 226, column 6 to column 111)
Error: Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

What makes it peculiar to me is that the initial model does not error and more complex models with the same data do not error nor do their moment matching. Am I missing something or is it a software bug?

avehtari · March 14, 2024, 6:11pm

It’s not a bug due to wrong code, but it is due to the limitation of the floating point presentation of numbers. When you sample with Stan, it catches these exceptions in log density computation and rejects those proposals. When the moment matching transforms the posterior draws and asks Stan to recompute log density, Stan can return -Inf, which has now special handling, or Stan can cause exception which is not yet handled. These were not handled before because we didn’t see them in our own experiments. Obviously the popularity of moment matching has increased, and then the probability that someone will hit these edge cases increases. More complex model can be better specified and have better behavior. In case of a simpler model, leaving out observation may have a bigger change in the posterior, and moment matching then transform the draws more. The transformation is made in unconstrained space, but now it seems that when the moment matching transformed draw is constrained there is underflow.

I can fix this faster, if I get a complete reproducible example. I hope I can catch that exception, but this will definitely be more challenging to fix than the -Inf case.

mtenan · March 15, 2024, 6:43pm

Here is the reprex. Admittedly, I can only get it to cause the error on my Mac and not the Windows one, which has me completely baffled. (both machines still error with the original code)
Are you able to replicate the error, @avehtari?


library(brms)


#make the data
out1 <- c(0.4259, 0.42855, 0.4193, 0.3835, 0.348, 0.5182, 0.52025, 0.44115, 
          0.4603, 0.44805, 0.3567857, 0.3799, 0.50925, 0.4768, 0.56815, 0.3948421, 
          0.42665, 0.37795, 0.36645, 0.34565, 0.7618947, 0.8046667, 0.42105, 0.40785, 
          0.41405, 0.43255, 0.3467895, 0.4078, 0.3496429, 0.4345263, 0.3606, 0.4466389, 
          0.3419714, 0.3883143, 0.3832778, 0.4014444, 0.3393714, 0.4705, 0.3766, 
          0.3727143, 0.403, 0.3646774, 0.3610571, 0.5300833, 0.4632778, 0.4330556, 
          0.3387568, 0.3698824, 0.3623429, 0.3744412, 0.3373889, 0.4087632, 0.5207692, 
          0.5020857, 0.4230278, 0.3997714, 0.4826571, 0.3942222, 0.3812432, 0.3345143, 
          0.3699737, 0.3783143, 0.615833, 0.582929, 0.59, 0.559, 0.6004, 0.56274, 0.59394, 
          0.59537, 0.5802, 0.599667, 0.55908, 0.57475, 0.5909, 0.58331, 0.584053, 0.553714, 
          0.577391, 0.55479, 0.5277, 0.540563, 0.544611, 0.599187, 0.577923, 0.583842, 
          0.563524, 0.574, 0.542056, 0.54887, 0.583167, 0.595, 0.588, 0.54663, 0.4306, 0.547, 
          0.56587, 0.5517, 0.55697, 0.51808, 0.57569, 0.52975, 0.56571, 0.52133, 0.55158, 
          0.59483, 0.62878, 0.543733, 0.480909, 0.61368, 0.57294, 0.577353, 0.497968, 0.562611, 
          0.626611, 0.591759, 0.592515, 0.538882, 0.5612, 0.553219, 0.57397, 0.539781, 0.562767, 0.556472)

out2 <- c(0.0001, 0.0001, 0.0001, 0.0001, 0.15, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.3, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.05, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.05, 0.0001, 0.3, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.1143, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.02857, 
          0.0001, 0.02857, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.4444, 0.12, 0.45, 0.1364, 0.25, 0.2083, 0.1, 0.2, 0.08696, 0.3684, 0.2353, 
          0.16, 0.1304, 0.09091, 0.1364, 0.1667, 0.0001, 0.09524, 0.08696, 0.05882, 0.05263, 0.0001, 0.2273, 
          0.0001, 0.08696, 0.2222, 0.0001, 0.08, 0.1176, 0.3125, 0.08333, 0.1892, 0.02703, 0.02857, 0.1351, 
          0.2222, 0.1053, 0.1429, 0.1944, 0.1351, 0.1316, 0.1429, 0.2778, 0.2121, 0.0001, 0.05405, 0.1282, 
          0.05882, 0.05556, 0.07895, 0.1111, 0.05128, 0.0001, 0.2105, 0.07895, 0.05556, 0.06061, 0.05714, 
          0.1316, 0.0001, 0.1429, 0.02632)

out3 <- c(0.5, 0.05, 0.2, 0.1, 0.1, 0.2, 0.05, 0.0001, 0.05, 0.0001, 0.15, 0.2, 0.0001, 0.0001, 0.0001, 
          0.35, 0.15, 0.1, 0.0001, 0.05, 0.1, 0.05, 0.05, 0.05, 0.05, 0.0001, 0.25, 0.15, 0.25, 0.0001, 
          0.1, 0.0001, 0.4, 0.8, 0.5, 0.5, 0.8, 0.0001, 0.0001, 0.4, 0.2, 0.8, 0.6, 0.0001, 0.0001, 0.5, 
          0.9999, 0.2, 0.4, 0.8, 0.25, 0.4, 0.25, 0.0001, 0.25, 0.4, 0.0001, 0.9999, 0.6667, 0.2, 0.0001, 
          0.4, 0.0001, 0.0001, 0.05, 0.1111, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.09524, 0.0001, 
          0.0001, 0.05882, 0.0001, 0.0001, 0.04545, 0.0001, 0.0001, 0.05882, 0.04348, 0.0001, 0.0001, 
          0.05556, 0.0001, 0.0001, 0.0001, 0.0001, 0.1333, 0.0001, 0.0001, 0.1875, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.5, 0.0001, 0.0001, 0.0001, 0.0001, 0.9999, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.25, 0.0001, 0.0001, 0.0001, 0.0001, 0.25, 0.1429, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001)

id <- as.character(c("4f9", "06b", "6b8", "10", "47f", "148", "230", "430", "699", "aa4", "c03", "d4c", "d83", "fc8", 
        "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", "263", "aac", "a87", "67d", "39e", "a40", 
        "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", "148", "230", "430", "699", "aa4", "c03", 
        "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", "263", "aac", "a87", 
        "67d", "39e", "a40", "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", "148", "230", "430", 
        "699", "aa4", "c03", "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", 
        "263", "aac", "a87", "67d", "39e", "a40", "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", 
        "148", "230", "430", "699", "aa4", "c03", "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", 
        "e25", "51b", "3b6", "263", "aac", "a87", "67d", "39e", "a40", "f0f", "3ac", "ece"))

environment <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                           1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2))

load <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                    2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                    1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                    2, 2, 2, 2))

dat <- data.frame(out1, out2, out3, id, environment, load)


f1 <- bf(out1 ~ environment+load + (1|id),
                        phi ~ environment+load + (1|id)) +   Beta()
f2 <- bf(out2 ~ environment+load + (1|id),
                      phi ~ environment+load + (1|id)) +  Beta()
f3<- bf(out3 ~ environment+load + (1|id),
                      phi ~ environment+load + (1|id)) +  Beta()

fit_dist <- brm(
  f1 + f2 + f3 + set_rescor(FALSE),
  data = dat, chains = 4, cores = 4, iter = 10000, warmup = 8000,
  prior = c(prior(normal(0, 3), class=b, resp= out1), prior(gamma(1.5, .7), class=b, resp= out1, dpar= phi, lb=0), 
            prior(normal(0, .5), class=sd, resp= out1), prior(normal(0, .5), class=sd, resp= out1, dpar= phi),
            prior(normal(0, 3), class=b, resp= out2), prior(gamma(1.5, .7), class=b, resp= out2, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= out2), prior(normal(0, .5), class=sd, resp= out2, dpar=phi),
            prior(normal(0, 3), class=b, resp= out3), prior(gamma(1.5, .7), class=b, resp= out3, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= out3), prior(normal(0, .5), class=sd, resp= out3, dpar=phi)),
  control = list(adapt_delta = 0.99), save_pars=save_pars(all=TRUE), seed = 711, 
)

#hopefully make the error
fit_dist <- add_criterion(fit_dist, "loo", moment_match = TRUE)

avehtari · March 18, 2024, 8:14pm

Thanks for the example. The error replicates. I’ll investigate.

avehtari · March 19, 2024, 7:12pm

I’ve made a PR catch Stan log_prob exceptions inside moment matching by avehtari · Pull Request #262 · stan-dev/loo · GitHub for loo package which will catch the error. I guess this will be soon available in github master branch.

As in your example, some Pareto k’s are very big (larger than 1), moment matching is not able to get all Pareto k’s below 0.7, but at least the moment matching is finishing for all other observations.

avehtari · March 21, 2024, 7:45am

The PR has been merged to master. Can you try with github version of loo?

remotes::install_github("stan-dev/loo")

mtenan · April 22, 2024, 4:07pm

Yes, sorry. Just saw your response. I’ll check on it and report back.

mtenan · April 26, 2024, 6:01pm

It is still returning the same error as before on my end ("Error : Exception: beta_lpdf: Second shape parameter is 0, but…) with a moment matching failure.

Should I be expecting something different with the update?

Topic		Replies	Views
Error with add_criterion(..., "loo") brms	12	2076	March 15, 2024
Error with loo_moment_match brms loo	15	2347	August 12, 2021
Brms::loo(moment_match = T) does not appear to work General loo , brms	13	2364	February 6, 2023
Difficulty comparing models with loo brms fitting-issues , loo	24	4949	October 30, 2020
Error: "object 'storage' not found" when running loo with moment matching brms loo	7	1859	August 24, 2020

Add_criterion with moment_match=TRUE failing even when save_pars(all=TRUE) was set during model fit

Session Info

Related topics