Add_criterion with moment_match=TRUE failing even when save_pars(all=TRUE) was set during model fit

I fit an ordinal regression model with brms and ran loo using add_criterion. This returned 132 observations with pareto k values greater than 0.7 (out of 217,883 observations). I had fit the model with save_pars(all=TRUE), so I re-ran loo with moment matching, but got the following error: Error in if (quantities_i$ki < ki) { : missing value where TRUE/FALSE needed.

I don’t know what this means and haven’t found a discussion of this error. I’ve included my code and the warnings and errors below, along with session information. I can’t provide a reproducible example, as the data set is not shareable and too large in any case, but please let me know if there is additional information that would be helpful to have.

I don’t think multiple refits with reloo will be a realistic option, as this model took about 2.7 hours to fit, it is much simpler than the models I actually want to fit, and the data set I actually want to use is larger. Is there a way I can get moment matching to work?

library(brms)
library(cmdstanr)

# Model formula
form = course.grade ~ instruction.mode + incoming.gpa + 
 (1|instructor.id) + (1|student.id)

# Set priors
n.cat = 13
threshold.priors = qlogis(cumsum(rep(1/n.cat, n.cat)))[1:(n.cat-1)] %>% 
                            round(., 3)
threshold.priors = map_df(seq_along(threshold.priors), 
                          ~prior_string(paste0("normal(",threshold.priors[.x],", 1)"), 
                                        class="Intercept", coef=.x))

priors = c(prior(normal(0,0.5), class="b"),
           prior(normal(0,1), class="b", coef="incoming.gpa"),
           threshold.priors)
# Fit model
m = brm(data = mdat, file="models/learning-mode/two-terms/tt1",
        family = cumulative(link="logit", threshold="flexible"),
        formula = form, prior = priors,
        chains = 3, cores = 3, threads=threading(3),
        backend="cmdstanr", sample_prior=TRUE, save_pars=save_pars(all=TRUE))

m = add_criterion(m, criterion="loo", moment_match=FALSE)

Automatically saving the model object in ‘models/learning-mode/two-terms/tt1.rds’
Warning messages:
1: The global prior ‘student_t(3, 0, 2.5)’ of class ‘Intercept’ will not be used in the model as all related coefficients have individual priors already. If you did not set those priors yourself, then maybe brms has assigned default priors. See ?set_prior and ?get_prior for more details.
2: Found 132 observations with a pareto_k > 0.7 in model ‘model’. It is recommended to set ‘moment_match = TRUE’ in order to perform moment matching for problematic observations.

loo(m)
Computed from 3000 by 217883 log-likelihood matrix

          Estimate    SE
elpd_loo -373018.3 497.2
p_loo      27621.2  72.0
looic     746036.6 994.3
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count  Pct.    Min. n_eff
(-Inf, 0.5]   (good)     215836 99.1%   186       
 (0.5, 0.7]   (ok)         1915  0.9%   68        
   (0.7, 1]   (bad)         130  0.1%   88        
   (1, Inf)   (very bad)      2  0.0%   15        
See help('pareto-k-diagnostic') for details.
# Given the warnings above, redo loo with moment_match=TRUE
m = add_criterion(m, criterion="loo", moment_match=TRUE, overwrite=TRUE)

Recompiling the model with ‘rstan’
Recompilation done
Error in if (quantities_i$ki < ki) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
The global prior ‘student_t(3, 0, 2.5)’ of class ‘Intercept’ will not be used in the model as all related coefficients have individual priors already. If you did not set those priors yourself, then maybe brms has assigned default priors. See ?set_prior and ?get_prior for more details.
Error: Moment matching failed. Perhaps you did not set ‘save_pars = save_pars(all = TRUE)’ when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

Session Info

Macbook Pro M1 Max, Ventura 13.2.1
R 4.2.2
loo 2.6.0

m$version
$brms
[1] ‘2.19.0’

$rstan
[1] ‘2.26.13’

$stanHeaders
[1] ‘2.26.13’

$cmdstanr
[1] ‘0.5.2’

$cmdstan
[1] ‘2.31.0’
1 Like

Just bumping this to see if anyone has a suggestion on how to resolve my question. I’m finding that loo with moment matching fails with ordinal models that have random effects, even though I’ve set save_pars(all=TRUE). I also filed an issue with a reproducible example on the loo github repo, but haven’t gotten any responses.

Hi joels,

I was wondering if you resolved this issue. I’m also using brms for an ordinal regression and when I try to set a random intercept based on respondent, moment_match = TRUE fails and I can’t get it to work.

I haven’t resolved this issue. @avehtari do you know if this is a bug in loo, or if there’s a workaround for this error? I filed an issue at the loo repo that includes a reproducible example.

I don’t know, but pinging @jonah if he would have time to check this

1 Like

This is coming from https://github.com/stan-dev/loo/blob/2fec3c006e52dca504f27e7870d121706272a567/R/loo_moment_matching.R#L292, which @topipa is most familiar with. I just tagged him over at the GitHub issue.

3 Likes

This may be fixed now on the master branch of the loo repo after @topipa’s PR #224

Thanks Jonah. I reinstalled loo from github and I tried to moment match two ways, as add_criterion and by setting the moment_match in the loo() function. Both of them abort the current R session.

Here’s an example code:

bay.jag1.5 ← brm(
formula = jag_pop ~ 1 + (1|Name) + past_harm_jags + habitat_protection + Protecting_hunting + RNSC,
data = bayes_data,
family = cumulative(“probit”),
chains = 4,
iter = 10000,
prior(normal(0, 5),
class = Intercept),
init = “0”,
)
summary(bay.jag1.5)
m5loo ← loo(bay.jag1.5, moment_match = TRUE)

I’ve also tried:
add_criterion(bay.jag1.5, criterion = “loo”, moment_match = TRUE)

and got the same result.

1 Like

Matt, I had a similar problem and posted it at the github issue I opened. After reinstalling R, stanHeaders 2.26 and rstan 2.26 (as described here), based on @topipa’s suggestion, I was able to get loo with moment matching to complete without error. However, the loo results were exactly the same with and without moment matching and took 393 times as long to run (2.73 hours vs. 25 seconds on my Mac M2 Max). I’m not sure if this is expected, so I posted a follow-up query on github.

Can you try reinstalling rstan-related packages and let us know (1) if you’re able to get moment matching to work, and (2) if so, whether including moment matching changes your loo results?

1 Like

Hi Joel,

I uninstalled and reinstalled all rstan related packages according to the instructions and reinstalled. R still aborts when running with the moment match. This only happens when running the moment_match on two models, and succeeds on three others. I suspect that the two models that don’t run are too complex for my small dataset. I think I’m going to use k-fold-CV like in this tutorial: Roaches cross-validation demo

1 Like

I’ll add that I’m also having this issue, in the case of a multivariate, multilevel Beta regression model. I’ve replicated the issue with all updated (CRAN) packages on both Windows and Mac. To make things even more peculiar, it sometimes errors and sometimes not, even with consecutive clean R sessions on the same computer.

I understand, in looking at the Github issues for ‘loo’ that sometimes there are problems troubleshooting this problem because Devs can’t replicate them, which seems unsurprising since 10 minutes apart on clean sessions I can’t always replicate the error! Is there some sort of stochastic element that would throw the error even when “add_criterion(…, criterion=“loo”, moment_match=TRUE, ‘save_pars = save_pars(all = TRUE))” is used that would cause this inconsistent behavior?

Can you try installing loo from github with

remotes::install_github("stan-dev/loo")

and report if that fixes your problem?

Last week I made a PR that has been merged, which might fix your problem

The problem I fixed arises if the log posterior density for any of the transformed draws happens to be -Inf. The behavior then depends on the posterior draws, and rerunning with different seed may work without problems.

@avehtari I can work on a reproducible example, but I still get an error in some cases. Let me know if this is related to your fix or something else, please. I can make a reprex as necessary. I’ve confirmed this errors on both my Mac and PC with the Dev ‘loo’ update.

sart_mean_bf_dist <- bf(sartmean1 ~ Environment+Workload + (1|part_id),
                        phi ~ Environment+Workload + (1|part_id)) +   Beta()
sart_eo_bf_dist <- bf(sart_eox ~ Environment+Workload + (1|part_id),
                      phi ~ Environment+Workload + (1|part_id)) +  Beta()
sart_ec_bf_dist <- bf(sart_ecx ~ Environment+Workload + (1|part_id),
                      phi ~ Environment+Workload + (1|part_id)) +  Beta()

sart_fit1_dist <- brm(
  sart_mean_bf_dist + sart_eo_bf_dist + sart_ec_bf_dist + set_rescor(FALSE),
  data = virtra_dat, chains = 4, cores = 4, iter = 10000, warmup = 8000,
  prior = c(prior(normal(0, 3), class=b, resp= sartmean1), prior(gamma(1.5, .7), class=b, resp= sartmean1, dpar= phi, lb=0), 
            prior(normal(0, .5), class=sd, resp= sartmean1), prior(normal(0, .5), class=sd, resp= sartmean1, dpar= phi),
            prior(normal(0, 3), class=b, resp= sarteox), prior(gamma(1.5, .7), class=b, resp= sarteox, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= sarteox), prior(normal(0, .5), class=sd, resp= sarteox, dpar=phi),
            prior(normal(0, 3), class=b, resp= sartecx), prior(gamma(1.5, .7), class=b, resp= sartecx, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= sartecx), prior(normal(0, .5), class=sd, resp= sartecx, dpar=phi)),
            
  control = list(adapt_delta = 0.99), save_pars=save_pars(all=TRUE), seed = 711, 
)

sart_fit1_dist <- add_criterion(sart_fit1_dist, "loo", moment_match = TRUE, reloo=TRUE) 

It then returns the following error:

Error : Exception: beta_lpdf: Second shape parameter is 0, but must be positive finite! (in 'string', line 226, column 6 to column 111)
Error: Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

What makes it peculiar to me is that the initial model does not error and more complex models with the same data do not error nor do their moment matching. Am I missing something or is it a software bug?

1 Like

It’s not a bug due to wrong code, but it is due to the limitation of the floating point presentation of numbers. When you sample with Stan, it catches these exceptions in log density computation and rejects those proposals. When the moment matching transforms the posterior draws and asks Stan to recompute log density, Stan can return -Inf, which has now special handling, or Stan can cause exception which is not yet handled. These were not handled before because we didn’t see them in our own experiments. Obviously the popularity of moment matching has increased, and then the probability that someone will hit these edge cases increases. More complex model can be better specified and have better behavior. In case of a simpler model, leaving out observation may have a bigger change in the posterior, and moment matching then transform the draws more. The transformation is made in unconstrained space, but now it seems that when the moment matching transformed draw is constrained there is underflow.

I can fix this faster, if I get a complete reproducible example. I hope I can catch that exception, but this will definitely be more challenging to fix than the -Inf case.

2 Likes

Here is the reprex. Admittedly, I can only get it to cause the error on my Mac and not the Windows one, which has me completely baffled. (both machines still error with the original code)
Are you able to replicate the error, @avehtari?


library(brms)


#make the data
out1 <- c(0.4259, 0.42855, 0.4193, 0.3835, 0.348, 0.5182, 0.52025, 0.44115, 
          0.4603, 0.44805, 0.3567857, 0.3799, 0.50925, 0.4768, 0.56815, 0.3948421, 
          0.42665, 0.37795, 0.36645, 0.34565, 0.7618947, 0.8046667, 0.42105, 0.40785, 
          0.41405, 0.43255, 0.3467895, 0.4078, 0.3496429, 0.4345263, 0.3606, 0.4466389, 
          0.3419714, 0.3883143, 0.3832778, 0.4014444, 0.3393714, 0.4705, 0.3766, 
          0.3727143, 0.403, 0.3646774, 0.3610571, 0.5300833, 0.4632778, 0.4330556, 
          0.3387568, 0.3698824, 0.3623429, 0.3744412, 0.3373889, 0.4087632, 0.5207692, 
          0.5020857, 0.4230278, 0.3997714, 0.4826571, 0.3942222, 0.3812432, 0.3345143, 
          0.3699737, 0.3783143, 0.615833, 0.582929, 0.59, 0.559, 0.6004, 0.56274, 0.59394, 
          0.59537, 0.5802, 0.599667, 0.55908, 0.57475, 0.5909, 0.58331, 0.584053, 0.553714, 
          0.577391, 0.55479, 0.5277, 0.540563, 0.544611, 0.599187, 0.577923, 0.583842, 
          0.563524, 0.574, 0.542056, 0.54887, 0.583167, 0.595, 0.588, 0.54663, 0.4306, 0.547, 
          0.56587, 0.5517, 0.55697, 0.51808, 0.57569, 0.52975, 0.56571, 0.52133, 0.55158, 
          0.59483, 0.62878, 0.543733, 0.480909, 0.61368, 0.57294, 0.577353, 0.497968, 0.562611, 
          0.626611, 0.591759, 0.592515, 0.538882, 0.5612, 0.553219, 0.57397, 0.539781, 0.562767, 0.556472)

out2 <- c(0.0001, 0.0001, 0.0001, 0.0001, 0.15, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.3, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.05, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.05, 0.0001, 0.3, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.1143, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.02857, 
          0.0001, 0.02857, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.4444, 0.12, 0.45, 0.1364, 0.25, 0.2083, 0.1, 0.2, 0.08696, 0.3684, 0.2353, 
          0.16, 0.1304, 0.09091, 0.1364, 0.1667, 0.0001, 0.09524, 0.08696, 0.05882, 0.05263, 0.0001, 0.2273, 
          0.0001, 0.08696, 0.2222, 0.0001, 0.08, 0.1176, 0.3125, 0.08333, 0.1892, 0.02703, 0.02857, 0.1351, 
          0.2222, 0.1053, 0.1429, 0.1944, 0.1351, 0.1316, 0.1429, 0.2778, 0.2121, 0.0001, 0.05405, 0.1282, 
          0.05882, 0.05556, 0.07895, 0.1111, 0.05128, 0.0001, 0.2105, 0.07895, 0.05556, 0.06061, 0.05714, 
          0.1316, 0.0001, 0.1429, 0.02632)

out3 <- c(0.5, 0.05, 0.2, 0.1, 0.1, 0.2, 0.05, 0.0001, 0.05, 0.0001, 0.15, 0.2, 0.0001, 0.0001, 0.0001, 
          0.35, 0.15, 0.1, 0.0001, 0.05, 0.1, 0.05, 0.05, 0.05, 0.05, 0.0001, 0.25, 0.15, 0.25, 0.0001, 
          0.1, 0.0001, 0.4, 0.8, 0.5, 0.5, 0.8, 0.0001, 0.0001, 0.4, 0.2, 0.8, 0.6, 0.0001, 0.0001, 0.5, 
          0.9999, 0.2, 0.4, 0.8, 0.25, 0.4, 0.25, 0.0001, 0.25, 0.4, 0.0001, 0.9999, 0.6667, 0.2, 0.0001, 
          0.4, 0.0001, 0.0001, 0.05, 0.1111, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.09524, 0.0001, 
          0.0001, 0.05882, 0.0001, 0.0001, 0.04545, 0.0001, 0.0001, 0.05882, 0.04348, 0.0001, 0.0001, 
          0.05556, 0.0001, 0.0001, 0.0001, 0.0001, 0.1333, 0.0001, 0.0001, 0.1875, 0.0001, 0.0001, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.5, 0.0001, 0.0001, 0.0001, 0.0001, 0.9999, 
          0.0001, 0.0001, 0.0001, 0.0001, 0.25, 0.0001, 0.0001, 0.0001, 0.0001, 0.25, 0.1429, 0.0001, 
          0.0001, 0.0001, 0.0001, 0.0001)

id <- as.character(c("4f9", "06b", "6b8", "10", "47f", "148", "230", "430", "699", "aa4", "c03", "d4c", "d83", "fc8", 
        "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", "263", "aac", "a87", "67d", "39e", "a40", 
        "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", "148", "230", "430", "699", "aa4", "c03", 
        "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", "263", "aac", "a87", 
        "67d", "39e", "a40", "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", "148", "230", "430", 
        "699", "aa4", "c03", "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", "e25", "51b", "3b6", 
        "263", "aac", "a87", "67d", "39e", "a40", "f0f", "3ac", "ece", "4f9", "06b", "6b8", "10", "47f", 
        "148", "230", "430", "699", "aa4", "c03", "d4c", "d83", "fc8", "2df", "6ea", "12a", "40a", "aa6", 
        "e25", "51b", "3b6", "263", "aac", "a87", "67d", "39e", "a40", "f0f", "3ac", "ece"))

environment <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                           1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2))

load <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                    2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                    1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                    2, 2, 2, 2))

dat <- data.frame(out1, out2, out3, id, environment, load)


f1 <- bf(out1 ~ environment+load + (1|id),
                        phi ~ environment+load + (1|id)) +   Beta()
f2 <- bf(out2 ~ environment+load + (1|id),
                      phi ~ environment+load + (1|id)) +  Beta()
f3<- bf(out3 ~ environment+load + (1|id),
                      phi ~ environment+load + (1|id)) +  Beta()

fit_dist <- brm(
  f1 + f2 + f3 + set_rescor(FALSE),
  data = dat, chains = 4, cores = 4, iter = 10000, warmup = 8000,
  prior = c(prior(normal(0, 3), class=b, resp= out1), prior(gamma(1.5, .7), class=b, resp= out1, dpar= phi, lb=0), 
            prior(normal(0, .5), class=sd, resp= out1), prior(normal(0, .5), class=sd, resp= out1, dpar= phi),
            prior(normal(0, 3), class=b, resp= out2), prior(gamma(1.5, .7), class=b, resp= out2, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= out2), prior(normal(0, .5), class=sd, resp= out2, dpar=phi),
            prior(normal(0, 3), class=b, resp= out3), prior(gamma(1.5, .7), class=b, resp= out3, dpar= phi, lb=0),
            prior(normal(0, .5), class=sd, resp= out3), prior(normal(0, .5), class=sd, resp= out3, dpar=phi)),
  control = list(adapt_delta = 0.99), save_pars=save_pars(all=TRUE), seed = 711, 
)

#hopefully make the error
fit_dist <- add_criterion(fit_dist, "loo", moment_match = TRUE) 

Thanks for the example. The error replicates. I’ll investigate.

I’ve made a PR catch Stan log_prob exceptions inside moment matching by avehtari · Pull Request #262 · stan-dev/loo · GitHub for loo package which will catch the error. I guess this will be soon available in github master branch.

As in your example, some Pareto k’s are very big (larger than 1), moment matching is not able to get all Pareto k’s below 0.7, but at least the moment matching is finishing for all other observations.

The PR has been merged to master. Can you try with github version of loo?

remotes::install_github("stan-dev/loo")
1 Like

Yes, sorry. Just saw your response. I’ll check on it and report back.

It is still returning the same error as before on my end ("Error : Exception: beta_lpdf: Second shape parameter is 0, but…) with a moment matching failure.

Should I be expecting something different with the update?