Getting inscrutable error when running add_criterion() on brms ordinal regression model

I am conducting a multivariate ordinal regression examining the effect of number of days of amphetamine-type-stimulant use in the 28 days prior to starting treatment (variable ats_baseline) on three outcomes: psychological health, physical health, and quality of life, all measured on a 0-10 integer scale (variables psych, phys, qol) during the first years of treatment (variable yearsFromStart, a continuous variable between 0 and 1). Data are grouped by encounter (variable encID) which defines a block of treatment. For our purposes this can be considered equivalent to a participant ID number. The minimum number of measurements per encID is two. All encounters have the three outcomes measured at start of treatment (i.e. yearsFromStart= 0) and at least one other time. When these non start of treatment measurements occur is irregular. Here are the data

amphUseInOTP_mvr_ppq.RData (23.8 KB)

Here’s code to run the model but I strongly advise against running it: it takes hours.

# nicer name for dataset
mvrppq <- amphUseInOTP_mvr_ppq


# need to transform outcome vars to positive integers
mvrppq %<>%
  mutate(across(.cols = psych:qol,
                .fns = ~as.integer(.x+1)))

bform_ord_mvr_ppq <- bf(mvbind(psych, phys, qol) ~ ats_baseline*yearsFromStart + (1|p|encID))

# ordinal model
fit_ordinal_mvr_ppq <- brm(bform_ord_mvr_ppq,
                           family = cumulative("probit"),
                           data = mvrppq,
                           save_pars = save_pars(all=TRUE),
                           seed = 1234,
                           refresh = 0,
                           chains = 4,
                           warmup = 1e3,
                           iter = 3e3)

And this is the output

Family: MV(cumulative, cumulative, cumulative) 
  Links: mu = probit
         mu = probit
         mu = probit 
Formula: psych ~ ats_baseline * yearsFromStart + (1 | p | encID) 
         phys ~ ats_baseline * yearsFromStart + (1 | p | encID) 
         qol ~ ats_baseline * yearsFromStart + (1 | p | encID) 
   Data: mvrppq (Number of observations: 2056) 
  Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Multilevel Hyperparameters:
~encID (Number of levels: 670) 
                                    Estimate Est.Error l-95% CI u-95% CI Rhat
sd(psych_Intercept)                     0.82      0.04     0.75     0.90 1.00
sd(phys_Intercept)                      0.76      0.04     0.69     0.84 1.00
sd(qol_Intercept)                       0.86      0.04     0.78     0.94 1.00
cor(psych_Intercept,phys_Intercept)     0.93      0.03     0.87     0.98 1.01
cor(psych_Intercept,qol_Intercept)      0.99      0.01     0.97     1.00 1.00
cor(phys_Intercept,qol_Intercept)       0.94      0.03     0.89     0.98 1.00
                                    Bulk_ESS Tail_ESS
sd(psych_Intercept)                     3347     5567
sd(phys_Intercept)                      2686     4782
sd(qol_Intercept)                       3428     5103
cor(psych_Intercept,phys_Intercept)      936     1704
cor(psych_Intercept,qol_Intercept)      1945     3076
cor(phys_Intercept,qol_Intercept)       2416     3308

Regression Coefficients:
                                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
psych_Intercept[1]                   -3.28      0.15    -3.59    -3.00 1.00     7232
psych_Intercept[2]                   -2.74      0.10    -2.94    -2.54 1.00     7601
psych_Intercept[3]                   -2.10      0.07    -2.25    -1.96 1.00     4931
psych_Intercept[4]                   -1.59      0.06    -1.72    -1.47 1.00     4705
psych_Intercept[5]                   -1.15      0.06    -1.27    -1.04 1.00     4589
psych_Intercept[6]                   -0.42      0.05    -0.53    -0.32 1.00     4266
psych_Intercept[7]                    0.12      0.05     0.01     0.22 1.00     4133
psych_Intercept[8]                    0.83      0.06     0.73     0.94 1.00     4311
psych_Intercept[9]                    1.87      0.07     1.73     2.00 1.00     5365
psych_Intercept[10]                   2.55      0.08     2.39     2.71 1.00     6595
phys_Intercept[1]                    -3.09      0.13    -3.35    -2.84 1.00     6969
phys_Intercept[2]                    -2.81      0.11    -3.02    -2.60 1.00     7420
phys_Intercept[3]                    -2.28      0.08    -2.44    -2.13 1.00     6472
phys_Intercept[4]                    -1.77      0.07    -1.90    -1.64 1.00     5815
phys_Intercept[5]                    -1.29      0.06    -1.40    -1.18 1.00     5668
phys_Intercept[6]                    -0.58      0.05    -0.68    -0.48 1.00     5593
phys_Intercept[7]                     0.01      0.05    -0.09     0.11 1.00     5490
phys_Intercept[8]                     0.74      0.05     0.64     0.85 1.00     5609
phys_Intercept[9]                     1.71      0.06     1.59     1.83 1.00     6197
phys_Intercept[10]                    2.39      0.08     2.24     2.55 1.00     7233
qol_Intercept[1]                     -3.32      0.15    -3.62    -3.04 1.00     7671
qol_Intercept[2]                     -2.71      0.10    -2.91    -2.52 1.00     7930
qol_Intercept[3]                     -2.17      0.08    -2.33    -2.03 1.00     6522
qol_Intercept[4]                     -1.71      0.07    -1.84    -1.58 1.00     5718
qol_Intercept[5]                     -1.32      0.06    -1.44    -1.20 1.00     5297
qol_Intercept[6]                     -0.57      0.05    -0.68    -0.47 1.00     4439
qol_Intercept[7]                      0.01      0.05    -0.10     0.11 1.00     4289
qol_Intercept[8]                      0.80      0.06     0.69     0.91 1.00     4563
qol_Intercept[9]                      1.71      0.06     1.59     1.83 1.00     4735
qol_Intercept[10]                     2.31      0.07     2.16     2.45 1.00     5786
psych_ats_baseline                   -0.04      0.01    -0.06    -0.02 1.00     3409
psych_yearsFromStart                  0.47      0.09     0.30     0.64 1.00    10453
psych_ats_baseline:yearsFromStart     0.05      0.02     0.02     0.08 1.00    10556
phys_ats_baseline                    -0.03      0.01    -0.04    -0.01 1.00     3769
phys_yearsFromStart                   0.31      0.09     0.14     0.48 1.00    12132
phys_ats_baseline:yearsFromStart      0.03      0.02    -0.00     0.06 1.00     9863
qol_ats_baseline                     -0.04      0.01    -0.06    -0.02 1.00     2974
qol_yearsFromStart                    0.52      0.08     0.35     0.68 1.00    12127
qol_ats_baseline:yearsFromStart       0.05      0.02     0.02     0.09 1.00    10553
                                  Tail_ESS
psych_Intercept[1]                    5264
psych_Intercept[2]                    5713
psych_Intercept[3]                    5343
psych_Intercept[4]                    6288
psych_Intercept[5]                    6033
psych_Intercept[6]                    6122
psych_Intercept[7]                    6134
psych_Intercept[8]                    6418
psych_Intercept[9]                    5814
psych_Intercept[10]                   6264
phys_Intercept[1]                     5747
phys_Intercept[2]                     6073
phys_Intercept[3]                     6461
phys_Intercept[4]                     6048
phys_Intercept[5]                     6566
phys_Intercept[6]                     6269
phys_Intercept[7]                     6065
phys_Intercept[8]                     6416
phys_Intercept[9]                     6873
phys_Intercept[10]                    5979
qol_Intercept[1]                      5957
qol_Intercept[2]                      6325
qol_Intercept[3]                      6788
qol_Intercept[4]                      6631
qol_Intercept[5]                      6383
qol_Intercept[6]                      6100
qol_Intercept[7]                      6167
qol_Intercept[8]                      5953
qol_Intercept[9]                      6369
qol_Intercept[10]                     6001
psych_ats_baseline                    4634
psych_yearsFromStart                  6243
psych_ats_baseline:yearsFromStart     6259
phys_ats_baseline                     5436
phys_yearsFromStart                   6305
phys_ats_baseline:yearsFromStart      6080
qol_ats_baseline                      5078
qol_yearsFromStart                    6547
qol_ats_baseline:yearsFromStart       5849

Further Distributional Parameters:
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc_psych     1.00      0.00     1.00     1.00   NA       NA       NA
disc_phys      1.00      0.00     1.00     1.00   NA       NA       NA
disc_qol       1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

So the model appears to converge ok. But when I run this through add_criterion()

fit_ordinal_mvr_ppq <- add_criterion(fit_ordinal_mvr_ppq,
                                     criterion="loo",
                                     save_psis = TRUE,
                                     moment_match = T,
                                     reloo = T,
                                     recompile = TRUE)

It runs for a long long time then fails, with this error

rsession-arm64(54828) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
Recompiling the model with 'rstan'
Recompilation done
Error : Exception: lub_free: Correlation variable is -1.00007, but must be in the interval [-1, 1] (in 'anon_model', line 125, column 2 to column 32)
Error in `loo_moment_match.brmsfit()`:
! Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

I have no idea what the error message means. Can anyone give me some advice?

The add_criterion() is run in the same session on the same machine, and, as you can see, I definitely set save_pars = save_pars(all=TRUE)

Just in case, I tried out running this with setting backend = ‘cmdstanr’ in the brm() call and it ended in the same error. (Only difference is that mine said lub_free: Correlation variable is -1.00005instead of -1.00007.)

I suspect it could be something with loo with moment matching… but I honestly don’t know. Vanilla loo works, but warns about 144 obs with pareto_k > 0.7. When I ran loo without moment matching, but with reloo = T, it was running seemingly ok, just takes a long time because the model must be refit 144 times. That could be an option for you if you have the patience :^)

So, like this:

fit_ordinal_mvr_ppq_reloo <- add_criterion(fit_ordinal_mvr_ppq,
                                            criterion="loo",
                                            save_psis = TRUE,
                                            reloo = T,
                                            recompile = TRUE)

I’m no expert, but my understanding is that the purpose of moment-matching is to avoid refitting the model (link). So having both moment_match and reloo set to TRUEmight be at odds.

Thank you for the suggestion @zacho. I will run it and report back.

If you can provide the data, I can try to find where in the moment match code the error happens

Hi Aki. Data is provided in the original post, immediately after the first paragraph

Hi @zacho. I tried the code you suggested but on fitting model 16 out of 148 got the error

Error in chol.default(Sigma) : 
  the leading minor of order 3 is not positive

@llewmills, I didn’t get any errors, so it seems there is some missing information about your setup.

I had

library(dplyr)
library(brms)
load("amphUseInOTP_mvr_ppq.RData")

That is, I’m using rstan backend for sampling. One possibility is that you were using cmdstanr backend and old cmdstanr version for sampling. Old cmdstan saved the posterior draws to a csv file with 6 significant digits, but Stan checks the constrained variables with 8 significant digits causing constraint mismatches. Using rstan for sampling stores the posterior draws in memory and thus using the full double floating point accuracy. Using the latest cmdstan 2.37 uses more significant digits when saving to csv, which should avoid constraint check errors. Note that even if cmdstanr would be used for sampling, the moment matching is using rstan (in theory cmdstanr could be used for moment matching, but changing that in brms is a big work due to how the posterior draws are currently stored).

In this case, switching to the latest cmdstanr or using rstan also for sampling would not help, With my laptop and using one core moment matching took hours. It could be made faster using more cores. However, only two LOO folds did get Pareto-k reduced below 0.7, and there are still 155 bad Pareto-k’s. The model is just too flexible with that varying intercept term. There are 56 encID’s with only one observation and 253 encID’s with only two observations. Removing the only observation or half of the observations corresponding to a varying intercept changes the posterior a lot.

How to go forward:

  • It’s easiest to use K-fold-CV.
  • You could integrate out the varying intercepts using 1D quadrature, and use integrated LOO, but as some encID’s have more than observation, the code needed is a bit more complicated than what I currently have in Roaches cross-validation case study
1 Like

Hi Aki. you’re right about the single observations encIDs. Apologies for that. I made the dataset I posted by taking including any encID that had more than one observation for ANY of the seven substance outcomes we have discussed in the past - ats use, heroin use, alcohol use, cannabis use, psych health, physical health and quality of life. What I didn’t do was filter that for the last three variables when I made the dataset I included in this post. Because I ran the ordinal models on individual outcomes before brms just excluded the missing outcome data automatically. I should have checked the mutlivariate data more carefully, my apologies. Apologies also @zacho . I will remove missing observations and try with a new dataset of all encIDs with more than one observation and rstan on the backend. Thanks for your help again.

do you think changing the priors on the intercepts might result in fewer bad pareto-k?

It’s also worth mentioning that there is no reason this needs to be a multivariate model. I’ve already run regressions on these three outcomes individually. This is just an experiment.

@avehtari I have run the mutlivariate ordinal model on a new dataset with complete data for all three outcomes. Here is the output.

Family: MV(cumulative, cumulative, cumulative) 
  Links: mu = probit; disc = identity
         mu = probit; disc = identity
         mu = probit; disc = identity 
Formula: psych ~ ats_baseline * yearsFromStart + (1 | p | encID) 
         phys ~ ats_baseline * yearsFromStart + (1 | p | encID) 
         qol ~ ats_baseline * yearsFromStart + (1 | p | encID) 
   Data: mvrppq (Number of observations: 1906) 
  Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Multilevel Hyperparameters:
~encID (Number of levels: 598) 
                                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(psych_Intercept)                     0.79      0.04     0.71     0.87 1.00     2347     4500
sd(phys_Intercept)                      0.75      0.04     0.67     0.83 1.00     2029     4569
sd(qol_Intercept)                       0.81      0.04     0.74     0.90 1.00     2263     4095
cor(psych_Intercept,phys_Intercept)     0.94      0.03     0.88     0.99 1.01      779     1134
cor(psych_Intercept,qol_Intercept)      0.99      0.01     0.97     1.00 1.00     1170     2467
cor(phys_Intercept,qol_Intercept)       0.94      0.03     0.88     0.99 1.00     1455     1735

Regression Coefficients:
                                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
psych_Intercept[1]                   -3.27      0.16    -3.60    -2.98 1.00     4202     4536
psych_Intercept[2]                   -2.74      0.10    -2.95    -2.54 1.00     5067     5574
psych_Intercept[3]                   -2.10      0.08    -2.25    -1.95 1.00     4964     6094
psych_Intercept[4]                   -1.59      0.06    -1.72    -1.47 1.00     4178     5743
psych_Intercept[5]                   -1.16      0.06    -1.28    -1.04 1.00     3486     4909
psych_Intercept[6]                   -0.45      0.06    -0.56    -0.34 1.00     3089     4883
psych_Intercept[7]                    0.08      0.05    -0.02     0.19 1.00     2983     4911
psych_Intercept[8]                    0.80      0.06     0.69     0.91 1.00     3107     5239
psych_Intercept[9]                    1.84      0.07     1.71     1.97 1.00     3642     4678
psych_Intercept[10]                   2.50      0.08     2.34     2.67 1.00     4512     6008
phys_Intercept[1]                    -3.03      0.13    -3.28    -2.78 1.00     4007     5101
phys_Intercept[2]                    -2.76      0.11    -2.98    -2.56 1.00     4273     5471
phys_Intercept[3]                    -2.24      0.08    -2.41    -2.09 1.00     4422     5881
phys_Intercept[4]                    -1.74      0.07    -1.88    -1.61 1.00     4090     5686
phys_Intercept[5]                    -1.29      0.06    -1.41    -1.17 1.00     3668     5244
phys_Intercept[6]                    -0.58      0.05    -0.69    -0.47 1.00     3460     5208
phys_Intercept[7]                     0.00      0.05    -0.10     0.11 1.00     3327     5201
phys_Intercept[8]                     0.74      0.06     0.63     0.85 1.00     3445     5119
phys_Intercept[9]                     1.71      0.07     1.59     1.85 1.00     3586     5231
phys_Intercept[10]                    2.40      0.08     2.24     2.56 1.00     4776     5035
qol_Intercept[1]                     -3.23      0.14    -3.52    -2.96 1.00     4741     5095
qol_Intercept[2]                     -2.68      0.10    -2.88    -2.48 1.00     5385     5533
qol_Intercept[3]                     -2.18      0.08    -2.33    -2.03 1.00     4521     5776
qol_Intercept[4]                     -1.70      0.07    -1.83    -1.57 1.00     3953     5807
qol_Intercept[5]                     -1.34      0.06    -1.47    -1.22 1.00     3640     5125
qol_Intercept[6]                     -0.61      0.06    -0.72    -0.50 1.00     3151     4959
qol_Intercept[7]                     -0.04      0.06    -0.15     0.07 1.00     3128     5033
qol_Intercept[8]                      0.75      0.06     0.64     0.87 1.00     3349     4586
qol_Intercept[9]                      1.66      0.07     1.53     1.78 1.00     3453     4969
qol_Intercept[10]                     2.26      0.08     2.11     2.40 1.00     4024     5747
psych_ats_baseline                   -0.04      0.01    -0.06    -0.02 1.00     2403     4048
psych_yearsFromStart                  0.46      0.09     0.29     0.63 1.00     7200     5962
psych_ats_baseline:yearsFromStart     0.05      0.02     0.02     0.09 1.00     6114     5912
phys_ats_baseline                    -0.02      0.01    -0.04    -0.01 1.00     2502     3923
phys_yearsFromStart                   0.33      0.09     0.16     0.50 1.00     8512     5998
phys_ats_baseline:yearsFromStart      0.03      0.02    -0.01     0.06 1.00     6611     6202
qol_ats_baseline                     -0.04      0.01    -0.06    -0.02 1.00     2478     3639
qol_yearsFromStart                    0.47      0.09     0.30     0.65 1.00     7583     5152
qol_ats_baseline:yearsFromStart       0.06      0.02     0.02     0.09 1.00     7683     6213

Further Distributional Parameters:
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc_psych     1.00      0.00     1.00     1.00   NA       NA       NA
disc_phys      1.00      0.00     1.00     1.00   NA       NA       NA
disc_qol       1.00      0.00     1.00     1.00   NA       NA       NA

The pareto-k diagnostics are pretty poor

Computed from 8000 by 1906 log-weights matrix.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.7]).
Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     1805  94.7%   79      
   (0.7, 1]   (bad)        93   4.9%   <NA>    
   (1, Inf)   (very bad)    8   0.4%   <NA>    
See help('pareto-k-diagnostic') for details.

And when I go to run the k-fold cv via

k_ordinal_mvr_ppq_ao_nm <- kfold(fit_ordinal_mvr_ppq_ao_nm,
                                 K = 10)

It throws the error

Fitting model 1 out of 10
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Any advice?

Would setting different priors improve model fit and get those pareto-k values looking better?

This is expected, as earlier you had more high Pareto-k’s than single observation persons, and thus removing those can’t solve the issue completely. With this many, it is also likely that moment matching is not able to help.

kfold error seems like an error in brms, and I suggest creating an issue in brms github repo

1 Like