Difficulty comparing models with loo

jroon · August 6, 2020, 10:18am

I have two competing brms models. Both ran without divergences but with a Bulk-ESS warning. However, when I tried to compare them using loo I got these warnings:

Warning messages:
1: Found 148 observations with a pareto_k > 0.7 in model 'fit1'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.  
2: Found 276 observations with a pareto_k > 0.7 in model 'fit2'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.

Ok. I turned on moment matching: loo(fit1, fit2, moment_match = TRUE). However, doing that I got the error:

Error: Moment matching failed. Perhaps you did not set 'save_all_pars' to TRUE when fitting your model?

Ok. I re-ran my models with save_all_pars = TRUE. This time I got some further warnings:

The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

Ok. I’m not sure where the "R-hat is NA" warning comes from: when I run summary(fit2) I don’t see any R-hat values equal to NA.

In any case, I tried loo again with moment-matching:

> loo(fit1, fit2, moment_match = TRUE, save_psis = TRUE)
Error in .update_pars(x, upars = upars, ...) : 
  length(new_samples) == nrow(pars) is not TRUE
Error: Moment matching failed. Perhaps you did not set 'save_all_pars' to TRUE when fitting your model?

Ok. … wait no, not ok!
I did set save_all_pars = TRUE this time. I’m very confused at this point and feel like I’m running in time-consuming circles. How do I unpick this situation and compare my models?

paul.buerkner · August 6, 2020, 10:32am

You can safely ignore the Rhat is NA warnings.

With regard to the last error, it looks more like a bug somewhere. I need a minimal reproducible example to see what is going on.

jroon · August 6, 2020, 10:38am

Ok thanks, good to know I can ignore the Rhat warning.

A minimal reproducible example could be tricky - I definitely can’t share the data, and the models are probably too complex for me to simulate data for. Leave it with me, I’ll try to think up a way to make a reprex later today!
Thanks

jroon · August 6, 2020, 2:14pm

Hi @paul.buerkner - it took a few tries to find the right combination of things but I believe that this will reproduce it:

library(brms)

data("BTdata", package = "MCMCglmm")

fit1 <- brm(
    mvbind(tarsus, back) ~ hatchdate + (hatchdate | fosternest),
    data = BTdata, chains = 2, cores = 2, save_all_pars = TRUE
)

fit2 <- brm(
    mvbind(tarsus, back) ~ hatchdate + (hatchdate | p | fosternest),
    data = BTdata, chains = 2, cores = 2, save_all_pars = TRUE
)


loo(fit1, fit2, moment_match = TRUE)


> loo(fit1, fit2, moment_match = TRUE)
Error in .update_pars(x, upars = upars, ...) : 
  length(new_samples) == nrow(pars) is not TRUE
In addition: Warning message:
Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
 
Error: Moment matching failed. Perhaps you did not set 'save_all_pars' to TRUE when fitting your model?

Let me know if that doesn’t do it for you. FYI I’m using brms version 2.13.5

paul.buerkner · August 6, 2020, 3:05pm

It does work for me. Which version of rstan are you using?
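Since the error seems to depend on the setup, it may help to collect the relevant versions in one go. A minimal base R sketch follows; `report_versions` is a hypothetical helper written for this post, not part of brms or rstan, and the package list is only a guess at what matters here.

```r
# Hypothetical helper: return the installed version of each package,
# or NA if the package is not installed. It does not load the packages.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# R version plus the packages most likely to be involved (an assumption)
print(R.version.string)
versions <- report_versions(c("brms", "rstan", "loo", "StanHeaders"))
print(versions)
```

Pasting this output (or the full sessionInfo()) alongside a reprex makes it easier to compare setups.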

jroon · August 6, 2020, 3:09pm

rstan 2.19.3
I’m also on R 3.6.2 still. I’ve a bunch of half-done projects and so I don’t want to update to R 4.0+ until some are more done. Is that a problem here?

paul.buerkner · August 6, 2020, 3:19pm

I don’t know. You can have R 3.6 and R 4 installed on the same machine without breaking anything, so that could be worth a try, but I honestly don’t know what the problem is as I cannot reproduce the error.

avehtari · August 6, 2020, 3:28pm

jroon:

I have two competing brms models. Both ran without divergences but with a Bulk-ESS warning. However I tried to compare them using loo I got the error:
Warning messages:
1: Found 148 observations with a pareto_k > 0.7 in model 'fit1'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.  
2: Found 276 observations with a pareto_k > 0.7 in model 'fi

It’s likely that moment matching will not be able to help with that many problematic pareto_k values. Please post how you call brms (all brms arguments such as formula, priors, etc.) and tell me how many observations you have (if it is a hierarchical model, then also how many group levels etc.), so that I can give a better informed suggestion about what to do next.

jroon · August 6, 2020, 3:47pm

Ok I’ll try to find time to install it in the coming days, although I’m backed up a lot with deadlines. I have a paper under revision with about a month of cpu time behind it and I’m terrified to change anything until the reviews come back in case I break anything 😅… I really really really don’t want to run it all again from scratch 😂😇

Sure no problem - there are 278 individuals (group levels), and in total there are 745 observations. I’m interested in comparing different outcome measures and so I’m using multivariate models, and the residual correlations are of primary interest. The model specifications are below - they are basically mixed effects models for each outcome (based on a priori knowledge). I tuned the priors based on prior predictive checks to be weakly informative.

fit1 <- brm(
    mvbind(out_fvc, out_svc, out_snip, out_peak) ~ days_from_baseline + (days_from_baseline | uin),
    data = df2, chains = 4, cores = 4,
    prior = c(set_prior("normal(0, 0.1)", class = "b", resp = c("outfvc", "outsvc","outsnip", "outpeak" )),
              set_prior("normal(3.2, 0.1)", class = "Intercept", resp = "outfvc" ),
              set_prior("normal(3.1, 0.2)", class = "Intercept", resp = "outsvc" ),
              set_prior("normal(65, 20)", class = "Intercept", resp = "outsnip" ),
              set_prior("student_t(3, 350, 100)", class = "Intercept", resp = "outpeak" )
              ),
    sample_prior = "no", save_all_pars = TRUE )

fit2 <- brm(
    mvbind(out_fvc, out_svc, out_snip, out_peak) ~ days_from_baseline + (days_from_baseline | p | uin),
    data = df2, chains = 4, cores = 4,
    prior = c(set_prior("normal(0, 0.1)", class = "b", resp = c("outfvc", "outsvc","outsnip", "outpeak" )),
              set_prior("normal(3.2, 0.1)", class = "Intercept", resp = "outfvc" ),
              set_prior("normal(3.1, 0.2)", class = "Intercept", resp = "outsvc" ),
              set_prior("normal(65, 20)", class = "Intercept", resp = "outsnip" ),
              set_prior("student_t(3, 350, 100)", class = "Intercept", resp = "outpeak" )
              ),
    sample_prior = "no", save_all_pars = TRUE )

Would increasing the cores used and/or iterations help with the pareto_k values? (I don’t have a deep understanding here.)

avehtari · August 6, 2020, 4:13pm

If some uin groups have just one or a few observations, then removing an observation can change the posterior of the corresponding uin intercept and coefficient parameters a lot. On average there are about 2.7 observations (745 / 278) to influence the posterior of 2 local parameters (or, since you are using mvbind, are there 1+4 parameters?), so there must be many groups where just 1 or 2 observations influence the posterior of 2 local parameters. When you remove 1 out of 1 observation, the posterior of the local parameters is just the prior, which is likely to be much wider. When you remove 1 out of 2 observations, you are removing half of the data for those local parameters, and it’s still likely that the posterior will change a lot. See the related random effect model loo example in the Roaches loo case study.
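To see how thin the groups really are, one could tabulate the observations per group. A minimal base R sketch, using simulated stand-in data because the real df2 is not available: the uin column name follows the model formula above, while `df_sim` and the simulated counts are purely illustrative assumptions.

```r
# Simulated stand-in for df2: 745 rows spread over 278 group levels.
# Assumption: every group appears at least once, the rest is random.
set.seed(1)
n_groups <- 278
n_obs <- 745
uin <- c(seq_len(n_groups),
         sample(seq_len(n_groups), n_obs - n_groups, replace = TRUE))
df_sim <- data.frame(uin = factor(uin))

# Number of observations per group level
obs_per_group <- as.vector(table(df_sim$uin))

# Average number of observations per group (745 / 278)
mean_obs <- mean(obs_per_group)

# How many groups have only 1 or 2 observations?
n_single <- sum(obs_per_group == 1)
n_pair <- sum(obs_per_group == 2)

print(round(mean_obs, 2))
print(c(one_obs = n_single, two_obs = n_pair))
```

With the real data, replacing `df_sim` by df2 would show how many uin levels fall into the problematic 1-or-2-observation range.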

Increasing the cores or iterations is unlikely to help with the pareto_k values. See also questions 16 and 17 in CV-FAQ

jroon · August 6, 2020, 9:51pm

Thanks for those thoughts @avehtari - that makes a lot of sense. Alas yes, quite a few only have one observation (blame covid for stopping data collection early 😢). I will look through those links, thank you.

jroon · August 7, 2020, 11:47am

Hi @paul.buerkner, just to come back to you on this. I dug out my back-up computer to test this. It had been wiped, so under Mojave 10.14.6, I did a clean install of R 3.6.3, installed rstan and brms, then ran my reprex above -> the problem was still there. I updated to R 4.0.2, reinstalled rstan and brms and ran the models again -> problem resolved, loo ran fine 👍 Now, while I’m at it, I can install Anaconda and test out a few python-y things on my todo list without having to worry about this: https://xkcd.com/1987/

@avehtari reading the Roaches loo case study I have been trying out kfold CV with my models. I was wondering about two things:

When I run kfold, the p_kfold comes out as NA - is this a worry?
I see that I have the option with kfold to split the dataset by group - would you advise to use this for hierarchical models like mine above?

avehtari · August 7, 2020, 12:37pm

The case study also gives NA for p_kfold, because computing it also requires fitting the model to the full data. I don’t remember why we report it like that, and maybe we should run the full data model, too, to show proper values. I should make an issue of this.
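As a rough sketch of why the full-data fit is needed, assume p_kfold is defined by analogy with p_loo (this definition is an assumption made here for illustration, not something stated in the case study):

$$
p_{\mathrm{kfold}} = \mathrm{lpd} - \widehat{\mathrm{elpd}}_{\mathrm{kfold}},
\qquad
\mathrm{lpd} = \sum_{i=1}^{n} \log p(y_i \mid y)
$$

The lpd term uses the posterior given all n observations, so it cannot be obtained from the K held-out fits alone. Without that extra fit, only the elpd_kfold part is available and p_kfold is reported as NA.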

It depends on what your goal is. I discuss this in the Cross-validation for hierarchical models case study, which also links to a video with more explanation.
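If the goal is prediction for new groups, the folds need to hold out whole groups. The idea can be sketched in base R as below; `make_grouped_folds` is a hypothetical helper written for illustration, and the grouping variable is simulated. As far as I know, brms exposes the same idea through kfold(fit, folds = "grouped", group = "uin"), and the loo package has kfold_split_grouped(), but please check the current documentation before relying on either.

```r
# Hypothetical helper (not part of brms or loo): put every observation of a
# group into the same fold, so each fold holds out whole groups.
make_grouped_folds <- function(group, K = 10) {
  ids <- unique(group)
  stopifnot(K <= length(ids))
  # Shuffle balanced fold labels across the unique group ids
  fold_of_id <- sample(rep_len(seq_len(K), length(ids)))
  # Map each observation to the fold of its group
  fold_of_id[match(group, ids)]
}

set.seed(2020)
# Simulated grouping variable: 30 groups with 1 to 4 observations each
group <- rep(seq_len(30), times = sample(1:4, 30, replace = TRUE))
folds <- make_grouped_folds(group, K = 5)

# Every group should sit in exactly one fold
folds_per_group <- tapply(folds, group, function(f) length(unique(f)))
print(all(folds_per_group == 1))
print(table(folds))
```

If instead the goal is prediction for new observations within the existing groups, random or stratified folds over observations would be the closer match.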

jroon · August 7, 2020, 1:48pm

Very helpful thanks - the video too 👍

AndreasR · August 10, 2020, 3:04pm

I ran into the same problem just now and found this thread. I already learned something from @avehtari’s comments but just wanted to share my experience in case it really is a bug.

I am running brms 2.13.5, rstan 2.21.2, R version 4.0.2 on an Ubuntu 18.04 workstation.

I am trying to compare multiple model specifications using only a subset of my data, which might explain the high pareto_k values that I am experiencing. My full dataset has about 20k observations but takes some days to run, which is why I am using a 1k sample for trying out model specifications.

I adapted a simulated MRP example by Lauren Kennedy to produce this example, which triggers the error on my machine. I’m sorry that the code is long, but I hope it reproduces the error.

```
Error in .update_pars(x, upars = upars, ...) : 
  length(new_samples) == nrow(pars) is not TRUE
In addition: Warning message:
Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
 
Error: Moment matching failed. Perhaps you did not set 'save_all_pars' to TRUE when fitting your model?
```

Okay, I just tried to reproduce the error on my laptop running Windows 10, R 4.0.2, and brms 2.13.0. It did not reproduce. I then updated to brms 2.13.5 and now I get the error on my laptop as well.

Edit: I might’ve updated other packages as well, I’m not quite sure. I’m reinstalling brms 2.13.0 now and will see if I can reproduce it again with that version.

library(brms)
library(truncnorm)

#Define function that simulates mrp data ####
simulate_mrp_data <- function(n) {
  J <- c(2, 3, 5, 3, 400) # male or not, jobstatus, age (<18, 18-29, 30-49, 50-64, >64), migration background, state
  poststrat <- as.data.frame(array(NA, c(prod(J), length(J)+1))) # Columns of post-strat matrix, plus one for size
  colnames(poststrat) <- c("male", "job", "age","migback", "state",'N')
  count <- 0
  for (i1 in 1:J[1]){ # for i1 in 1:2 (i.e. for both genders)
    for (i2 in 1:J[2]){
      for (i3 in 1:J[3]){
        for (i4 in 1:J[4]){
          for (i5 in 1:J[5]){ # 1:J[5] is 1:400 because J[5] is 400
            count <- count + 1
            # Fill them in so we know what category we are referring to
            poststrat[count, 1:5] <- c(i1-1, i2, i3,i4,i5) 
          }
        }
      }
    }
  }
  # Proportion in each sample in the population (Values based on Zensus 2011 DE)
  p_male <- c(0.512, 0.488)
  p_job <- c(0.515, 0.025, 0.459)
  p_age <- c(0.164,0.142,0.285,0.204,0.206)
  p_migback <- c(.808,.116,.76)
  p_state_tmp <- runif(n = 400, min = 10,max = 20)
  p_state <- p_state_tmp/sum(p_state_tmp)
  poststrat$N <- 0
  for (j in 1:prod(J)){ # 82e5 is an arbitrary large population size (Germany itself is roughly 82e6)
    poststrat$N[j] <- round(82e5 * p_male[poststrat[j,1]+1] * # get the value for each j in poststrat and multiply by the proportions in the sample
                              p_job[poststrat[j,2]] * p_age[poststrat[j,3]] * 
                              p_migback[poststrat[j,4]] * p_state[poststrat[j,5]]) #Adjust the N to be the number observed in each category in each group
  }
  
  # Now let's adjust for the probability of response depending on characteristics
  p_response_baseline <- 0.01
  p_response_male <- c(2, 0.8) / 2.8
  p_response_job <- c(1, 1.2, 2.5) / 4.7
  p_response_age <- c(0.5, 0.4, 1, 1.5,  3) / 6.4
  p_response_migback <- c(1, 0.8, 0.6) / 2.4
  p_response_state <- rbeta(n = 400, shape1 =  1, shape2 =  1)
  p_response_state <- p_response_state / sum(p_response_state)
  p_response <- rep(NA, prod(J))
  for (j in 1:prod(J)) {
    p_response[j] <-
      p_response_baseline * p_response_male[poststrat[j, 1] + 1] *
      p_response_job[poststrat[j, 2]] * p_response_age[poststrat[j, 3]] *
      p_response_migback[poststrat[j, 4]] * p_response_state[poststrat[j, 5]]
  }
  people <- sample( prod(J), size = n, replace = TRUE, prob = poststrat$N * p_response) #sample random people depending on the proportions and response rate of that cell
  
  ## For respondent i, people[i] is that person's poststrat cell (which cell that person in our sample stems from),
  ## some number between 1 and prod(J) (36000 here)
  n_cell <- rep(NA, prod(J))
  for (j in 1:prod(J)) {
    n_cell[j] <- sum(people == j)
  }
  
  coef_male <- c(0, -1)
  coef_job <- c(1, -1, 0.5)
  coef_age <- c(2, 0 , -1, -0.5, 1)
  coef_migback <- c(0, -0.5, -1)
  coef_state <- c(0, round(rnorm(399, 0, 1.5), 1))
  
  true_popn <- data.frame(poststrat[, 1:5], lsat = rep(NA, prod(J))) # get the first columns from the cell matrix, define a new one with variable of interest NAs
  for (j in 1:prod(J)) {
    true_popn$lsat_influence[j] <- sum(  # Influence on life satisfaction depending on subgroup
      coef_male[poststrat[j, 1] + 1] +
        coef_job[poststrat[j, 2]] + coef_age[poststrat[j, 3]] +
        coef_migback[poststrat[j, 4]] + coef_state[poststrat[j, 5]]
    )
  }
  true_popn$lsat <- round(rtruncnorm(n = 1, a = 0, b = 10, sd = 2, mean = 6 + true_popn$lsat_influence),0) #generate a life satisfaction value for each observation between 0 and 10
  #male or not, jobstatus, age, migback, state
  y <- round(true_popn$lsat[people] , 0) #for our sample "people" we get the life satisfaction from cell that the person belongs to
  male <- poststrat[people, 1]
  job <- poststrat[people, 2]
  age <- poststrat[people, 3]
  migback <- poststrat[people, 4]
  state <- poststrat[people, 5]
  
  sample <- data.frame(lsat = y, 
                       male, age, job, migback, state, 
                       id = 1:length(people)) #create a dataframe from our people sample
  
  #Make all numeric:
  for (i in 1:ncol(poststrat)) {
    poststrat[, i] <- as.numeric(poststrat[, i])
  }
  for (i in 1:ncol(true_popn)) {
    true_popn[, i] <- as.numeric(true_popn[, i])
  }
  for (i in 1:ncol(sample)) {
    sample[, i] <- as.numeric(sample[, i])
  }
  list(
    sample = sample,
    poststrat = poststrat,
    true_popn = true_popn
  )
}

# Generate sample data using the function ####
set.seed(123)
mrp_sim <- simulate_mrp_data(n=1000)
sample <- mrp_sim[["sample"]]
sample$state <- factor(sample$state, levels=1:400)

prior4 <- c(prior(normal(6,1), class = Intercept),
            prior(normal(0,0.75), class = b),
            prior(normal(0,0.75), class = sd),
            prior(exponential(0.5), class = sigma),
            prior(lkj(1), class = cor))

fit <- brm(
  formula = lsat ~ male + age + job + migback + (1 + male + age + job + migback | state),
  data = sample, family = gaussian,
  control=list(adapt_delta=.8, max_treedepth=15),
  chains = 8, cores = 8, iter = 4000, warmup = 1000,
  prior = prior4, save_all_pars = TRUE)

fit <- add_criterion(fit, "waic")
fit <- add_criterion(fit, "loo", moment_match = TRUE)

kauttoj · September 28, 2020, 12:06pm

Any updates on this issue?
I’m having the exact same problem with versions “brms_2.13.5” and “rstan_2.21.2”. It happens on both R 4.0.2 (Windows 10) and 3.6.3 (CentOS Linux 7). I did a fresh rstan/brms install on both systems. In my case setting “save_all_pars=TRUE” does not help either.

AndreasR: Did downgrading to 2.13.0 help in your case?

paul.buerkner · September 29, 2020, 6:02am

Can you post a minimal reproducible example that causes the error with R 4.0.x? I still have trouble reproducing this myself.

kauttoj · September 29, 2020, 7:55am

Here is code and data (both attached) that gives me the error. It might not be the “very minimal” example, but sufficiently minimal for you to start debugging, I hope.

Also my sessionInfo(), maybe useful:

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.1252  LC_CTYPE=Finnish_Finland.1252    LC_MONETARY=Finnish_Finland.1252
[4] LC_NUMERIC=C                     LC_TIME=Finnish_Finland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brms_2.13.5 Rcpp_1.0.5 

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-6    jsonlite_1.7.0       gtools_3.8.2         StanHeaders_2.21.0-6 RcppParallel_5.0.2  
 [6] threejs_0.3.3        shiny_1.5.0          assertthat_0.2.1     stats4_4.0.2         backports_1.1.7     
[11] pillar_1.4.6         lattice_0.20-41      glue_1.4.1           digest_0.6.25        checkmate_2.0.0     
[16] promises_1.1.1       colorspace_1.4-1     htmltools_0.5.0      httpuv_1.5.4         Matrix_1.2-18       
[21] plyr_1.8.6           dygraphs_1.1.1.6     pkgconfig_2.0.3      rstan_2.21.2         purrr_0.3.4         
[26] xtable_1.8-4         mvtnorm_1.1-1        scales_1.1.1         processx_3.4.3       later_1.1.0.1       
[31] tibble_3.0.3         bayesplot_1.7.2      generics_0.0.2       ggplot2_3.3.2        ellipsis_0.3.1      
[36] DT_0.15              withr_2.2.0          shinyjs_1.1          cli_2.0.2            magrittr_1.5        
[41] crayon_1.3.4         mime_0.9             ps_1.3.4             fansi_0.4.1          nlme_3.1-148        
[46] xts_0.12-0           pkgbuild_1.1.0       colourpicker_1.0     rsconnect_0.8.16     tools_4.0.2         
[51] loo_2.3.1            prettyunits_1.1.1    lifecycle_0.2.0      matrixStats_0.56.0   stringr_1.4.0       
[56] V8_3.2.0             munsell_0.5.0        callr_3.4.3          compiler_4.0.2       rlang_0.4.7         
[61] grid_4.0.2           ggridges_0.5.2       rstudioapi_0.11      htmlwidgets_1.5.1    crosstalk_1.1.0.1   
[66] igraph_1.2.5         miniUI_0.1.1.1       base64enc_0.1-3      codetools_0.2-16     gtable_0.3.0        
[71] inline_0.3.15        abind_1.4-5          curl_4.3             markdown_1.1         reshape2_1.4.4      
[76] R6_2.4.1             gridExtra_2.3        rstantools_2.1.1     zoo_1.8-8            bridgesampling_1.0-0
[81] dplyr_1.0.2          fastmap_1.0.1        shinystan_2.5.0      shinythemes_1.1.2    stringi_1.4.6       
[86] parallel_4.0.2       vctrs_0.3.2          tidyselect_1.1.0     coda_0.19-3

minimal_example.R (1.3 KB) my_testdata.csv (3.2 KB)

torkar · September 29, 2020, 8:06am

@paul.buerkner when I run this with cmdstan backend I get (on the line where we add criteria LOO):

Error: Backend 'rstan' is required for this method.

And when I run with backend="rstan" I get:

Error: $ operator not defined for this S4 class

so something is fishy…

paul.buerkner · September 29, 2020, 8:10am

Thanks! So many different error messages :-D
