Brms ordinal model diagnostics. DHARMa for Bayesians

Rtui · November 14, 2022, 4:49am

Hello,

I am a masters student and quite new to Bayesian statistics and would greatly appreciate your help!

I am fitting mixed effect cumulative probit model in brms. I would like to check the normality and heteroscedasticity of the models residuals and look for signs of spatial autocorrelation. I was hoping to use the DHARMa package for this but keep receiving the following error.

################
Error in if (minSim == maxSim) scaledResiduals[i] = minSim else scaledResiduals[i] = runif(1, :
missing value where TRUE/FALSE needed
#####################

I have provide a reproduceable below?

If not, do you have advice on examining residuals for cumulative probit model in brms? I am particularly concerns about spatial autocorrelation.

Please let me if you need any more information.

Many thanks!

note: the code I am using is below is modified from these two posts

Code for createDHARMa.
Code for model creation

library(dplyr)
library(forcats)
library(tidyr)
library(modelr)
library(tidybayes)
library(rstan)
library(brms)
library(posterior)


#generate dataset
set.seed(5)
n = 10
n_condition = 5

testdata <-
  tibble(
    response = rep(c("A","B","C","D","E"), n),
    predictor = rnorm(n * 5, c(0,1,2,1,-1,8,9,8), 3),
    randomEff = rnorm(n * 5, c(2,3), 1)) %>% 
  mutate(randomEff = round(randomEff)) %>% 
  mutate(randomEff = as.factor(randomEff)) %>% 
  mutate(response = factor(response, order = TRUE, levels = c("A","B","C","D","E")))

str(df)

#fit model
model <- brm(response ~ predictor + (1|randomEff),
            data = testdata,
            family=cumulative("logit"),
            file = paste0("mod1.Rmd"),
            control = list(adapt_delta=0.98), backend = "cmdstanr",
            chains = 4, cores = 4, warmup = 1000, iter = 2000)


# attempting DHARMa 
model.check <- createDHARMa(
  simulatedResponse = t(posterior_predict(model)),
  observedResponse = testdata$response,
  fittedPredictedResponse = apply(t(posterior_epred(model)), 1, mean),
  integerResponse = TRUE)

plot(model.check)

```{r }

torkar · November 14, 2022, 3:17pm

Rtui · November 14, 2022, 9:30pm

Thank you @torkar I really appreciate you quick reply!

I have attempted to read through the resources you recommended but am still stuck. Likely because I am new to Bayesian methods, I found the post quite hard to follow.

I have included a reproducible example below and updated my post.
Further advice would be a great help!!

library(dplyr)
library(forcats)
library(tidyr)
library(modelr)
library(tidybayes)
library(rstan)
library(brms)
library(posterior)


#generate dataset
set.seed(5)
n = 10
n_condition = 5

testdata <-
  tibble(
    response = rep(c("A","B","C","D","E"), n),
    predictor = rnorm(n * 5, c(0,1,2,1,-1,8,9,8), 3),
    randomEff = rnorm(n * 5, c(2,3), 1)) %>% 
  mutate(randomEff = round(randomEff)) %>% 
  mutate(randomEff = as.factor(randomEff)) %>% 
  mutate(response = factor(response, order = TRUE, levels = c("A","B","C","D","E")))

torkar · November 15, 2022, 6:17am

Here’s a std (logit) cumulative model with default priors. I would strongly recommend you to not use this but do prior predictive checks and so on:
m <- brm(response ~ 1 + predictor + (1|randomEff), family = cumulative, data = testdata, control = list(adapt_delta=0.95))

r$> summary(m)
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + predictor + (1 | randomEff) 
   Data: testdata (Number of observations: 50) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~randomEff (Number of levels: 5) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.48      0.46     0.02     1.75 1.00     1550     2111

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -1.61      0.50    -2.63    -0.63 1.00     1894     1600
Intercept[2]    -0.56      0.44    -1.42     0.31 1.00     2325     1767
Intercept[3]     0.31      0.44    -0.53     1.23 1.00     2246     1733
Intercept[4]     1.37      0.49     0.45     2.38 1.00     2503     1905
predictor       -0.05      0.05    -0.15     0.06 1.00     3892     2896

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Rtui · November 15, 2022, 7:31am

Thank you very much @torkar.
I am glad to hear you would recommend using posterior checks over DHARMa.
For my original data I have done traceplots, pp_checks, examined the LOO Information Criterion, compared with a non-proportional odds model (i.e. acat family with category specific effects), and manually made some attempt at quantile plots for the residuals.
I am curious why you don’t recommend DHARMa? Is it just not necessary, or is there another reason?
No worries if you do not reply. Thank you for all your time and help!

torkar · November 15, 2022, 7:51am

Prior predictive checks are just as important! :)

Rtui · November 16, 2022, 11:47pm

Thank you @torkar ! I have not realized this.
I am doing these now.

For those who come across this conversation in the future, here are some useful papers on prios and prior checking. For a practical guide I highly recommend the second as this provides a link to all the code and data needed to run the analyses.

The Importance of Prior Sensitivity Analysis in Bayesian Statistics: Demonstrations Using an Interactive Shiny App

Choosing priors in Bayesian ecological models by simulating from the prior predictive distribution

Rtui · November 17, 2022, 2:38am

I am sorry to bother you again!
May I ask if there is a way to specify “flat” priors?
The default for cummulative models is student_t(3, 0, 2.5), class = "b". I am not shore this is appropriate as my study is on a disease which we had little idea of the incidence of prior to the study. We expected anywhere between 1% and ~80% of plants to be infected. So i think a uniform (or “negative”?) prior would be more suitable.

Any advice greatly appreciated.

I tried to solution in this post but it did not work.

torkar · November 17, 2022, 7:47am

Provide a small example, please. It makes it easier to understand. Use simulated data.

Is the 1%-80% a value you expect the \beta to have, i.e., it is constrained between [0,1]? Or are you actually talking about modeling the outcome as [0,1]?

Topic		Replies	Views
Brms: how can I compare differences of variance of marginal_effects() for categories (i.e. category probabilities) brms techniques	4	536	September 1, 2020
Question about family = cumulative in brms Modeling ordinal-response , priors , brms	5	1686	April 4, 2023
R package "brms" model specification Modeling rstan , specification , brms	2	243	May 1, 2024
Convergence issues with multilevel cumulative probit regression model Modeling ordinal-response , brms	3	65	October 5, 2024
Ordinal Outcome and Ordinal Predictor in brms brms	6	1505	February 22, 2020

Brms ordinal model diagnostics. DHARMa for Bayesians

Related topics