Brms ordinal model diagnostics. DHARMa for Bayesians

Hello,

I am a masters student and quite new to Bayesian statistics and would greatly appreciate your help!

I am fitting mixed effect cumulative probit model in brms. I would like to check the normality and heteroscedasticity of the models residuals and look for signs of spatial autocorrelation. I was hoping to use the DHARMa package for this but keep receiving the following error.

################
Error in if (minSim == maxSim) scaledResiduals[i] = minSim else scaledResiduals[i] = runif(1, :
missing value where TRUE/FALSE needed
#####################

I have provide a reproduceable below?

If not, do you have advice on examining residuals for cumulative probit model in brms? I am particularly concerns about spatial autocorrelation.

Please let me if you need any more information.

Many thanks!

note: the code I am using is below is modified from these two posts

  1. Code for createDHARMa.
  2. Code for model creation
library(dplyr)
library(forcats)
library(tidyr)
library(modelr)
library(tidybayes)
library(rstan)
library(brms)
library(posterior)


#generate dataset
set.seed(5)
n = 10
n_condition = 5

testdata <-
  tibble(
    response = rep(c("A","B","C","D","E"), n),
    predictor = rnorm(n * 5, c(0,1,2,1,-1,8,9,8), 3),
    randomEff = rnorm(n * 5, c(2,3), 1)) %>% 
  mutate(randomEff = round(randomEff)) %>% 
  mutate(randomEff = as.factor(randomEff)) %>% 
  mutate(response = factor(response, order = TRUE, levels = c("A","B","C","D","E")))

str(df)

#fit model
model <- brm(response ~ predictor + (1|randomEff),
            data = testdata,
            family=cumulative("logit"),
            file = paste0("mod1.Rmd"),
            control = list(adapt_delta=0.98), backend = "cmdstanr",
            chains = 4, cores = 4, warmup = 1000, iter = 2000)


# attempting DHARMa 
model.check <- createDHARMa(
  simulatedResponse = t(posterior_predict(model)),
  observedResponse = testdata$response,
  fittedPredictedResponse = apply(t(posterior_epred(model)), 1, mean),
  integerResponse = TRUE)

plot(model.check)

```{r }
1 Like

Thank you @torkar I really appreciate you quick reply!

I have attempted to read through the resources you recommended but am still stuck. Likely because I am new to Bayesian methods, I found the post quite hard to follow.

I have included a reproducible example below and updated my post.
Further advice would be a great help!!

library(dplyr)
library(forcats)
library(tidyr)
library(modelr)
library(tidybayes)
library(rstan)
library(brms)
library(posterior)


#generate dataset
set.seed(5)
n = 10
n_condition = 5

testdata <-
  tibble(
    response = rep(c("A","B","C","D","E"), n),
    predictor = rnorm(n * 5, c(0,1,2,1,-1,8,9,8), 3),
    randomEff = rnorm(n * 5, c(2,3), 1)) %>% 
  mutate(randomEff = round(randomEff)) %>% 
  mutate(randomEff = as.factor(randomEff)) %>% 
  mutate(response = factor(response, order = TRUE, levels = c("A","B","C","D","E")))

Here’s a std (logit) cumulative model with default priors. I would strongly recommend you to not use this but do prior predictive checks and so on:
m <- brm(response ~ 1 + predictor + (1|randomEff), family = cumulative, data = testdata, control = list(adapt_delta=0.95))

r$> summary(m)
 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ 1 + predictor + (1 | randomEff) 
   Data: testdata (Number of observations: 50) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~randomEff (Number of levels: 5) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.48      0.46     0.02     1.75 1.00     1550     2111

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -1.61      0.50    -2.63    -0.63 1.00     1894     1600
Intercept[2]    -0.56      0.44    -1.42     0.31 1.00     2325     1767
Intercept[3]     0.31      0.44    -0.53     1.23 1.00     2246     1733
Intercept[4]     1.37      0.49     0.45     2.38 1.00     2503     1905
predictor       -0.05      0.05    -0.15     0.06 1.00     3892     2896

Family Specific Parameters: 
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
1 Like

Thank you very much @torkar.
I am glad to hear you would recommend using posterior checks over DHARMa.
For my original data I have done traceplots, pp_checks, examined the LOO Information Criterion, compared with a non-proportional odds model (i.e. acat family with category specific effects), and manually made some attempt at quantile plots for the residuals.
I am curious why you don’t recommend DHARMa? Is it just not necessary, or is there another reason?
No worries if you do not reply. Thank you for all your time and help!

Prior predictive checks are just as important! :)

Thank you @torkar ! I have not realized this.
I am doing these now.

For those who come across this conversation in the future, here are some useful papers on prios and prior checking. For a practical guide I highly recommend the second as this provides a link to all the code and data needed to run the analyses.

The Importance of Prior Sensitivity Analysis in Bayesian Statistics: Demonstrations Using an Interactive Shiny App

Choosing priors in Bayesian ecological models by simulating from the prior predictive distribution

I am sorry to bother you again!
May I ask if there is a way to specify “flat” priors?
The default for cummulative models is student_t(3, 0, 2.5), class = "b". I am not shore this is appropriate as my study is on a disease which we had little idea of the incidence of prior to the study. We expected anywhere between 1% and ~80% of plants to be infected. So i think a uniform (or “negative”?) prior would be more suitable.

Any advice greatly appreciated.

I tried to solution in this post but it did not work.

Provide a small example, please. It makes it easier to understand. Use simulated data.

Is the 1%-80% a value you expect the \beta to have, i.e., it is constrained between [0,1]? Or are you actually talking about modeling the outcome as [0,1]?

1 Like