How to simulate monotonic effects with an ordinal outcome

I’d like to simulate ordinal data for later analysis with brms. At its simplest, I have an ordinal predictor with an ordinal outcome, which I would analyse using something like this:

fit1 <- brm(ordinal_outcome ~ mo(ordinal_predictor), data = d, family = cumulative())

The Estimating Monotonic Effects with brms vignette uses the following code to simulate an ordinal predictor with a continuous outcome.

income_options <- c("below_20", "20_to_40", "40_to_100", "greater_100")
income <- factor(sample(income_options, 100, TRUE), 
                 levels = income_options, ordered = TRUE)
mean_ls <- c(30, 60, 70, 75)
ls <- mean_ls[income] + rnorm(100, sd = 7)
dat <- data.frame(income, ls)

How can I simulate a data set with an ordinal predictor and an ordinal outcome?

  • Operating System: macOS 12.5.1
  • brms Version: 2.17.0

To simulate an ordinal outcome, you need to follow the correct data generating assumptions.

An ordinal outcome y is assumed categorically distributed with K category probabilities \mathbf{\pi} = \{\pi_{1}, \pi_{2}, ..., \pi_{K}\}:

y \sim Categorical(\mathbf{\pi})

The category probabilities are generated from an underlying continuous scale split up into K bins using K - 1 thresholds, \mathbf{\theta} = \{\theta_{1}, ..., \theta_{K - 1}\}. With a suitable CDF, often the logistic but you could also choose the normal CDF, we can work out the probability in each bin. If we refer to the standard logistic CDF (with mean=0 and scale=1) as F, then:

\pi_{k} = F(\theta_{k} - \eta) - F(\theta_{k-1} - \eta)

where I’m assuming that \theta_{0} = - \infty and \theta_{K} = \infty so that F(\theta_{0} - \eta) = 0 and F(\theta_{K} - \eta) = 1. The \eta term holds the linear predictor, i.e. your ordinal predictor variable.

In R, you’d have something like (just quickly wrote this, it might contain errors):

# Generate a single ordinal data point
# @K number of ordinal categories
# @theta vector of latent cutpoints
# @eta linear predictor
gen_ordinal_data <- function(K, theta, eta){

   if(length(theta) != (K - 1))
        stop(paste0("theta must have length ", K - 1, " not ", length(K)))


   probs <- c()

   for(k in 1:K){

       if(k == 1) prev_thresh = -Inf
       else prev_thresh = theta[k - 1]

       if(k == K) next_thresh = Inf
       else next_thresh = theta[k]

       probs[k] = plogis(next_thresh - eta) - plogis(prev_thresh - eta)
   }
  
   return(sample(K, 1, prob=probs))
}

Then, you can plug whatever you want in for eta including your ordinal predictor variables.

5 Likes

Thank you so much for that! I really appreciate the detail.

  1. How would you explain the reason for taking the difference between \theta and \eta (e.g., within F(\theta_k - \eta))? Is it just because increasing \eta shifts the distribution of y to the right and decreasing it shifts the distribution to the left?

  2. How can I do parameter recovery with this code? That is, how can I set up the simulation so that I can anticipate the monotonic coefficient I will get in the brms fit output?

Here is my attempt at using the above code for a simulation and subsequent modelling:

## Generate a single ordinal data point
# @K number of ordinal categories
# @theta vector of latent cutpoints
# @eta linear predictor
gen_ordinal_data <- function(K, theta, eta) {
  if (length(theta) != (K - 1)) {
    stop(paste0("theta must have length ", K - 1, " not ", length(K)))
  }


  probs <- c()

  for (k in 1:K) {
    if (k == 1) {
      prev_thresh <- -Inf
    } else {
      prev_thresh <- theta[k - 1]
    }

    if (k == K) {
      next_thresh <- Inf
    } else {
      next_thresh <- theta[k]
    }

    probs[k] <- plogis(next_thresh - eta) - plogis(prev_thresh - eta)
  }

  return(sample(K, 1, prob = probs))
}

library(purrr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.17.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
options(mc.cores = parallel::detectCores())
set.seed(1)

K <- 5
theta <- seq(from = -4, to = 4, length.out = 4)
n <- 500
eta <- seq_len(5)
data_sim <-
  tibble(
    ID = seq_len(n)
  ) |>
  rowwise() |>
  mutate(
    variable_1 = sample(eta, 1),
    variable_2 = gen_ordinal_data(
      K = K,
      theta = theta,
      eta = variable_1
    )
  )

data_sim
#> # A tibble: 500 Γ— 3
#> # Rowwise: 
#>       ID variable_1 variable_2
#>    <int>      <int>      <int>
#>  1     1          1          4
#>  2     2          4          5
#>  3     3          1          4
#>  4     4          2          4
#>  5     5          5          5
#>  6     6          3          4
#>  7     7          2          4
#>  8     8          3          4
#>  9     9          3          4
#> 10    10          1          3
#> # … with 490 more rows

fit1 <- brm(variable_2 ~ mo(variable_1), data = data_sim, family = cumulative())
#> Compiling Stan program...
#> Trying to compile a simple C file
#> Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
#> clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DBOOST_NO_AUTO_PTR  -include '/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/usr/local/include   -fPIC  -Wall -g -O2  -c foo.c -o foo.o
#> In file included from <built-in>:1:
#> In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
#> In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
#> In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:88:
#> /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
#> namespace Eigen {
#> ^
#> /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
#> namespace Eigen {
#>                ^
#>                ;
#> In file included from <built-in>:1:
#> In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
#> In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
#> /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
#> #include <complex>
#>          ^~~~~~~~~
#> 3 errors generated.
#> make: *** [foo.o] Error 1
#> Start sampling

fit1
#>  Family: cumulative 
#>   Links: mu = logit; disc = identity 
#> Formula: variable_2 ~ mo(variable_1) 
#>    Data: data_sim (Number of observations: 500) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept[1]    -5.16      1.06    -7.72    -3.58 1.00     1602     1062
#> Intercept[2]    -2.06      0.26    -2.58    -1.58 1.00     3879     2987
#> Intercept[3]     0.34      0.19    -0.03     0.72 1.00     3307     3324
#> Intercept[4]     2.93      0.25     2.47     3.43 1.00     2356     2708
#> movariable_1     1.03      0.08     0.86     1.20 1.00     2073     2275
#> 
#> Simplex Parameters: 
#>                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> movariable_11[1]     0.24      0.06     0.12     0.35 1.00     2286     1677
#> movariable_11[2]     0.20      0.07     0.07     0.34 1.00     2222     1836
#> movariable_11[3]     0.35      0.07     0.21     0.49 1.00     3304     2504
#> movariable_11[4]     0.21      0.07     0.08     0.34 1.00     2892     1425
#> 
#> Family Specific Parameters: 
#>      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> disc     1.00      0.00     1.00     1.00   NA       NA       NA
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Created on 2022-09-16 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Sydney
#>  date     2022-09-16
#>  pandoc   2.19 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version  date (UTC) lib source
#>  abind            1.4-5    2016-07-21 [1] CRAN (R 4.2.0)
#>  assertthat       0.2.1    2019-03-21 [1] CRAN (R 4.2.0)
#>  backports        1.4.1    2021-12-13 [1] CRAN (R 4.2.0)
#>  base64enc        0.1-3    2015-07-28 [1] CRAN (R 4.2.0)
#>  bayesplot        1.9.0    2022-03-10 [1] CRAN (R 4.2.0)
#>  bridgesampling   1.1-2    2021-04-16 [1] CRAN (R 4.2.0)
#>  brms           * 2.17.0   2022-04-13 [1] CRAN (R 4.2.0)
#>  Brobdingnag      1.2-7    2022-02-03 [1] CRAN (R 4.2.0)
#>  callr            3.7.1    2022-07-13 [1] CRAN (R 4.2.0)
#>  checkmate        2.1.0    2022-04-21 [1] CRAN (R 4.2.0)
#>  cli              3.3.0    2022-04-25 [1] CRAN (R 4.2.0)
#>  coda             0.19-4   2020-09-30 [1] CRAN (R 4.2.0)
#>  codetools        0.2-18   2020-11-04 [1] CRAN (R 4.2.1)
#>  colorspace       2.0-3    2022-02-21 [1] CRAN (R 4.2.0)
#>  colourpicker     1.1.1    2021-10-04 [1] CRAN (R 4.2.0)
#>  crayon           1.5.1    2022-03-26 [1] CRAN (R 4.2.0)
#>  crosstalk        1.2.0    2021-11-04 [1] CRAN (R 4.2.0)
#>  DBI              1.1.3    2022-06-18 [1] CRAN (R 4.2.0)
#>  digest           0.6.29   2021-12-01 [1] CRAN (R 4.2.0)
#>  distributional   0.3.0    2022-01-05 [1] CRAN (R 4.2.0)
#>  dplyr          * 1.0.9    2022-04-28 [1] CRAN (R 4.2.0)
#>  DT               0.24     2022-08-09 [1] CRAN (R 4.2.0)
#>  dygraphs         1.1.1.6  2018-07-11 [1] CRAN (R 4.2.0)
#>  ellipsis         0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate         0.16     2022-08-09 [1] CRAN (R 4.2.0)
#>  fansi            1.0.3    2022-03-24 [1] CRAN (R 4.2.0)
#>  farver           2.1.1    2022-07-06 [1] CRAN (R 4.2.0)
#>  fastmap          1.1.0    2021-01-25 [1] CRAN (R 4.2.0)
#>  fs               1.5.2    2021-12-08 [1] CRAN (R 4.2.0)
#>  generics         0.1.3    2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2          3.3.6    2022-05-03 [1] CRAN (R 4.2.0)
#>  ggridges         0.5.3    2021-01-08 [1] CRAN (R 4.2.0)
#>  glue             1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
#>  gridExtra        2.3      2017-09-09 [1] CRAN (R 4.2.0)
#>  gtable           0.3.0    2019-03-25 [1] CRAN (R 4.2.0)
#>  gtools           3.9.3    2022-07-11 [1] CRAN (R 4.2.0)
#>  highr            0.9      2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools        0.5.3    2022-07-18 [1] CRAN (R 4.2.0)
#>  htmlwidgets      1.5.4    2021-09-08 [1] CRAN (R 4.2.0)
#>  httpuv           1.6.5    2022-01-05 [1] CRAN (R 4.2.0)
#>  igraph           1.3.4    2022-07-19 [1] CRAN (R 4.2.0)
#>  inline           0.3.19   2021-05-31 [1] CRAN (R 4.2.0)
#>  knitr            1.39     2022-04-26 [1] CRAN (R 4.2.0)
#>  later            1.3.0    2021-08-18 [1] CRAN (R 4.2.0)
#>  lattice          0.20-45  2021-09-22 [1] CRAN (R 4.2.1)
#>  lifecycle        1.0.1    2021-09-24 [1] CRAN (R 4.2.0)
#>  loo              2.5.1    2022-03-24 [1] CRAN (R 4.2.0)
#>  magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
#>  markdown         1.1      2019-08-07 [1] CRAN (R 4.2.0)
#>  Matrix           1.4-1    2022-03-23 [1] CRAN (R 4.2.1)
#>  matrixStats      0.62.0   2022-04-19 [1] CRAN (R 4.2.0)
#>  mime             0.12     2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI           0.1.1.1  2018-05-18 [1] CRAN (R 4.2.0)
#>  munsell          0.5.0    2018-06-12 [1] CRAN (R 4.2.0)
#>  mvtnorm          1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
#>  nlme             3.1-157  2022-03-25 [1] CRAN (R 4.2.1)
#>  pillar           1.8.1    2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgbuild         1.3.1    2021-12-20 [1] CRAN (R 4.2.0)
#>  pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
#>  plyr             1.8.7    2022-03-24 [1] CRAN (R 4.2.0)
#>  posterior        1.3.0    2022-08-15 [1] CRAN (R 4.2.1)
#>  prettyunits      1.1.1    2020-01-24 [1] CRAN (R 4.2.0)
#>  processx         3.7.0    2022-07-07 [1] CRAN (R 4.2.0)
#>  promises         1.2.0.1  2021-02-11 [1] CRAN (R 4.2.0)
#>  ps               1.7.1    2022-06-18 [1] CRAN (R 4.2.0)
#>  purrr          * 0.3.4    2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache          0.16.0   2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3      1.8.2    2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo             1.25.0   2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils          2.12.0   2022-06-28 [1] CRAN (R 4.2.0)
#>  R6               2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp           * 1.0.9    2022-07-08 [1] CRAN (R 4.2.0)
#>  RcppParallel     5.1.5    2022-01-05 [1] CRAN (R 4.2.0)
#>  reprex           2.0.1    2021-08-05 [1] CRAN (R 4.2.0)
#>  reshape2         1.4.4    2020-04-09 [1] CRAN (R 4.2.0)
#>  rlang            1.0.4    2022-07-12 [1] CRAN (R 4.2.0)
#>  rmarkdown        2.14     2022-04-25 [1] CRAN (R 4.2.0)
#>  rstan            2.21.5   2022-04-11 [1] CRAN (R 4.2.0)
#>  rstantools       2.2.0    2022-04-08 [1] CRAN (R 4.2.0)
#>  scales           1.2.1    2022-08-20 [1] CRAN (R 4.2.0)
#>  sessioninfo      1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
#>  shiny            1.7.2    2022-07-19 [1] CRAN (R 4.2.0)
#>  shinyjs          2.1.0    2021-12-23 [1] CRAN (R 4.2.0)
#>  shinystan        2.6.0    2022-03-03 [1] CRAN (R 4.2.0)
#>  shinythemes      1.2.0    2021-01-25 [1] CRAN (R 4.2.0)
#>  StanHeaders      2.21.0-7 2020-12-17 [1] CRAN (R 4.2.0)
#>  stringi          1.7.8    2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr          1.4.0    2019-02-10 [1] CRAN (R 4.2.0)
#>  styler           1.7.0    2022-03-13 [1] CRAN (R 4.2.0)
#>  tensorA          0.36.2   2020-11-19 [1] CRAN (R 4.2.0)
#>  threejs          0.3.3    2020-01-21 [1] CRAN (R 4.2.0)
#>  tibble         * 3.1.8    2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyselect       1.1.2    2022-02-21 [1] CRAN (R 4.2.0)
#>  utf8             1.2.2    2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs            0.4.1    2022-04-13 [1] CRAN (R 4.2.0)
#>  withr            2.5.0    2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun             0.31     2022-05-10 [1] CRAN (R 4.2.0)
#>  xtable           1.8-4    2019-04-21 [1] CRAN (R 4.2.0)
#>  xts              0.12.1   2020-09-09 [1] CRAN (R 4.2.0)
#>  yaml             2.3.5    2022-02-21 [1] CRAN (R 4.2.0)
#>  zoo              1.8-10   2022-04-15 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@shirdekel

  1. Yes, that’s correct. As the predictor variables increase in size, we subtract because that shifts the thresholds to the left, providing more probability density under the highest categories.
  2. If I understand your model, you’d need to fist simulate an ordinal predictor variable, then you’d need to use the gen_ordinal_data function to simulate your outcome ordinal variable, where eta is a function of the ordinal predictor. If you weren’t using monotonic effects, eta would be somthing like alpha + beta * ordinal_predictor, producing a vector of linear predictor values to generate the ordinal response variable. But you’d need to change the beta * ordinal_predictor variable specification to the structure of the monotonic predictor assumption.