The referenced ordinal models manuscript by Paul Burkner and Matti Vuorre describe the models as all having ordinal DVs. For example, in the second paragraph of the introduction of the cumulative models, they write: “Our cumulative model assumes that the observed ordinal variable Y , the opinion rating, originates from the categorization of a latent (not observable) continuous variable \tilde{Y}.” As I noted in my other response, if the predictor variables defined what kind of model you chose and you wanted to use sex as a predictor, then that would mean that you’d need to run a binary logistic regression, which intuitively you know would not give you good estimates of your outcome variable. Additionally, if the predictors solely determined the model you had to run, then it’s not clear what you would do in the case of predictors of various types (e.g., using age, sex, and a Likert-scale rating all as predictors in a single model).
For a really good, easily accessible overview of regression modeling, I’d highly recommend Richard McElreath’s Statistical Rethinking. There are other better books on generalized linear models and hierarchical regression modeling (Bayesian or not), but I find that Dr. McElreath’s book in particular is a good balance of being accessible to people without a huge math stats background while still being highly theoretically valuable. Plus, there’s a is bookdown project by Solomon Kurz that recodes Stastical Rethinking into brms (Statistical Rethinking with brms, ggplot2, and the tidyverse).
In short, there are a few ways that you can prove to yourself whether or not an ordinal regression is needed. Perhaps the most generally useful is the posterior predictive check. In model-based statistics, the problem of data analysis is usually best framed as trying to build a model that captures the simplified data generation process. To use your data as an example, you have observed a risk variable of interest and have some other variables that you think are informative/predictive of this risk variable. The model thus uses information about these other variables to predict what the outcome variable’s value is. In Bayesian methods, our models are both inferential and generative because we can use our posteriors to estimate a probabilistic distribution of possible/yet-to-be-observed predictions. The posterior predictive check (run in brms with the pp_check(...)
function) plots what your model’s posterior predicts the outcome variable to look like against what your observed outcome variable values are. In a well-functioning model, we want the model’s predictions to look like the raw data.
Obviously, you want to ensure that you have the right predictor variables that help to summarize the data generation process (i.e., the phenomena or proxies for phenomena that happened in the real-world to give rise to the dependent variable(s) that you have observed); however, this is only part of the model’s knowledge of the data generation process. We can include our own knowledge of the data to specify a generalization of the linear model. For example, if the observed variable has only two possible values, then it doesn’t make sense to specify a model where the likelihood of the data is defined by a normal distribution; instead, we can use a Bernoulli distribution (or a binomial distribution with 1 trial) to predict the dichotomous outcome. Similarly, if we have an outcome variable that is the number of successful attempts out of a total number of trials (e.g., number of balls hit in a game, number of stimuli correctly identified, etc), then we probably don’t want to build a model where we assume this data is normally distributed. Telling the model that a normal distribution describes the data would mean that we’re accepting the possibility that an observation of the dependent variable could take any real number despite the fact that we know that our data will always be positive (we can’t have a negative success), will have some upward bound (we don’t get infinite attempts), and may be highly skewed (the task may be very easy and lead to many successes or vice versa). In that case, a binomial distribution is likely the best option for our data. To learn more about what kinds of likelihoods you can specify in brms, you can look up the family
argument (by default, family = gaussian()
).
You can run your models under a few different distributions to see how well it performs in predicting the observed data via the pp_check(...)
function. For variables that are strictly positive, you may consider a gamma, log-normal, truncated normal/student, or truncated skew-normal distribution (see more about truncating a distribution in brms here: addition-terms: Additional Response Information in brms: Bayesian Regression Models using 'Stan'). If the outcome variable is always an integer and strictly positive, then you might consider a binomial, Poisson, or negative binomial distribution. I found it very helpful to try a lot of different models and examine their posterior predictive check at first and just learn how they worked, what distributional parameters they have, and what kinds of estimation properties they produce. In this case, you can try an ordinal model against a normal model and see how their posterior predictives compare to one another.
To extend the idea of building a model around what our data generation process is, I suggested that perhaps this data is actually multivariate. Ultimately, that’s a choice for you to make regarding the design of your study; however, if your outcome variables all arise from the same process, then there is advantage to fitting a multivariate model, even if you have separate hypotheses about each variable. The brms package offers a couple different options for multivariate regression. You can assume that all the data share the same predictors and likelihood and fit them using mvbind(...) ~ ...
. Or, you can assume that each outcome variable has its own set of predictors (and possible it’s own unique likelihood family). In this latter case, you can write each outcome variable with it’s own y1 <- bf(y1 ~ ...)
and then fit a single model as fit <- brm(y1 + y2 + ...)
. The advantages of the multivariate model is that it’s more accurate when the same processes give rise to multiple dependent variables (or when multiple dependent variables are all correlated/dependent on one another). Here’s a vignette on multivariate models in brms: Estimating Multivariate Models with brms. Notably, when you estimate the models all together in a multivariate model, you can still do all the inferences that you would normally do with univariate models. In brms, this just requires that you specify the resp = ...
argument in a lot of the post-hoc convenience functions (e.g., conditional_effects(..., resp = "rho")
).
I suspect that ultimately the reason that you wanted to consider an ordinal model for your Likert predictors is that that kind of data is qualitatively and clearly different in nature than other predictors, so it just feels like there’s something extra we should do with it. This is very true, and there are several ways to consider this problem. In R, you can specify the Likert items as ordered factors (e.g., ordered(..., levels = c(...))
), and then you can use a variety of different effect/contrast coding options to analyze the data. Alternatively, you could try to treat them as continuous and just enter them as numeric variables into the equation. The brms package offers a special function to use around ordinal predictors in the formula to account for their ordinal nature (see the vignette here: Estimating Monotonic Effects with brms).
In short, it doesn’t sound like you need to use the ordinal regression model since none of our outcome variables are ordinal, and it sounds like perhaps there is no need to worry about the ordinality of your predictors anyway since you can sum them together. Still, I encourage you to think about the data generation process and what you can build into your model through the likelihood statements, predictors, and brms formula(s) to help reflect your knowledge of the variables and processes that got you the data you’ve observed