Identification and Cumulative Probit models

Hi All,
I’ve been following @Solomon’s post on priors for cumulative probit models and attempting to implement them in brms using the index notation (similar to @richard_mcelreath’s approach in Statistical Rethinking). Thanks again to @Solomon, there’s a post on how to do that as well. Although the models ultimately fit (they sample successfully, with no divergences), they are extremely inefficient. That has me thinking that I need to treat at least one of my factor variables as a factor (rather than an index) to aid in the identification of the rest (there are 4 factor variables and 2 monotonic predictors in the model). I’d love anyone’s thoughts on whether the index variable approach is appropriate for ordinal models using the cumulative probit, and whether all factor variables need to be specified as factors (rather than indices) or if it is enough (at least theoretically) to specify just one. Thanks in advance and huge thanks to @Solomon for these helpful resources!

Thanks for your interest in my post. Others have DM’d me about identification issues with some of the models in that post. Namely, the last model used this formula:

bf(rating | thres(gr = item) ~ 1 + male + (1 | id) + (1 | item)) +
    lf(disc                    ~ 0 + male + (1 | id) + (1 | item),
       # don't forget this line
       cmc = FALSE)

If memory serves, it’s probably not a good idea to allow the discrimination parameter to vary by item. A better approach might be something like this:

bf(rating | thres(gr = item) ~ 1 + male + (1 | id) + (1 | item)) +
    lf(disc                    ~ 0 + male + (1 | id),
       # don't forget this line
       cmc = FALSE)

This topic has been on my to-do list for a while, and it’s going to remain there for a while yet. But I do indeed plan on walking through the issue more carefully at some point, and the post will get updated once I do.

Thanks @Solomon - I’ve been using the following and I want to make sure that this approach for using index variables isn’t totally on the wrong track…

bf(adjrating ~ 1 + country + issue + age + gender + ideology + ed,
   country  ~ 0 + (1 | country),
   issue    ~ 0 + (1 | issue),
   age      ~ 0 + age_scl,
   gender   ~ 0 + (1 | gender),
   ideology ~ 0 + mo(ideology),
   ed       ~ 0 + mo(education),
   nl = TRUE) +
  lf(disc ~ 0 + (1 | country),
     cmc = FALSE)

I have something like 22,000 respondents and 12 items, so it didn’t seem feasible to let disc vary by id.

If you have 12 items and each participant rated all 12 of the items, I’d find a way to allow at least some of your parameters to vary by item. Otherwise your model is presuming the items behave identically, which seems like a strong assumption. Also assuming all participants responded to 12 items, I’d find a way to at least allow the mean structure to vary by id, but possibly the discrimination model, too.

Aside: You might want to add the #brms tag to this post so other brms users might more easily find it.

Thanks! I’ll work on that. Would it make sense to treat the country variable as a factor rather than an index? My understanding of your notes is that by setting a variable as a factor, the result is to fix that value’s mean at 0. With the index variable approach, that’s no longer true - correct? In the absence of that, it’s not clear to me where the index variables are drawing their overall mean from. Maybe that isn’t the issue and the models are just slow because there is so much data and so many factor variables.
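
For reference, the identification question can be stated with the cumulative probit likelihood itself. This is a sketch, not taken from the posts in this thread, and the symbols are assumptions introduced here: \tau_k are the thresholds, \eta_i is the linear predictor for observation i, and \alpha_i is the discrimination (disc), which brms models on the log scale by default.

```latex
\begin{align*}
\Pr(Y_i \le k) &= \Phi\bigl(\alpha_i(\tau_k - \eta_i)\bigr) \\
% Location: adding any constant c to every threshold and to the
% linear predictor leaves the likelihood unchanged.
\Phi\bigl(\alpha_i[(\tau_k + c) - (\eta_i + c)]\bigr)
  &= \Phi\bigl(\alpha_i(\tau_k - \eta_i)\bigr) \\
% Scale: multiplying thresholds and linear predictor by s > 0 while
% dividing the discrimination by s also leaves it unchanged.
\Phi\bigl(\tfrac{\alpha_i}{s}[s\tau_k - s\eta_i]\bigr)
  &= \Phi\bigl(\alpha_i(\tau_k - \eta_i)\bigr)
\end{align*}
```

Because of the first invariance, the overall level of \eta_i is only pinned down relative to the thresholds. With a dummy-coded factor, the reference level is absorbed into the thresholds. With index-style terms such as (1 | gender), as far as I understand it, the level-specific deviations are instead centered by their zero-mean normal prior, which is a softer constraint. The second invariance is why the disc formula needs a reference point, which is what the 0 + ... and cmc = FALSE syntax in the formulas above is for.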

If your participants are nested within countries, you might want to fit a 3-level model, rather than a cross-classified 2-level model.
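
A minimal sketch of what such a 3-level mean structure could look like in brms formula syntax. It assumes a respondent identifier column named id (hypothetical here, since it does not appear in the formula posted above) and that each respondent belongs to exactly one country. The other predictors and the disc formula are left out to keep the nesting visible.

```r
library(brms)

# Hypothetical column names: `id` (respondent), `country`, `issue` (item).
# (1 | country / id) expands to (1 | country) + (1 | country:id),
# i.e. respondents nested within countries, crossed with items.
three_level <- bf(
  adjrating ~ 1 + (1 | country / id) + (1 | issue)
)

# This would then be passed to brm() along with the data and
# family = cumulative("probit").
```

Whether this samples any more efficiently with roughly 22,000 respondents is an open question; the sketch only shows the syntax for the nesting.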
