Using information about ordinal model cutpoints when known?

While reading the rstanarm vignette “Estimating Ordinal Regression Models with rstanarm” by @jonah and @bgoodri, I came across a section (under “Example”) where the authors fit a model relating tobacco consumption to explanatory variables. The ordinal outcome is tobacco consumption measured in grams/day and binned as 0-9, 10-19, 20-29, 30+.

However, as the authors point out, the cutpoints on the latent scale are actually known here, unlike in other applications of ordinal models such as Likert-scale outcomes.

The authors state:

Since these cutpoints are actually known, it would be more appropriate for the model to take that into account, but stan_polr does not currently support that.

I’m interested in this idea, but I haven’t been able to find any references on this topic. Could anyone help point me in the right direction?

It seems related to interval-censored survival data. For example, a patient can only be diagnosed with a disease when they visit the clinic, so the onset of the disease is only known to have occurred between two visits. Like the time of onset, the true tobacco use in the vignette example is non-negative and unknown, but lies within a known range.
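
To make the analogy concrete, here is a rough sketch of the two likelihood contributions. The notation is my own assumption and not from the vignette: $F$ is the CDF of the underlying continuous quantity given covariates $x_i$ and parameters $\theta$, $(l_i, u_i]$ is the interval between clinic visits, and $c_{k-1}, c_k$ are the known bin boundaries in grams/day.

$$
\begin{aligned}
\text{interval-censored onset time:}\quad & L_i(\theta) = F(u_i \mid x_i, \theta) - F(l_i \mid x_i, \theta) \\
\text{binned tobacco use, } y_i = k\text{:}\quad & \Pr(y_i = k \mid x_i, \theta) = F(c_k \mid x_i, \theta) - F(c_{k-1} \mid x_i, \theta)
\end{aligned}
$$

If this is right, both are differences of the same CDF evaluated at known endpoints; the only difference is that the bin boundaries are shared across observations, while the visit times vary by patient.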

Link to the vignette here: Estimating Ordinal Regression Models with rstanarm • rstanarm

It seems to me that that’s a limitation of the specific implementation; with a general Stan model you could simply fix the known quantities. Maybe there’s a deeper theoretical or modeling insight, but I don’t know any references for anything like it.
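
As a rough illustration of what "fixing the known quantities" could look like, here is a minimal Python sketch (not rstanarm's or Stan's implementation). It assumes a logistic latent variable on the raw grams/day scale with cutpoints fixed at 10, 20 and 30; the function names and the counts are made up for illustration. A log scale for the latent variable would be another reasonable choice.

```python
import math

# Bin boundaries in grams/day, taken from the bins 0-9, 10-19, 20-29, 30+.
# Treating them as fixed (not estimated) is the assumption being illustrated.
KNOWN_CUTPOINTS = [10.0, 20.0, 30.0]


def logistic_cdf(z):
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probs(mu, sigma, cutpoints=KNOWN_CUTPOINTS):
    """P(y = k) when the latent variable is logistic(mu, sigma)
    and the cutpoints are known constants."""
    cdf = [0.0]  # CDF at -infinity
    cdf += [logistic_cdf((c - mu) / sigma) for c in cutpoints]
    cdf.append(1.0)  # CDF at +infinity
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def log_lik(counts, mu, sigma, cutpoints=KNOWN_CUTPOINTS):
    """Multinomial log-likelihood (up to a constant) for binned counts."""
    probs = category_probs(mu, sigma, cutpoints)
    return sum(n * math.log(p) for n, p in zip(counts, probs))


# Hypothetical counts per bin, invented purely for this example.
counts = [40, 30, 20, 10]

probs = category_probs(mu=12.0, sigma=10.0)
print(probs)
print(log_lik(counts, mu=12.0, sigma=10.0))
print(log_lik(counts, mu=12.0, sigma=5.0))
```

One thing this makes visible: with the cutpoints fixed, the latent scale `sigma` is no longer a free normalization (it is usually fixed at 1 in an ordinal model) and changing it changes the likelihood, so it becomes something you would estimate. In a regression setting `mu` would be replaced by a linear predictor.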

I think the interval-censored case is different: there is no binning with cutpoints, and the dependent variable itself lies within an interval. You would estimate the actual event time (with uncertainty) within that interval, as opposed to estimating the parameters for fixed intervals (or, if they are not fixed, as in the example, estimating the intervals themselves).

Maybe there’s an interesting connection between estimating unobserved quantities in a known interval and modeling known quantities in known intervals, but working it out may take some additional effort (or finding existing work that has already done it).
