Setting priors with some assumptions for ordinal cumulative probit brms model

I have a (perhaps naive) question about setting priors for the intercept coefficients with some assumptions about the outcome variable for a cumulative probit model with brms.

The outcome variable is a 7-point Likert scale response. Based on this tutorial [Notes on the Bayesian cumulative probit | A. Solomon Kurz], I could set weakly informative priors for the intercept coefficients, assuming a uniform distribution across the Likert items. Suppose, though, that one expects a slight positive skew toward the higher values, on the assumption that larger values have a positive association.

How might I set and model such priors for the intercepts? Would it still be based, e.g., on the proportions of the tibble, or is there another way to think about this?

Is it worth the trouble? A Dirichlet prior on the multinomial probabilities, which is then transformed to the link scale inside the Stan code, seems to work quite well in rmsb.

Fair enough. I may be trying to be too precise in that regard. I could go with it as is, since visually the distribution of the simulated datasets is roughly what one would expect for the intercepts.

I do have another question related to setting the beta priors for this model:

I have a group predictor, as well as continuous predictors that were log-transformed, scaled, and centered. I had thought to set regularizing priors of N(0, 1) for the group predictor. I was unsure how to estimate the prior for the continuous predictors, so I generated a fake dataset of ordinal and continuous variables with varying degrees of correlation. I then ran those through a simple model looking at the relationship between the two, with default priors, to get a rough idea of the kinds of estimates this fake data produces. This led me to think I could also use N(0, 1). However, my prior predictive plot is rather U-shaped, which I did not expect. Am I approaching the continuous predictors correctly, or is there another way I should think about this?

Related code for reference:

# set contrasts (sum-coding)
contrasts(df$group) = contr.sum(2)

# develop formula
bf.form = brms::bf(rating ~ group + cont1 + cont2 + 
                     group:cont1  + group:cont2)

get_prior(bf.form, data = df, family = cumulative('probit'))

# uniform prior proportions -> cumulative probit thresholds
library(dplyr)

tibble::tibble(rating = 1:7) %>% 
  mutate(proportion = 1/7) %>% 
  mutate(cumulative_proportion = cumsum(proportion)) %>% 
  mutate(right_hand_threshold = qnorm(cumulative_proportion))

priors = c(
  prior(normal(-1.07, 1), class = Intercept, coef = 1),
  prior(normal(-0.57, 1), class = Intercept, coef = 2),
  prior(normal(-0.18, 1), class = Intercept, coef = 3),
  prior(normal(0.18, 1), class = Intercept, coef = 4),
  prior(normal(0.57, 1), class = Intercept, coef = 5),
  prior(normal(1.07, 1), class = Intercept, coef = 6),
  prior(normal(0, 1), class = b)
)
# simulate data
generator <- SBC_generator_brms(bf.form, data = df, 
                                family = cumulative('probit'), init = 0.1, prior = priors,
                                thin = 50, warmup = 10000, refresh = 2000,
                                # Generating the log density is useful, but a bit
                                # computationally expensive - turned off here
                                generate_lp = FALSE)

datasets <- generate_datasets(generator, 100)

This is the prior predictive plot of the simulated datasets given the above noted priors:

In the rmsb package I specify priors on effects through the use of contrasts. This allows you to use your original scale. For example if x is modeled with a simple regression spline (so is nonlinear) and you want to specify a prior on the odds ratio for the inter-quartile-range effect of x you could do that quite easily. Several examples are here.


I’d also suggest moving to ordinal logit. Not only do we have it coded in Stan, it’s a lot faster because the cdf for the standard logistic is a lot easier to compute than the cdf for the standard normal.
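If you do switch to the logit link, the same threshold trick from the tutorial carries over; a minimal sketch, assuming the same uniform prior proportions, just swaps `qlogis` for `qnorm` (the logistic distribution has standard deviation pi/sqrt(3), about 1.8, so thresholds and prior scales are correspondingly wider than on the probit scale):

```r
# Sketch: uniform proportions -> thresholds on the logit scale,
# for use with family = cumulative('logit') instead of cumulative('probit')
library(dplyr)

tibble::tibble(rating = 1:7) %>%
  mutate(proportion = 1/7) %>%
  mutate(cumulative_proportion = cumsum(proportion)) %>%
  # qlogis replaces qnorm; e.g. qlogis(1/7) is roughly -1.79
  mutate(right_hand_threshold = qlogis(cumulative_proportion))
```

A prior like normal(mu, 1) on a probit threshold would then correspond to something like normal(mu * 1.8, 1.8) on the logit scale, to keep roughly the same implied category probabilities.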

I don’t see how you could do that because it would require a fixed scale. You’d have to reparameterize everything in terms of differences to make sure you stay within that scale. It’s usually much more performant to put a soft constraint rather than hard constraint (like hard uniform distribution bounds).

In Stan itself, you have an increasing set of cutpoints, and you can put a prior on the differences between them without a Jacobian adjustment (though that's still one prior short of being proper, as you've only fixed N - 1 of N degrees of freedom).

Yes, using the approach in my blog, you would just change the proportions in the tibble to fit with your prior expectations. Just make sure the proportions sum up to 1.
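For instance, a slight positive skew could be encoded with non-uniform proportions; a hypothetical sketch (the specific proportions here are illustrative, not a recommendation):

```r
# Sketch: proportions tilted toward higher ratings; they must sum to 1
library(dplyr)

tibble::tibble(rating = 1:7,
               proportion = c(0.08, 0.10, 0.12, 0.14, 0.16, 0.19, 0.21)) %>%
  mutate(cumulative_proportion = cumsum(proportion)) %>%
  # first six thresholds become the prior means for the six Intercepts
  mutate(right_hand_threshold = qnorm(cumulative_proportion))
```

The six finite thresholds would then replace the -1.07, -0.57, ... values in the `priors` vector above.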