Dear community,
I am a psychologist and I recently followed the tutorial by Bürkner & Vuorre to analyze my Likert-scale data with an ordinal hierarchical regression. At first I did not specify any priors, because the study tests a new relationship between two conditions that have not been compared before, so there was no prior knowledge to draw on in the literature. However, to use the bf_rope function from the bayestestR package I cannot use flat priors. Since the literature is of no help and I am investigating a psychological effect, I think it is quite likely that if an effect exists, it is small or medium. Reading through some posts, I saw people recommend Cauchy priors in such cases, with the JASP default scale of 0.707. Could someone explain whether this workflow would be OK, or am I biasing the priors too much with this? I really appreciate any help!
FYI:
- condition is dichotomous: imagery & perception
- stimulus_type: face & art
- ID: subject-specific identifier
- image_id: stimulus identifier
```r
library(brms)
# Cauchy(0, 0.707) priors on the two population-level effects
priors_4 <- c(
  set_prior("cauchy(0, 0.707)", class = "b", coef = "Conditionimagery_moving_rating"),
  set_prior("cauchy(0, 0.707)", class = "b", coef = "stimulus_typeface")
)
# Cumulative probit model with varying intercepts for participants and images
cauchy_prior_moving <- brm(Rating ~ Condition + stimulus_type + (1 | ID) + (1 | image_id),
  data = moving_long, prior = priors_4,
  family = cumulative("probit", threshold = "flexible"),
  chains = 5, iter = 2000, warmup = 1000, cores = 5)
```
Thanks for reading! I'm happy to hear any comments!
Have a nice day!
Best,
Max
Or, as suggested in the Prior Choice Recommendations page of the stan-dev/stan wiki on GitHub, is it better to use normal(0, 1) as a weakly informative prior when either direction of the effect is plausible and the effect is expected to be small or medium?
If you are working with standardized variables, and you are not even certain that there is an effect, it would be outright implausible to find an effect size outside the range of +/- 2.0 SMD, right? A cauchy(0, 0.7) prior would put non-zero probability on much, much higher effects.
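To make the tail difference concrete, here is a small base-R sketch comparing how much prior probability each prior puts on very large effects. The 0.707 scale is the one from the original question; the cutoffs of 2 and 10 latent SDs are just illustrative choices.

```r
# Two-sided prior probability of |effect| > cutoff under each prior
tail_prob_cauchy <- function(cutoff) 2 * pcauchy(-cutoff, location = 0, scale = 0.707)
tail_prob_normal <- function(cutoff) 2 * pnorm(-cutoff, mean = 0, sd = 1)

tail_cauchy <- tail_prob_cauchy(2)    # effects beyond +/- 2
tail_normal <- tail_prob_normal(2)
huge_cauchy <- tail_prob_cauchy(10)   # effects beyond +/- 10 (implausibly huge)
huge_normal <- tail_prob_normal(10)

round(c(cauchy_gt2 = tail_cauchy, normal_gt2 = tail_normal,
        cauchy_gt10 = huge_cauchy), 3)
```

With these settings the Cauchy(0, 0.707) prior puts roughly a fifth of its mass beyond +/- 2 and still a few percent beyond +/- 10, about as much as normal(0, 1) puts beyond +/- 2.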
Also, if you are planning to use Bayes factors, those are sensitive to priors, so being careful about prior specification makes sense.
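A minimal sketch of that sensitivity, assuming a normal likelihood for a single coefficient with a known standard error and a normal(0, tau) prior, so the posterior has a closed form. The Bayes factor for the point null b = 0 is then the Savage-Dickey ratio of posterior to prior density at zero. The estimate (0.2) and standard error (0.1) are made-up numbers for illustration. bayestestR's ROPE-based Bayes factor uses an interval instead of a point null, but it depends on the prior in the same way.

```r
# Hypothetical data summary: estimated effect and its standard error
b_hat <- 0.2
se    <- 0.1

# Savage-Dickey BF01 (evidence for b = 0) under a normal(0, tau) prior
bf01_normal_prior <- function(tau) {
  post_var  <- 1 / (1 / tau^2 + 1 / se^2)   # conjugate normal-normal update
  post_mean <- post_var * b_hat / se^2
  dnorm(0, post_mean, sqrt(post_var)) / dnorm(0, 0, tau)
}

taus <- c(0.5, 1, 10)
bf01 <- sapply(taus, bf01_normal_prior)
round(setNames(bf01, paste0("tau=", taus)), 2)
```

The data are identical in all three cases; only the prior width changes, yet BF01 goes from below 1 (favouring an effect) to well above 10 (favouring the null).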
I think this paper is a good reference for why default or uninformative priors should be avoided, especially if you are working with smallish samples.
https://www.tandfonline.com/doi/abs/10.1080/10705511.2016.1186549
(edit - thought I was replying and agreeing with someone else, but it was a double first post)
Presumably we know something about these conditions ahead of time, even if there are no data from an exactly identical experiment. For weakly informative priors, we’re only informing the scale.
If you use a half-normal like this, what you’re saying is that you’re assigning a 95% probability that the value is between 0 and roughly 2. So it really depends on the scale on which you expect the result to fall.
For example, if you have a regression covariate of height in mm vs. height in m, then you want differently scaled priors to be weakly informative.
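A small base-R sketch of what "informing the scale" means in numbers. `sd_for_bound()` is a hypothetical helper (not from any package) that returns the SD of a zero-centred normal prior placing a given probability within +/- bound; the height coefficient is an invented example showing how the prior SD has to be rescaled together with the units of the covariate.

```r
# Hypothetical helper: SD of a normal(0, sd) prior with `prob` mass in +/- bound
sd_for_bound <- function(bound, prob = 0.95) {
  bound / qnorm(1 - (1 - prob) / 2)
}

# 95% of a half-normal(0, 1) lies below qnorm(0.975), about 1.96
q95_half_normal <- qnorm(0.975)

# Invented covariate effect: 0.5 per metre of height
beta_per_m  <- 0.5
beta_per_mm <- beta_per_m / 1000     # the same effect expressed per millimetre

# Keeping the same beliefs requires rescaling the prior SD by the same factor
prior_sd_m  <- sd_for_bound(2)       # about 1.02, i.e. roughly normal(0, 1)
prior_sd_mm <- prior_sd_m / 1000
```

So normal(0, 1) is "weakly informative" only relative to a coefficient that is expected to be on the order of 1; the same prior on the per-millimetre coefficient would be extremely vague.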
Cauchy priors have very long tails, which can cause the same kinds of problems as the diffuse gamma/inverse gamma priors that were popular with BUGS.
```r
> summary(abs(rcauchy(1000, scale=0.7)))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
  0.00035   0.28267   0.66068   2.02741   1.53050 273.66110
```
Here’s a paper by @andrewgelman on why overly diffuse priors can be a problem for hierarchical models: Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)
We were recommending these Cauchy priors for a long time, so it’s not surprising you still see a lot of them around.
Thanks for all the replies!
I read through the suggested papers but I think I still have a conceptual problem.
Let’s say I use an ordinal regression with probit link and the DV has a rating from 1-7.
It is predicted by two grouping factors:
- independent variable A (has two groups)
- independent variable B (has also two groups)
Based on my experience, I think both variables very likely have a small to medium effect on the DV. How do I pick a (weakly) informative prior that represents this assumption?
People often talk about the scale, and I am not really sure what they mean. When I pick a normal(0, 1) prior, does this represent that I think the DV most likely shifts by 0-2 points on the Likert scale for each group, OR does it mean that I expect the standardized effect (Cohen’s d) of the independent variable to be between 0 and 2 (which would include far too large effects)?
Really appreciate all your help!
Cheers,
max
When I pick a normal(0, 1) prior, does this represent that I think the DV most likely shifts by 0-2 points on the Likert scale for each group, OR does it mean that I expect the standardized effect (Cohen’s d) of the independent variable to be between 0 and 2?
Neither. Your predictors will be modelled as having effects on a latent distribution (a normal distribution, in the case of a probit model). Since the latent normal has mean 0 and SD 1, the estimates can be interpreted as a kind of Cohen’s d – but they’re not effect sizes on the actual response scale.
Also, what your priors on effects are saying in terms of the probabilities of the different response categories depends also on the priors that are set for the thresholds, so it can get a little complicated …
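A base-R sketch of how a shift on the latent scale turns into response-category probabilities, assuming the cumulative probit form P(Y <= k) = pnorm(threshold_k - eta) used by brms' `cumulative("probit")` family. The threshold values are invented for illustration; in a fitted model they are estimated and get their own priors.

```r
# Invented thresholds for a 7-point scale (6 cut points on the latent scale)
thresholds <- c(-1.5, -0.8, -0.3, 0.3, 0.8, 1.5)

# Category probabilities for a latent shift eta: P(Y <= k) = pnorm(thr_k - eta)
category_probs <- function(eta, thr = thresholds) {
  cum <- c(pnorm(thr - eta), 1)   # cumulative probabilities for categories 1..7
  diff(c(0, cum))                 # difference them to get P(Y = k)
}

p_ref  <- category_probs(0)       # reference group
p_cond <- category_probs(0.3)     # group shifted by 0.3 latent SDs

round(rbind(reference = p_ref, shifted = p_cond), 3)

# Implied mean rating on the 1-7 scale for each group
expected <- c(reference = sum(1:7 * p_ref), shifted = sum(1:7 * p_cond))
expected
```

The same 0.3 latent shift moves some categories more than others depending on where the thresholds sit, which is why the implied change in ratings cannot be read off the effect prior alone.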
One approach that will certainly be helpful is to conduct prior predictive checks.
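A hand-rolled prior predictive simulation in base R, as a sketch under stated assumptions: a normal(0, 1) prior on a single condition effect, and thresholds approximated by sorting six normal(0, 1.5) draws. These priors are hypothetical and only approximate what brms would do; the point is to see which rating patterns the priors consider plausible before seeing any data.

```r
set.seed(1)

n_draws <- 1000   # number of draws from the prior
n_obs   <- 200    # hypothetical number of ratings per simulated dataset

# One simulated dataset of 1-7 ratings under the (hypothetical) priors
sim_prior_ratings <- function() {
  beta <- rnorm(1, 0, 1)                     # condition effect on latent scale
  thr  <- sort(rnorm(6, 0, 1.5))             # ordered thresholds (approximation)
  condition <- rep(0:1, length.out = n_obs)  # two balanced groups
  latent <- beta * condition + rnorm(n_obs)  # probit: latent error SD fixed at 1
  findInterval(latent, thr) + 1              # map latent values to categories 1..7
}

ratings <- replicate(n_draws, sim_prior_ratings())   # n_obs x n_draws matrix

# Share of prior datasets where a single category holds > 80% of the ratings
share_extreme <- mean(apply(ratings, 2, function(r) max(table(r)) / length(r) > 0.8))
share_extreme
```

If many simulated datasets put nearly all ratings in one category, the priors are probably too wide for the question at hand. Within brms, the analogous check is to fit the model with `sample_prior = "only"` and inspect `pp_check()`, for example with `type = "bars"` for ordinal outcomes.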