I’m trying to model some psychological self-report data where the response variable was reported on a scale of 0 to 100. The distribution of the responses is left-skewed, with the mode around 75 and quite a few responses on the boundaries (0 and 100). I first tried modeling the data as truncated normal, but I was getting a lot of divergent transitions and a poor fit based on posterior predictive checks. I’ve seen some threads where people said divergent transitions are common with the truncated normal, especially with responses on the boundaries, so I instead tried the
skew_normal() likelihood from brms, and with that I got a more reasonable posterior predictive distribution (see attached picture). However, there are still two problems with it: I’m still getting some divergent transitions, and the response isn’t bounded between 0 and 100.
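For reference, here’s roughly what I ran for both attempts (the predictor `x` and data frame `dat` are placeholders, not my actual variables):

```r
library(brms)

# Attempt 1: truncated normal, via brms' trunc() addition term
fit_trunc <- brm(
  y | trunc(lb = 0, ub = 100) ~ x,
  data = dat, family = gaussian()
)

# Attempt 2: skew normal, which gave better posterior predictive
# checks but doesn't respect the 0-100 bounds
fit_skew <- brm(y ~ x, data = dat, family = skew_normal())
```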
I’ve looked around for different solutions people have applied to similar data. Some people divide the response variable by 100 so it’s between 0 and 1, and then model it with a Beta likelihood. However, as I mentioned, my data has a few observations on the boundaries (0’s and 1’s after rescaling), so
brms throws an error if I try to use the Beta family, since it requires the response to lie strictly inside (0, 1).
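This is roughly the rescaling attempt (again with placeholder names); the `brm()` call fails for me with an error about the response values:

```r
library(brms)

# rescale the 0-100 responses to the unit interval
dat$y01 <- dat$y / 100

# errors, because some y01 values are exactly 0 or 1 and the
# Beta family only allows responses strictly between 0 and 1
fit_beta <- brm(y01 ~ x, data = dat, family = Beta())
```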
I’ve been using the default brms priors (which are flat for the regression coefficients), so I’m hoping the divergent transitions will go away once I set more sensible ones. However, I’m more concerned about the choice of likelihood. Does anyone know a good likelihood for this sort of data? Or how to fix the problems with the truncated normal?
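In case it matters, this is the kind of weakly informative prior setup I was planning to try instead of the defaults (scales are my rough guesses for a 0–100 response, and `x`/`dat` are placeholders):

```r
library(brms)

# weakly informative priors on the scale of the 0-100 response
priors <- c(
  prior(normal(75, 25), class = Intercept),  # mode of responses is ~75
  prior(normal(0, 10), class = b),           # modest slopes
  prior(exponential(0.1), class = sigma)     # residual SD
)

fit <- brm(y ~ x, data = dat, family = skew_normal(), prior = priors)
```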
PS: There’s clearly over-representation of certain “nice” values (e.g. 50, 75) in the response, which I think makes sense for self-reported data. I’d be keen to model that heaping, but I suspect it might be a bit beyond my current modeling skills/time resources. Is there some easy way of modeling the over-represented values? If not, should I be worried about them affecting my overall model fit?