Negative skew, zero-inflated data, new to brms

Hello,

I am trying to model my outcome variable, a sentiment score. It is continuous, ranges from -4 to +2, and is zero-inflated.

I want to examine whether there is a relationship between the way a question is worded (positive, negative, or neutral) and the sentiment of the response. 2638 people were asked a question about symptoms: 1/3 with a negative wording, 1/3 with a neutral one, and 1/3 with a positive one. I then ran a sentiment analysis (using Trincker’s package) to see whether their responses were more positive or negative depending on the wording of the question.

Sentiment analysis breaks responses down into sentences, so I have 2638 people but 7924 sentences. I therefore assume I should include ID as a random effect.
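As a sketch of that idea (it is not one of the models fitted below), a per-participant random intercept in brms formula syntax could look roughly like this. The column name `ID` and the object name `f_id` are assumptions for illustration.

```r
library(brms)

# Assumption: the participant identifier column is called `ID`.
# (1 | ID) adds a random intercept per participant, so the several
# sentences contributed by one person are not treated as independent.
f_id <- bf(sentiment ~ primetype + (1 | ID))

# The formula could then be passed to brm() with the same family as the
# other models, for example:
# fit_id <- brm(f_id, data = df, family = cumulative("cloglog"))
```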

The big question is: does the way the question is asked (primetype) affect the sentiment of the response? At first I thought of a GLMM with a Gaussian family, but my data are slightly negatively skewed with many 0s (mostly neutral answers), so that didn’t work.

As another option, I converted my outcome to an ordered categorical variable (negative, neutral, positive) and tried my hand at brms.
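A minimal sketch of that kind of conversion is shown here, assuming the cut points are simply the sign of the score (exactly 0 counts as neutral). The data frame and the column name `score` are made up for illustration; the actual cut points used may differ.

```r
# Hypothetical example data: `score` stands in for the continuous sentiment score.
df_example <- data.frame(score = c(-1.2, 0, 0.4, 0, -0.3, 0.8))

# Assumption: negative scores -> "negative", zero -> "neutral", positive -> "positive".
labels <- ifelse(df_example$score < 0, "negative",
                 ifelse(df_example$score > 0, "positive", "neutral"))

# An ordered factor tells brms the categories have a natural order.
df_example$sentiment <- factor(labels,
                               levels = c("negative", "neutral", "positive"),
                               ordered = TRUE)

table(df_example$sentiment)
```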
I fit a few models and compared them with LOO:

fit_m1 <- brm(sentiment ~ primetype,
              data = df,
              family = cumulative("cloglog"))
summary(fit_m1)

LOOIC 4140.6 (SE 72.5)
p_loo 4.1 (SE 0.1)

fit_m2 <- brm(formula = sentiment ~ cs(primetype),
              data = df,
              family = acat("cloglog"))

LOOIC 4118.1 (SE 72.2)
p_loo 6.0 (SE 0.2)

fit_m4 <- brm(formula = bf(sentiment ~ primetype) +
                lf(disc ~ 0 + primetype, cmc = FALSE),
              data = df,
              family = cumulative("cloglog"))

LOOIC 4118.2 (SE 72.)
p_loo 6 (SE 0.2)

For all three models, the Monte Carlo SE of elpd_loo is 0.0 and all Pareto k estimates are good (k < 0.5).
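Since LOOIC is -2 times elpd_loo, the reported values can be put on the elpd scale and differenced by hand, as in the sketch below. This only reproduces the arithmetic; the standard error of a difference cannot be recovered from the individual SEs, which is what `loo_compare()` reports when given the `loo` objects directly.

```r
# LOOIC values as reported in this post.
looic <- c(fit_m1 = 4140.6, fit_m2 = 4118.1, fit_m4 = 4118.2)

# LOOIC = -2 * elpd_loo, so divide by -2 to get elpd_loo.
elpd <- -looic / 2

# Difference from the best (highest elpd) model; 0 marks the best one.
elpd_diff <- elpd - max(elpd)
round(elpd_diff, 2)
# fit_m1 fit_m2 fit_m4
# -11.25   0.00  -0.05

# With the fitted models available, the same comparison plus se_diff:
# library(brms)
# loo_compare(loo(fit_m1), loo(fit_m2), loo(fit_m4))
```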

Sorry for the super long question, but essentially I’d like to know whether this is a good way to do it, whether my output is good enough to use, and whether there are other things I could do with this model. For now I’ve given up on fitting the outcome as continuous and am sticking with the ordered categorical version, but perhaps someone has a better idea.

Thank you so much


Hi and welcome. Thanks for writing out the model code. One thing that can help is to provide some data (either real or simulated). That way folks have something to play around with.
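For instance, a small simulated data set with the same structure (several sentences per person, one wording condition per person, a three-level ordered outcome) could be put together along these lines. All numbers and probabilities here are invented for illustration and are not meant to match the real data.

```r
set.seed(1)

# Assumption: 30 simulated participants, each assigned one wording condition.
n_id <- 30
ids <- data.frame(
  ID = seq_len(n_id),
  primetype = rep(c("negative", "neutral", "positive"), length.out = n_id)
)

# Each participant contributes between 1 and 5 sentences.
n_sent <- sample(1:5, n_id, replace = TRUE)
sim <- ids[rep(seq_len(n_id), times = n_sent), ]
rownames(sim) <- NULL

# Outcome drawn at random with mostly neutral sentences (made-up probabilities).
sim$sentiment <- factor(
  sample(c("negative", "neutral", "positive"), nrow(sim),
         replace = TRUE, prob = c(0.25, 0.5, 0.25)),
  levels = c("negative", "neutral", "positive"),
  ordered = TRUE
)
sim$primetype <- factor(sim$primetype)

str(sim)
```

Posting the output of `dput(sim)` (or of a small slice of the real data) lets others recreate the exact object.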

Have you gone through the diagnostic plots in brms to check how the models are behaving?
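A rough sketch of a first pass, assuming `fit_m1` is the fitted model object from the original post (the same calls apply to the other fits):

```r
library(brms)

# Assumption: fit_m1 is the brmsfit object from the original post.
# Rhat and effective sample sizes for each parameter.
summary(fit_m1)

# Trace plots and marginal posterior densities.
plot(fit_m1)

# Posterior predictive check; type = "bars" compares observed and
# predicted counts per category, which suits an ordinal outcome.
pp_check(fit_m1, type = "bars")
```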
