What Response Distribution (Family) Should I be Using?

roofghost · April 24, 2021, 4:54am

I am modeling a business phenomenon based on survey data (repeated measures of the kind frequently used in Structural Equation Modeling). My response variable is a Bartlett factor score computed after a factor analysis (done in Stata; the modeling itself is in R + brms, of course). It is a multilevel model in which the grouping variable is country.

I have been getting some interesting results and am trying to wrap up my analysis with posterior predictive checks and other steps. I found that brm uses the “gaussian” family for the response variable by default. The default pp_check looked much better after I specified the “student” family, but I still feel that things could be improved. Any suggestions on which family I should use?

Many thanks in advance.

The case of a “gaussian” (the default) family

pp_check(gaussian_model)

pp_check(gaussian_model, type = "stat", stat = "mean")

         Estimate   SE
elpd_loo   -743.3 21.8
p_loo        35.1  4.4
looic      1486.6 43.7

It was the heavy deviation between roughly -1.5 and -0.5 that motivated me to look for a better response distribution (even though the gaussian model reconstructs the mean better and has a smaller looic value than the student model below).

The case of a “student” family

pp_check(student_model)

pp_check(student_model, type = "stat", stat = "mean")

         Estimate   SE
elpd_loo   -770.6 19.4
p_loo        37.4  2.4
looic      1541.1 38.7

Visually, the deviation between roughly -1.5 and -0.5 improved a little. But the model's ability to reconstruct the mean was weaker, and it has a larger looic value.
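For reference, looic is just -2 times elpd_loo, so the two tables above can be compared directly on the elpd scale. The following is a minimal sketch in base R that uses only the numbers posted above. The standard error of the difference cannot be recovered from the two tables; with the fitted models at hand it would come from something like `loo_compare(loo(gaussian_model), loo(student_model))`.

```r
# elpd_loo estimates copied from the two loo tables above
elpd <- c(gaussian = -743.3, student = -770.6)

# looic is defined as -2 * elpd_loo; small mismatches with the posted
# looic values come from the posted elpd_loo being rounded
looic <- -2 * elpd

# Difference in elpd_loo (student minus gaussian);
# a negative value favours the gaussian model
elpd_diff <- unname(elpd["student"] - elpd["gaussian"])

print(looic)
print(elpd_diff)
```

Whether a difference of this size is meaningful depends on its standard error, which is why the `loo_compare()` output is the more useful thing to look at.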

I am fairly new to Bayesian modeling and Stan. I understand that model selection should not rely on a single measure. Still, I am hoping to get a second opinion on which distribution family I should use here to improve the fit on all of these measures.

Other information:
The response variable (y) is actually somewhat skewed.

mean = 0
sd = 1
skewness = -0.12
kurtosis = 3.89

Thank you!

mike-lawrence · April 29, 2021, 6:13pm

Choice of response distribution depends strongly on domain expertise. Can you explain a bit about the kind of data you’re dealing with?

roofghost · April 29, 2021, 11:33pm

Thanks for taking the time to respond. I am doing operations management research. Specifically, the outcome is a perceived level of an operational outcome (how quickly and efficiently operations respond to changes).

mike-lawrence · April 29, 2021, 11:53pm

How is the outcome actually measured? For example, is it a questionnaire with multiple questions?

roofghost · April 29, 2021, 11:58pm

It was a five-level Likert-scale questionnaire.
The response variable is a second-order construct with two sub-dimensions. Each sub-dimension has three questions.
I did a factor analysis, and the final score was extracted (or predicted) using the Bartlett method.

mike-lawrence · April 30, 2021, 11:06am

Generally it is best to move inference as close to the raw data as possible, so I suggest doing an ordinal regression on the individual Likert items (for a tutorial, see here) and adding your factor-analytic structure (CFA example here) on top, within which you can also put your repeated-measures/SEM structure.
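As a rough illustration of the first step only (ordinal regression on the raw items, without the factor-analytic layer), a brms sketch might look like the following. The data are simulated, and every column name (`respondent`, `item`, `country`, `response`) is a hypothetical stand-in for the real survey. The varying-intercept structure is one plausible choice, not the specification from the linked tutorials.

```r
# Simulated stand-in for the survey: one row per respondent x item.
# All column names and sizes here are hypothetical.
set.seed(1)
n_resp <- 60
items <- paste0("q", 1:6)   # 2 sub-dimensions x 3 questions
long_df <- expand.grid(respondent = seq_len(n_resp), item = items)
long_df$country <- paste0("c", (long_df$respondent - 1) %% 5 + 1)
long_df$response <- factor(
  sample(1:5, nrow(long_df), replace = TRUE),
  levels = 1:5, ordered = TRUE
)

# Ordinal (cumulative probit) model of the raw item responses.
# Wrapped in a function so that nothing is compiled until it is called.
fit_ordinal <- function(data) {
  brms::brm(
    response ~ 1 + (1 | item) + (1 | respondent) + (1 | country),
    data = data,
    family = brms::cumulative("probit")
  )
}
```

Calling `fit <- fit_ordinal(long_df)` requires brms and a working Stan toolchain. The point of the long format is that each Likert answer stays a separate ordinal observation instead of being collapsed into a single factor score beforehand.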

JLC · April 30, 2021, 1:02pm

Possibly related, @mike-lawrence are you aware of any exploratory factor analysis Stan implementations?

roofghost · May 1, 2021, 3:39am

Thank you. I will look through those references.
But one thing: after I extracted the factor score using the Bartlett technique, the composite score is on a continuous scale rather than an ordinal one. Do you still recommend that I use the ordinal approach?

mike-lawrence · May 1, 2021, 12:23pm

When you model the raw Likert responses, it will yield parameters on a continuous latent scale. You’ll then be doing all your multi-level stuff with that parameter as the “outcome”.
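To make the "continuous latent scale" concrete, here is a small base-R sketch of how a cumulative probit model maps a latent location to probabilities for the five Likert categories. The threshold values and the two latent locations are made up for illustration; in a fitted model they would be estimated.

```r
# Hypothetical thresholds cutting a standard-normal latent scale
# into 5 ordered categories
thresholds <- c(-1.5, -0.5, 0.5, 1.5)

# Probability of each Likert category given a latent location eta
likert_probs <- function(eta, cuts = thresholds) {
  cum <- c(pnorm(cuts - eta), 1)   # P(response <= k) for k = 1..5
  diff(c(0, cum))                  # P(response == k)
}

p_low  <- likert_probs(eta = -1)   # respondent low on the latent scale
p_high <- likert_probs(eta =  1)   # respondent high on the latent scale
print(round(rbind(p_low, p_high), 3))
```

Here `eta` plays the role of the continuous parameter: country-level and repeated-measures effects shift `eta`, and the thresholds translate that shift into changes in the observed ordinal responses.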
