I am modeling a business phenomena based on survey data (repeated measures, those frequently used in Structural Equation Modeling). My response variable is generated using Bartlett score after the factor analysis is conducted (using STATA, but I am modeling using R + brms of course.) It is a multilevel model where the grouping variable is different country.
I have been getting some interesting results and trying to close my analysis after conducting the posterior predictive check and other steps. I have found that brm
by default uses “gaussian” family distributions for the response variable. The default pp_check
looked much better after I specified “student” family. But I am still feeling that things can be improved. Any suggestions on which type of family should be used?
Many thanks in advance.
The case of a “gaussian” (the default) family
pp_check(gaussian_model)
pp_check(gaussian_model, type = ‘stat’, stat = ‘mean’)
Estimate SE
elpd_loo -743.3 21.8
p_loo 35.1 4.4
looic 1486.6 43.7
It is the heavy deviation around -0.5 ~ -1.5 motivated me to explore a better response distribution. (Even though the gaussian family model is better in reconstructing the mean and with a smaller looic value.)
The case of a “student” family
pp_check(student_model)
pp_check(student_model, type = ‘stat’, stat = ‘mean’)
Estimate SE
elpd_loo -770.6 19.4
p_loo 37.4 2.4
looic 1541.1 38.7
Visually, the deviation around -0.5 ~ -1.5 improved a little bit. But its ability to reconstruct the mean was weaker and has a bigger looic value.
I am fairly new to the Bayesian and Stan. I understand that model selection should not be rely on solely one single measure. But I rest my case… I am hoping to get your second opinion about the specific type of distribution family should be used here to bring the improvement on all measures.
Other information:
The response variable (y) is actually somewhat skewed.
mean = 0
sd = 1
skewness = -0.12
kurtosis = 3.89
Thank you!