I am modeling a business phenomena based on survey data (repeated measures, those frequently used in Structural Equation Modeling). My response variable is generated using Bartlett score after the factor analysis is conducted (using STATA, but I am modeling using R + brms of course.) It is a multilevel model where the grouping variable is different country.
I have been getting some interesting results and trying to close my analysis after conducting the posterior predictive check and other steps. I have found that
brm by default uses “gaussian” family distributions for the response variable. The default
pp_check looked much better after I specified “student” family. But I am still feeling that things can be improved. Any suggestions on which type of family should be used?
Many thanks in advance.
pp_check(gaussian_model, type = ‘stat’, stat = ‘mean’)
Estimate SE elpd_loo -743.3 21.8 p_loo 35.1 4.4 looic 1486.6 43.7
It is the heavy deviation around -0.5 ~ -1.5 motivated me to explore a better response distribution. (Even though the gaussian family model is better in reconstructing the mean and with a smaller looic value.)
pp_check(student_model, type = ‘stat’, stat = ‘mean’)
Estimate SE elpd_loo -770.6 19.4 p_loo 37.4 2.4 looic 1541.1 38.7
Visually, the deviation around -0.5 ~ -1.5 improved a little bit. But its ability to reconstruct the mean was weaker and has a bigger looic value.
I am fairly new to the Bayesian and Stan. I understand that model selection should not be rely on solely one single measure. But I rest my case… I am hoping to get your second opinion about the specific type of distribution family should be used here to bring the improvement on all measures.
The response variable (y) is actually somewhat skewed.
mean = 0 sd = 1 skewness = -0.12 kurtosis = 3.89