Hi,
As a long-time user of frequentist statistics, I am trying to move to Bayesian modeling, so I am quite new to both the Bayesian approach and Stan. I apologize if my question comes off as naive.
I am trying to model data with a roughly bimodal distribution. The outcome is motor performance after stroke. Many stroke motor scales have a fairly strong ceiling effect, like below:
The model has three independent variables and one random effect. The formula is:
```
motor_score ~ lesion_load_z + lesion_side + lesion_volume_z + (1|site)
```
Motor score is scaled to lie between 0 and 1; lesion load and lesion volume are z-scored; lesion side and site are categorical. Lesion side has 2 levels and site has 21 levels. I have 242 observations in total.
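The preprocessing described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the actual pipeline: the raw score range (0 to 66), the variable names, and the simulated values are all assumptions made for the example.

```r
# Simulated stand-in data; every name and value here is hypothetical.
set.seed(1)
n <- 242
raw_score <- round(rnorm(n, mean = 55, sd = 20))
raw_score <- pmax(pmin(raw_score, 66), 0)  # clamp to an assumed 0-66 scale
lesion_load <- rexp(n)                     # made-up continuous predictor

# Min-max scale the outcome to [0, 1] using the scale's theoretical range.
scale01 <- function(x, lo, hi) (x - lo) / (hi - lo)
motor_score <- scale01(raw_score, lo = 0, hi = 66)

# z-score a continuous predictor (mean 0, standard deviation 1).
z_score <- function(x) (x - mean(x)) / sd(x)
lesion_load_z <- z_score(lesion_load)

# Share of observations sitting exactly at the ceiling.
ceiling_share <- mean(motor_score == 1)
```

The last line quantifies the ceiling effect. Observations exactly at 0 or 1 are worth counting because the Beta density is defined only on the open interval (0, 1).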
The first approach I tried is a Beta regression with a logit link:
```r
mod1 = brm(
  NORMED_MOTOR ~ Total_Percsub_Cramer_z + LesionSide + Lesion_Volume_z + (1|SITE),
  data = reduced_modeling_df,
  family = Beta(link = "logit"),
  iter = 6000,
  chains = 6
)
```
This model converges fine, but the fit isn’t great:
The second approach I tried is standardizing (z-scoring) the outcome and fitting a two-component Gaussian mixture:
```r
mix = mixture(gaussian, gaussian)
mod2 = brm(
  NORMED_MOTOR_z ~ Total_Percsub_Cramer_z + LesionSide + Lesion_Volume_z + (1|SITE),
  data = reduced_modeling_df,
  family = mix,
  iter = 6000,
  chains = 6
)
```
This model has trouble converging and produces the following warnings:
```
1: There were 150 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
2: There were 180 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
https://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
3: Examine the pairs() plot to diagnose sampling problems
4: The largest R-hat is 3.25, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat
5: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
6: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
```
The pp_check plot also looks very odd, as the model is predicting wildly implausible values:
![gaus_mix_fit|690x490](upload://6JCm8IqO6yNECaaQxjWvFhVzgYB.png)
My intuition is that the Beta distribution is the right approach, but I am not sure how to change the model in a principled way to improve the fit. I have tried adjusting the priors, but so far nothing has changed the fit much. Are there parameters that I can adjust to bring the fit closer to the data? Any advice is very much appreciated!
Thanks!