Hello Stan Forum,
This is my first time posting, so I apologize for any errors, confusion, or poor formatting. I used brms in my analysis, so I am keeping the scripts in that format.
I am working with a simple binary multilevel model exploring the effect of a continuous variable (length) on the probability of two states (0:1) for a group of 385 species. State represents a discreet microhabitat, ground (0) or tree (1). I have included the covariance matrix derived from the phylogeny as a random effect/groupeffect in the model. One thing that may be important to note is that of the 385 species, 52 are state â€ś1â€ť, and 333 are state â€ś0â€ť. These states are dispersed pretty evenly across the phylogeny.
I am having a hard time understanding priors and variance in my grouplevel effect (phylo). First, here is the model:
fit < brm(
state ~ length + (1phylo), data = data1,
family = bernoulli(), cov_ranef = list(phylo = A),
iter = 3000,
prior = c(
set_prior("normal(0, 10)", class = "b", coef = "length"),
set_prior("normal(0, 50)", class = "Intercept"),
set_prior("XXXXXXXXXXX", "sd")
))
The â€śXXXâ€¦â€ť symbolizes the various priors I have implemented, explained below. I noticed that running the model with a standard halfcauchy prior on â€śphyloâ€ť leads to a very large variance for the sd estimate, which in turn is mirrored in the error in my intercept and â€ślengthâ€ť estimates. This seemed a little unusual, so I started investigating. I read about exponential priors being used to solve problems of extreme variance in grouplevel effects (Statistical Rethinking, McElreath 2016, pg. 363) as they represent the maximum entropy prior for standard deviation, and prevent extreme values in the posterior. I ran a few models using exponential priors and indeed the variance was drastically minimized, and the results â€śseem reasonableâ€ť for my dataset. I ran a few more models with different â€śsdâ€ť priors, and compared using AIC. Here are the models and results:
List of Models: These represent the â€śXXXXXXXXâ€ť in the model above
fit1_E exponential(0.1)
fit2_E exponential(1)
fit3_E exponential(2)
fit1_S prior(student_t(3, 0, 10)
fit2_S prior(student_t(3, 0, 4)
fit1_HC cauchy(0,5)
fit2_HC cauchy(0,2)
fit3_HC cauchy(0,1)
Results for each prior:
fit1_E term estimate std.error lower upper
1 b_Intercept 25.022634 18.378072 69.313583 2.251715
2 b_length 3.527981 1.882504 1.005739 8.096916
3 sd_phylo__Intercept 30.780155 14.304652 10.953545 65.831346
fit2_E term estimate std.error lower upper
1 b_Intercept 9.552696 5.9436732 23.7319120 0.4054055
2 b_length 1.279336 0.5420407 0.4991412 2.5891605
3 sd_phylo__Intercept 10.474395 3.5360108 5.3332635 18.7458643
fit3_E term estimate std.error lower upper
1 b_Intercept 6.9097195 3.9284101 15.824364 0.2717995
2 b_length 0.9090803 0.3363955 0.382051 1.7084082
3 sd_phylo__Intercept 7.0747783 1.9632313 3.989964 11.6119434
fit1_S term estimate std.error lower upper
1 b_Intercept 28.778864 23.325681 86.2666068 2.385186
2 b_length 4.279439 2.948905 0.9648861 11.732086
3 sd_phylo__Intercept 38.792357 26.448354 10.0547181 109.472044
fit2_S term estimate std.error lower upper
1 b_Intercept 21.795929 19.801790 74.7482721 2.610887
2 b_length 3.157116 2.431173 0.7307612 10.061481
3 sd_phylo__Intercept 27.953530 21.785459 7.4199031 89.639040
fit1_HC term estimate std.error lower upper
1 b_Intercept 41.21102 32.763435 117.790693 9.265238
2 b_length 7.25270 4.893182 1.196928 19.708175
3 sd_phylo__Intercept 74.85902 60.718119 12.131478 232.738342
fit2_HC term estimate std.error lower upper
1 b_Intercept 41.299566 32.745079 119.54397 6.357867
2 b_length 6.981974 4.713753 1.08501 18.763991
3 sd_phylo__Intercept 72.586361 59.047678 11.16131 228.011888
fit3_HC term estimate std.error lower upper
1 b_Intercept 42.285867 32.968770 120.137500 6.356373
2 b_length 7.345756 4.882662 1.246481 19.664864
3 sd_phylo__Intercept 76.800876 63.936976 12.765590 236.180079
It is pretty obvious looking at these results that the error and variance is all over the place, and especially large (though consistent) with the cauchy prior. You can also see that the variance was greatly reduced using the exponential priors
It is important to note that every run converged very nicely, has an ESS > 1000, and rhat = 1.00
I performed an WAIC and was a little surprised by the results:
WAIC SE
fit1_E 45.72 4.15
fit2_E 101.12 9.62
fit3_E 127.45 12.23
fit4_E 151.88 14.39
fit1_S 42.84 3.97
fit2_S 58.28 5.41
fit1_HC 27.72 2.45
fit2_HC 29.08 2.57
fit3_HC 26.71 2.35
It seems the models with cauchy priors have a drastically lower WAIC score than the studentt or exponential runs.
So here are the questions:

Is it safe to always follow the WAIC test to select a best model of reality, regardless of the extreme variance generated by the model(s) of choice?

Is the data itself, strongly weighted to score 0 over 1 (333 to 52) a possible cause of this variance and discrepancy in results? Is there a better distribution other than bernoulli() to model this kind of weighted data?

I understand that exponential() represents a vague â€śmaximum entropyâ€ť prior here, which should limit the range of variance by excluding extreme results. Did I interpret this correctly?

Ultimately all of the models agree that length has a strong positive effect on score. Is it wrong/illadvised of me to fit the stronger prior in this case to minimize the large variance?
I am not including any data as I believe the questions do not require it, but if someone thinks it would help then Iâ€™d be happy to include it.
I am really looking forward to hearing the forumâ€™s thoughts. Thanks for taking the time to read this!
Jon