Hello Stan Forum,
This is my first time posting, so I apologize for any errors, confusion, or poor formatting. I used brms in my analysis, so I am keeping the scripts in that format.
I am working with a simple binary multilevel model exploring the effect of a continuous variable (length) on the probability of two states (0:1) for a group of 385 species. State represents a discreet microhabitat, ground (0) or tree (1). I have included the covariance matrix derived from the phylogeny as a random effect/group-effect in the model. One thing that may be important to note is that of the 385 species, 52 are state “1”, and 333 are state “0”. These states are dispersed pretty evenly across the phylogeny.
I am having a hard time understanding priors and variance in my group-level effect (phylo). First, here is the model:
fit <- brm(
state ~ length + (1|phylo), data = data1,
family = bernoulli(), cov_ranef = list(phylo = A),
iter = 3000,
prior = c(
set_prior("normal(0, 10)", class = "b", coef = "length"),
set_prior("normal(0, 50)", class = "Intercept"),
set_prior("XXXXXXXXXXX", "sd")
))
The “XXX…” symbolizes the various priors I have implemented, explained below. I noticed that running the model with a standard half-cauchy prior on “phylo” leads to a very large variance for the sd estimate, which in turn is mirrored in the error in my intercept and “length” estimates. This seemed a little unusual, so I started investigating. I read about exponential priors being used to solve problems of extreme variance in group-level effects (Statistical Rethinking, McElreath 2016, pg. 363) as they represent the maximum entropy prior for standard deviation, and prevent extreme values in the posterior. I ran a few models using exponential priors and indeed the variance was drastically minimized, and the results “seem reasonable” for my dataset. I ran a few more models with different “sd” priors, and compared using AIC. Here are the models and results:
List of Models: These represent the “XXXXXXXX” in the model above
fit1_E exponential(0.1)
fit2_E exponential(1)
fit3_E exponential(2)
fit1_S prior(student_t(3, 0, 10)
fit2_S prior(student_t(3, 0, 4)
fit1_HC cauchy(0,5)
fit2_HC cauchy(0,2)
fit3_HC cauchy(0,1)
Results for each prior:
fit1_E term estimate std.error lower upper
1 b_Intercept -25.022634 18.378072 -69.313583 2.251715
2 b_length 3.527981 1.882504 1.005739 8.096916
3 sd_phylo__Intercept 30.780155 14.304652 10.953545 65.831346
fit2_E term estimate std.error lower upper
1 b_Intercept -9.552696 5.9436732 -23.7319120 0.4054055
2 b_length 1.279336 0.5420407 0.4991412 2.5891605
3 sd_phylo__Intercept 10.474395 3.5360108 5.3332635 18.7458643
fit3_E term estimate std.error lower upper
1 b_Intercept -6.9097195 3.9284101 -15.824364 -0.2717995
2 b_length 0.9090803 0.3363955 0.382051 1.7084082
3 sd_phylo__Intercept 7.0747783 1.9632313 3.989964 11.6119434
fit1_S term estimate std.error lower upper
1 b_Intercept -28.778864 23.325681 -86.2666068 2.385186
2 b_length 4.279439 2.948905 0.9648861 11.732086
3 sd_phylo__Intercept 38.792357 26.448354 10.0547181 109.472044
fit2_S term estimate std.error lower upper
1 b_Intercept -21.795929 19.801790 -74.7482721 2.610887
2 b_length 3.157116 2.431173 0.7307612 10.061481
3 sd_phylo__Intercept 27.953530 21.785459 7.4199031 89.639040
fit1_HC term estimate std.error lower upper
1 b_Intercept -41.21102 32.763435 -117.790693 9.265238
2 b_length 7.25270 4.893182 1.196928 19.708175
3 sd_phylo__Intercept 74.85902 60.718119 12.131478 232.738342
fit2_HC term estimate std.error lower upper
1 b_Intercept -41.299566 32.745079 -119.54397 6.357867
2 b_length 6.981974 4.713753 1.08501 18.763991
3 sd_phylo__Intercept 72.586361 59.047678 11.16131 228.011888
fit3_HC term estimate std.error lower upper
1 b_Intercept -42.285867 32.968770 -120.137500 6.356373
2 b_length 7.345756 4.882662 1.246481 19.664864
3 sd_phylo__Intercept 76.800876 63.936976 12.765590 236.180079
It is pretty obvious looking at these results that the error and variance is all over the place, and especially large (though consistent) with the cauchy prior. You can also see that the variance was greatly reduced using the exponential priors
It is important to note that every run converged very nicely, has an ESS > 1000, and rhat = 1.00
I performed an WAIC and was a little surprised by the results:
WAIC SE
fit1_E 45.72 4.15
fit2_E 101.12 9.62
fit3_E 127.45 12.23
fit4_E 151.88 14.39
fit1_S 42.84 3.97
fit2_S 58.28 5.41
fit1_HC 27.72 2.45
fit2_HC 29.08 2.57
fit3_HC 26.71 2.35
It seems the models with cauchy priors have a drastically lower WAIC score than the student-t or exponential runs.
So here are the questions:
-
Is it safe to always follow the WAIC test to select a best model of reality, regardless of the extreme variance generated by the model(s) of choice?
-
Is the data itself, strongly weighted to score 0 over 1 (333 to 52) a possible cause of this variance and discrepancy in results? Is there a better distribution other than bernoulli() to model this kind of weighted data?
-
I understand that exponential() represents a vague “maximum entropy” prior here, which should limit the range of variance by excluding extreme results. Did I interpret this correctly?
-
Ultimately all of the models agree that length has a strong positive effect on score. Is it wrong/illadvised of me to fit the stronger prior in this case to minimize the large variance?
I am not including any data as I believe the questions do not require it, but if someone thinks it would help then I’d be happy to include it.
I am really looking forward to hearing the forum’s thoughts. Thanks for taking the time to read this!
Jon