Hi,
I am using brms
and I am seeking advice on how to adjust my model specification based on posterior predictive check results.
My response variable is continuous and represents the difference between the final and the initial distances (measured by GPS) of an individual to a fixed polygon, for each month.
The distance variable is then positive when the individual moves away from the polygon during the month (i.e., its final distance is greater than the initial distance), and negative when it gets closer (i.e., final distance < initial distance). If the individual was and remained within the polygon, or stayed at an equal distance from the polygon, the distance is 0.
I have 59 individuals (females and males), that are monitored several months and sometimes during several years, leading to 765 observations (ID-month).
Here is the distribution of my distance variable:
You can see that a lot of individuals stay within the polygon (peak at 0), and that some individuals can move a lot at greater distances.
My first attempt to understand the monthly variation in distance is below. I first tried to fit a brm()
model with the gaussian family:
mod_gauss <- brm(Dist ~ Month + Sex + Weight + (1|ID) + (1|Year),
data = tab_dat, family = gaussian(),
chains = 3, iter = 24000, warmup = 4000, thin = 4,
cores = 3, control = list(adapt(adapt_delta = 0.98)))
where the Month and Sex are factors (and the Weight variable corresponds to the number of GPS locations recorded during the month for each individual to measure for a potential bias due to the GPS recording).
That model converged well, here is the summary:
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Dist ~ Month + Sex + Weight+ (1 | ID) + (1 | Year)
Data: all_month_ind (Number of observations: 765)
Draws: 3 chains, each with iter = 6000; warmup = 1000; thin = 4;
total post-warmup draws = 3750
Group-Level Effects:
~ID (Number of levels: 59)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.9038 1.5884 0.0659 5.9645 1.0000 14317 13498
~Year(Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.4372 2.1880 0.0805 8.0871 1.0000 13866 14661
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.5497 24.2095 -41.2484 53.3543 1.0000 14662 14310
Monthmai -16.1251 10.2041 -35.8115 3.6511 0.9999 12978 14178
Monthjuin -41.2289 10.1286 -60.8787 -21.2383 1.0000 13154 13986
Monthjuil. -62.1557 10.3287 -82.4877 -42.0309 1.0000 13075 13573
Monthaoût -28.9785 10.3716 -49.0260 -8.5567 1.0000 13742 14304
Monthsept. -26.5548 10.4687 -46.7636 -5.8644 1.0000 13311 13878
Monthoct. -35.9533 10.6489 -56.6318 -14.9359 0.9999 13321 14093
Monthnov. -121.3006 11.0212 -143.1058 -99.8523 1.0001 13107 14077
Monthdéc. -73.5587 11.2449 -95.9426 -51.4944 1.0000 13493 14013
Monthjanv.P1 -27.1956 11.3819 -49.7920 -5.1402 1.0001 12417 13840
Monthfévr.P1 -13.5913 11.6732 -36.9981 9.1901 1.0001 13571 13913
MonthmarsP1 -23.6835 12.1765 -47.4323 0.3809 1.0000 13956 14082
SexM -7.3838 4.9331 -16.9902 2.2179 1.0001 14187 14268
Weight 0.1416 0.1372 -0.1263 0.4120 0.9999 14767 13910
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 64.4803 1.6609 61.3537 67.8209 1.0000 14222 14132
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
But I find the pp_check very odd (left panel : full view, right panel : zoomed x30):
(1) Any ideas on what is happening here?
I saw that the Gaussian is a specific case of the Student family and that using the Student family could help as it is more flexible and could allow for fat tails. I ran the same model with family = student()
but it was extremely long to run (several hours for student vs half an hour for the gaussian) and I never got it converge properly (Rhat > 3). I am not even sure that the pp_check of a model that did not converge is relevant but here it is:
(2) Is the gaussian model good to use ? If not, is there something I can do to fix it or should I use a different family?
(3) Is the Student family a good choice here? Any idea on what might be the reason for the increased duration of run (compared to gaussian) and how to make it converge?
Thank you in advance for all your help, here are the data if needed:
Data.csv (34.7 KB)