Hello everyone and a happy new year! :)
Never posted here before, but I´ll try my best. I am fitting a rather simple Bayesian linear mixed-effects model ins brms (R version 4.4.2 and brms_2.23.0, on Win 11)
I am analysing monitoring data of animals, that tracks when individuals of a species a) appear for the first time in our research sites in a given year, and b) when they leave again for the winter.
The data is censored by the timing of when our monitoring started or ended that season, i.e. in some instances, some or even all individuals where already present in our study site (left centered data). Similarly, some individuals were still present when we left at the end of the season (right censored data). The actual censoring dates change between the different years, as monitoring stared/ended on different days each season.
Looking at the raw data, the distribution of arrival and departure dates vary significantly between years…sometimes I see strong unimodal peeks, when all individuals arrive/leave nearly at the same date. In other years, individuals arrive or departe across several weaks, or even in a bimodal patterns. Across all years, arrival data are generally right skewed, while departure data are left skewed. That was my rational for using the skew_normal distribution.
In a first step, I wanted to see if the arrival and departure dates changed across years. My response variables are ordinal date numbers. Using default priors, I ran this (with 1010 observations, across 262 IDs). As I have repeated measures for some individuals (but not all), I wanted to include the individual´s ID (tag_ID), so this was my model for the Arrival:
mod_arrival_year_effect <- brm (first_observation | cens(censored_start_monitoring) ~
as.numeric(year) +
(1|tag_id),
data = data_weather_arrival,
family = skew_normal(),
chains=4, cores=4, iter = 3000)
This model runs fine: Rhat, Bulk_ESS, Tail_ESS and look good. The posteriors check (with the warning that the censored responses are not included) don´t look perfect to me, but I am happy that the model is at least running. This model also give nearly identical results to a linear mixed effect model (estimated with lmer).
Similary, for the Departure I then ran this model (1217 observations, across 314 IDs):
mod_departure_year_effect <- brm (last_observation | cens(censored_end_monitoring) ~
as.numeric(year) +
(1|tag_id),
data = data_weather_departure,
family = skew_normal(),
chains=4, cores=4)
This was the default prior:
prior class coef group resp dpar nlpar lb ub tag source
normal(0, 4) alpha default
(flat) b default
(flat) b as.numericyear (vectorized)
student_t(3, 252, 8.9) Intercept default
student_t(3, 0, 8.9) sd 0 default
student_t(3, 0, 8.9) sd tag_id 0 (vectorized)
student_t(3, 0, 8.9) sd Intercept tag_id 0 (vectorized)
student_t(3, 0, 8.9) sigma 0 default
Here the summary:
Warning: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors. Family: skew_normal
Links: mu = identity
Formula: last_observation | cens(censored_end_monitoring) ~ as.numeric(year) + (1 | tag_id)
Data: data_weather_departure (Number of observations: 1217)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~tag_id (Number of levels: 314)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.60 1.81 0.14 4.61 4.08 4 11
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5.20 13.60 -16.14 17.71 3.55 4 13
as.numericyear 0.38 0.85 -1.01 1.13 3.70 4 13
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 12.01 1.99 10.71 15.51 4.56 4 11
alpha 1.98 0.02 1.96 2.01 3.80 4 11
So…pretty bad. I am surprised that the intercept for theregression coefficients was estimated to be -5.20, when it should be somewhere around 250 (but maybe that is due to the low effective sample sizes and does not indicate anything at this stage?). I also got a lot of warnings during warmup (“Log probability evaluates to log(0), i.e. negative infinity”). Following suggestions I found online, I tried setting a stronger prior, the inits, delta and treedepth.
But I am not sure that is did this correctly:
init <- list(
b_Intercept = 250,
b_as.numericyear = 0,
sd_tag_id = 1)
list_inits = list(init, init, init, init)
prior_1 <- c(set_prior("normal(0,10)", class = "b", coef = "as.numericyear"),
set_prior("normal(0,4)", class = "alpha"),
set_prior("student_t(3, 0, 5.9)", class = "sigma", lb = 0))
mod_departure_year_effect_brms_expanded <- brm(last_observation | cens(censored_end_monitoring) ~
as.numeric(year) +
(1|tag_id) ,
data = data_weather_departure,
family = skew_normal(),
prior = prior_1,
init= list_inits,
control = list(adapt_delta = 0.95, max_treedepth = 15),
chains=4, cores=4, iter = 4000)
I had tried setting a prior for a skew normal distribution (mu =250, alpha = -3, sigma = 10) but that did not work as intended.
Where did I go wrong?
I appreciate any help! (and if you need more information, please let me know).
All the best,
Carolin
PS: I looked at the log-transformed ordinal dates, but they are still skewed so a log_normal family would not fit well.