I am fitting a logistic regression using brms. For background, this is an assay in which groups of 50 flies are allowed to choose between a control stimulus and a chemical attractant. There are 7 different attractants. The assay was repeated 6 times for each attractant, and a completely new group of 50 flies was used each time (50 × 42 = 2,100 flies used in the trial; none were ever “recycled”). The number of flies responding to the chemical attractant was counted at 11 time points.
The hypothesis is that the attractiveness of the different attractants will build and then attenuate at different rates. I therefore decided to fit a generalized additive mixed model (GAMM), which is flexible enough to capture qualitatively different trends: flat over time, an increase followed by a rough plateau, or an increase to a peak followed by a quick drop-off.
Data
Here is a graph of the data with the individual groups’ trends shown as thin lines colored by attractant compound. The median trend for each attractant is shown as a thick line.
This is the structure of the data; as you can see, we have counts of flies responding to the attractant at each time point, for each combination of replicate and compound. (Responses are measured every 5 minutes at first, but the frequency drops to every 15 minutes later in the assay.)
# A tibble: 462 × 6
   replicate compound     time pct_response n_total n_responding
       <dbl> <fct>       <dbl>        <dbl>   <dbl>        <dbl>
 1         1 α-terpinene     5            2      50            1
 2         2 α-terpinene     5            4      50            2
 3         3 α-terpinene     5            0      50            0
 4         4 α-terpinene     5            6      50            3
 5         5 α-terpinene     5            2      50            1
 6         6 α-terpinene     5            4      50            2
 7         1 α-terpinene    10            8      50            4
 8         2 α-terpinene    10            6      50            3
 9         3 α-terpinene    10            2      50            1
10         4 α-terpinene    10            4      50            2
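Note that pct_response is just n_responding expressed as a percentage of n_total, so only the counts enter the model. A quick way to verify this (assuming the data are in behav_ind, as in the model call below):

with(behav_ind, all(n_responding == round(pct_response / 100 * n_total)))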
Model
I’ve fit a logistic regression model with the type of attractant (compound) as a fixed effect, plus a factor smooth term over time in minutes, grouped by compound. I have fit only a random intercept term, (1 | replicate:compound), for each of the 42 experimental units, so I’m assuming the effect of time is fixed and doesn’t vary randomly across the individual experimental units. I set a constraining prior of normal(0, 2) on the fixed effects; with a logit link function, this allows for large but not implausibly huge effect sizes (a quick illustration follows the model code).
gamm_fit_ind <- brm(
  # binomial counts out of n_total flies per group;
  # factor smooths over time by compound, plus a random intercept
  # for each of the 42 replicate-by-compound experimental units
  n_responding | trials(n_total) ~ compound +
    s(time, bs = 'fs', k = 3, m = 1, by = compound) +
    (1 | replicate:compound),
  family = binomial(link = 'logit'), data = behav_ind,
  prior = c(
    prior(normal(0, 2), class = b)  # constraining prior on the fixed effects
  ),
  chains = 4, iter = 6000, warmup = 5000, seed = 927,
  file = 'project/modelfits/gamm_fit_ind'
)
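To give a sense of what normal(0, 2) implies on the logit scale: a coefficient two prior SDs from zero shifts the log-odds by ±4, which moves a 50% response probability to roughly 2% or 98%.

# rough intuition for the normal(0, 2) prior on the logit scale
plogis(c(-4, 0, 4))
# ~ 0.018, 0.500, 0.982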
The problem
The model converges quite well and appears to fit the data very well, but I keep getting divergent transitions (~25 to 50 divergent transitions post-warmup). I can’t figure out why this might be the case. Every time I try to set different priors on the sd and sds parameter classes, I get worse performance.
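For concreteness, the variants I’ve been trying look roughly like the call below (the exponential priors are just one representative choice; the exact distributions and scales varied between attempts):

gamm_fit_alt <- update(
  gamm_fit_ind,
  prior = c(
    prior(normal(0, 2), class = b),
    prior(exponential(2), class = sd),   # SD of the replicate:compound intercepts
    prior(exponential(2), class = sds)   # SDs controlling the smooths' wiggliness
  ),
  chains = 4, iter = 6000, warmup = 5000, seed = 927
)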
Can anyone please help me diagnose the issue? Is it because of some issue in the way I wrote the s() term? Is it because there are too few time points per experimental unit? Is it something that can be repaired with better priors? Or is it none of the above, and I can safely ignore the warning? Thanks in advance for the help.