Ok first: love brms, great package! Thank you!
My objective: quantify the time span over which incidence of a disease is temporally correlated.
My data: annual numbers of cases and population at risk in over 250 communities during 9 consecutive years.
Visualisation of data already reveal:
- Overall declining trend in incidence
- Overall high variation in incidence, both over time and between communities
- Marked peaks in incidence within communities over consecutive numbers of years, but at various points in time for different communities. Most peaks are 2 to 4 years wide.
So far, I have managed to perform some basic exploratory analyses:
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community), family = poisson())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community), family = negbinomial())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | community), family = poisson())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | community), family = negbinomial())
And as it turns out, the negative binomial with random intercept and slope fits the data best according to K-fold cross-validation (WAIC and approximate LOO gave warnings and suggested kfold). However, the Poisson model with random intercept and slope is a very close runner-up. I imagine that if I somehow add a temporally correlated error-term that captures the 2-4 year peaks that deviate from the log-linear community-level trend (I may need to kick out the random slope for that to work), the negative binomial model will no longer outperform the Poisson model.
Now, how to add such a temporally correlated error? N.B. the temporally structured errors should only be correlated within each community.
I immediately thought of simply adding a Gaussian Process term, e.g.:
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community) + gp(year), family = poisson())
However, this only adds a temporally structured error to the overall mean trend over time, whereas I want to add this temporally structured error separate to each community’s trendline, as the peaks / outbreaks happen at different time points in different communities. So perhaps group the gp by community:
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community) + gp(year, by = community), family = poisson())
But now I get 250+ separate GP’s added to the model, each with its own parameters, one for each community. I’d like to estimate one set of GP parameters (notably the scale parameter), and let each of the 250+ series of temporally structured errors be drawn from that. Is it possible? And if not, is it possible to extract the stan model code from the brm object and adapt it and work with that? I’m well versed in stan, so should be able to handle that.
Or is there another (better/easier) option for temporally structure errors that I’m completely missing? I’ve looked at the autocor argument, but am hesitant to use it as I have little experience with ARMA models and the like.
- Operating System: macOS High Sierra 10.13.6
- brms Version: 2.3.1