Brms: one shared gp for temporally correlated errors in each of 250+ timeseries

LucC · August 2, 2018, 1:56pm

Ok first: love brms, great package! Thank you!

My objective: quantify the time span over which incidence of a disease is temporally correlated.
My data: annual numbers of cases and population at risk in over 250 communities during 9 consecutive years.

Visualisation of data already reveal:

Overall declining trend in incidence
Overall high variation in incidence, both over time and between communities
Marked peaks in incidence within communities over consecutive numbers of years, but at various points in time for different communities. Most peaks are 2 to 4 years wide.

So far, I have managed to perform some basic exploratory analyses:

brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community), family = poisson())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community), family = negbinomial())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | community), family = poisson())
brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | community), family = negbinomial())

And as it turns out, the negative binomial with random intercept and slope fits the data best according to K-fold cross-validation (WAIC and approximate LOO gave warnings and suggested kfold). However, the Poisson model with random intercept and slope is a very close runner-up. I imagine that if I somehow add a temporally correlated error-term that captures the 2-4 year peaks that deviate from the log-linear community-level trend (I may need to kick out the random slope for that to work), the negative binomial model will no longer outperform the Poisson model.

Now, how to add such a temporally correlated error? N.B. the temporally structured errors should only be correlated within each community.

I immediately thought of simply adding a Gaussian Process term, e.g.:

brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community) + gp(year), family = poisson())

However, this only adds a temporally structured error to the overall mean trend over time, whereas I want to add this temporally structured error separate to each community’s trendline, as the peaks / outbreaks happen at different time points in different communities. So perhaps group the gp by community:

brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 | community) + gp(year, by = community), family = poisson())

But now I get 250+ separate GP’s added to the model, each with its own parameters, one for each community. I’d like to estimate one set of GP parameters (notably the scale parameter), and let each of the 250+ series of temporally structured errors be drawn from that. Is it possible? And if not, is it possible to extract the stan model code from the brm object and adapt it and work with that? I’m well versed in stan, so should be able to handle that.

Or is there another (better/easier) option for temporally structure errors that I’m completely missing? I’ve looked at the autocor argument, but am hesitant to use it as I have little experience with ARMA models and the like.

Operating System: macOS High Sierra 10.13.6
brms Version: 2.3.1

paul.buerkner · August 2, 2018, 3:01pm

Do you mean an hierarchical Gaussian process? This is not yet implemented in brms (see https://github.com/paul-buerkner/brms/issues/412).

If you want to implement one yourself, you can extract the generated Stan code via make_stancode or stancode.

LucC · August 2, 2018, 3:21pm

Thanks for the prompt reply, Paul. I had read that particular thread, but it’s not exactly what I’m looking for, although closely related. I’m not interested in partial pooling at this point, but rather full pooling by estimating only one scale parameter and one standard deviation, and then let all the 250+ group-level sets of 9 correlated errors be drawn from the same GP. So I’m not looking to let the scale and standard deviation of the GP vary between the groups (that’s what I think you mean by hierarchical).

I have taken a look at the stan code generated by brms with:

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 | ID) + gp(year), data = …, family = poisson())
(code also attached - btw I just upgraded to brms 2.4.0 in the hope of finding something I missed before :D)

In the model block, I think I will need to change the first line:

model {
vector[N] mu = temp_Intercept + Xc * b + gp(Xgp_1, sdgp_1[1], lscale_1[1], zgp_1) + offset;
…
}
}

such that the gp error terms are no longer drawn with a N by N covariance matrix (N = about 2500), but in ~250 sets of 9 by 9 covariance matrices based on identical GP hyperparameters. As the values of the predictor (“Xgp_1” below) are the same for all 250 groups, I only need to perform the cholesky decomposition once and then simply draw 250 sets of 9 standard normal errors terms and multiply each with the result of the cholesky decomposition, and then just substitute those for the gp(.) term in the code. “Just” haha.

Does that make sense?

gp_brms.stan (83.4 KB)

paul.buerkner · August 2, 2018, 10:12pm

Try out gp(year, gr = TRUE) for autogrouping in the sense you described.

That way we are fitting the same GP for all groups. Still not sure if that’s what you want but we may be getting closer.

LucC · August 3, 2018, 6:13am

Hi Paul, thanks for the suggestion. I tried it, ran it, and looked at the autogenerated code. If I understand correctly, gp(year, gr = TRUE) does indeed only generate one 9 by 9 covariance matrix (nine unique years in data):

vector[N] mu = temp_Intercept + Xc * b + gp(Xgp_1, sdgp_1[1], lscale_1[1], zgp_1)[Jgp_1] + offset;

where zgp_1 is a vector of length nine with the standard normal distribution and Xgp_1 is the vector of unique years in the data, also length nine, and Jgp_1 is an N-length vector with group indicator for every datapoint.

However, this means that essentially only 9 temporally structured error terms are generated, which basically describe the deviation of the overall annual averages from the model’s linear predictor. Whereas I need 250+ sets of 9 temporally structured error terms, but all drawn from the same gp. In brms I would expect to describe it syntactically as

brm(formula = count ~ year + offset(log(pop_at_risk)) + (1 + gp(year) | ID), data = …, family = poisson())

So gp with fixed parameters (no hierarchical stuff), but the gp itself serving as the prior for a random function to be added per group (much like a random intercept or a random slope).

I will tinker some more with the auto-generated code. Related: what function in the brms package preformats the data that is fed into the rstan call?

Again thanks for your time and suggestions!

paul.buerkner · August 3, 2018, 6:18am

That’s what I thought. Perhaps any of these options gets you close to what you want.
(1 + gp(year) | ID) would necessarily imply some pooling so it would not be the right syntax even if that was implemented.

make_standata generates the data.

LucC · August 3, 2018, 6:41am

Thanks! Will report back when I have something working.

Digging through the autogenerated code and autogenerated data, I found something surprising: the values of the predictor (i.e. year) used in the gp (Xgp_1 in the autogenerated code and data) have been scaled down by a factor of 8, i.e. .125, .250, .275, ..., 1.125 instead of the original 1:9 the data. I can appreciate the need to scale the predictor, which is probably related to the default prior for the GP. However, what I don’t get is why scale down by factor 8? And why doesn’t this happen in the transformed data block (as for the columns of the design matrix) but outside stan? And more importantly, the estimated scale parameter for the GP is not re-scaled to the original unit of the predictor (1:9). Or does that happen outside stan as well?

paul.buerkner · August 3, 2018, 7:01am

Take a quick look at ?gp for how to turn of scaling.

LucC · August 3, 2018, 9:10am

Paul! Yes, I think I managed to get it right! Turned out to be simple - the matrix algebra always fries my brain. See attachment.

Happy as a kid in a candy story. Will produce similar code for NB regression with GP random function over a predictor per subgroup. Will share here when it’s done.

Note that I haven’t changed any of the naming conventions in the data block as I wanted to be able to plug in the data generated by make_standata.

pois_gp_brms_adapted.stan (4.0 KB)

LucC · August 3, 2018, 9:17am

When I turn off scaling in brms version 2.4.0, the prior for lscale is automatically scaled with it by brms. It therefore seems that the comment following remark in ?gp is no longer applicable:

Since the default prior on lscale expects scaled predictors, it is recommended to manually specify priors on lscale, if scale is set to FALSE.

Further, when I specifiy

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year, gr = TRUE, scale = FALSE), prior = prior(student(2, 2, 10), class = lscale), data = …, family = poisson())

the auto-generated code still carries the default inverse gamma prior for lscale, while the call to make_stancode did not throw any warning about a potential misspecification of the prior argument.

Not an issue for me as I’m now working with manually adapted code, but I wonder if this is supposed to happen.

paul.buerkner · August 3, 2018, 9:20am

Yeah indeed the remark is no longer applicable. But I would expect that the prior scales with the scale of your data or is the prior on lscale the same (that is the same parameter values as well) regardless of whether you scale or not?

LucC · August 3, 2018, 9:23am

No worries, the prior scales beautifully with the scale of the data when I switch the scaling option on or off

LucC · August 3, 2018, 11:17am

As promised, here are the stan code files for a poisson and negative binomial mixed effects regression (random slope and intercept) including a random function per group drawn from a Gaussian Process (GP hyperparameters are estimated and are the same for all groups).

nb_gp_brms_adapted.stan (4.1 KB)
pois_gp_brms_adapted.stan (4.0 KB)
test_data.csv (1.1 KB)
test_script.R (1.5 KB)

Not sure whether this should be qualified as “hierarchical GP” as the GP hyperparameters do no vary per group.

Open question/notification from last post: specifying another prior for the GP scale with the prior argument is not propagated into the auto-generated code.

paul.buerkner · August 3, 2018, 11:44am

Can you give me a reproducible example of what prior is not propagated into the Stan code?

LucC · August 3, 2018, 12:48pm

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year, gr = TRUE, scale = FALSE), prior = prior(student(2, 2, 10), class = lscale), data = data.frame(ID = 1:30, year = 1:3, pop_at_risk = runif(30), count = runif(30)), family = poisson())

paul.buerkner · August 3, 2018, 1:42pm

That’s because prior specifcation of brms is hierachcial and each coefficient has its own prior already. Go for prior(student(2, 2, 10), class = lscale, coef = <X>) where is the coefficient name. You can find out the name via get_prior().

LucC · August 3, 2018, 2:41pm

get_prior(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year, gr = TRUE, scale = FALSE), data = data.frame(ID = 1:30, year = 1:3, pop_at_risk = runif(30), count = runif(30)), family = poisson())

reports:

prior: inv_gamma(45.631905, 62.799395)
class: lscale
coef: gp(year,gr=TRUE,scale=FALSE)

but neither of the following works:

prior = prior(student(2, 2, 10), class = lscale, coef = gp)
prior = prior(student(2, 2, 10), class = lscale, coef = gp(year))
prior = prior(student(2, 2, 10), class = lscale, coef = gp(year,gr=TRUE,scale=FALSE))

LucC · August 3, 2018, 3:02pm

Just noticed I missed the _t part in defining my student-t distribution. Interestingly this works:

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year), prior = prior(student_t(0, 2, 10), class = lscale, coef = gp(year)), data = data.frame(ID = 1:30, year = 1:3, pop_at_risk = runif(30), count = runif(30)), family = poisson())

but it breaks when I add a scale or gr argument:

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year, scale = FALSE), prior = prior(student_t(0, 2, 10), class = lscale, coef = gp(year)), data = data.frame(ID = 1:30, year = 1:3, pop_at_risk = runif(30), count = runif(30)), family = poisson())

even when I reflect the additional argument in the definition of the scale prior

make_stancode(formula = count ~ year + offset(log(pop_at_risk)) + (1 + year | ID) + gp(year, scale = FALSE), prior = prior(student_t(0, 2, 10), class = lscale, coef = gp(year, scale = FALSE)), data = data.frame(ID = 1:30, year = 1:3, pop_at_risk = runif(30), count = runif(30)), family = poisson())

paul.buerkner · August 3, 2018, 3:37pm

That’s because you need to specify the coefficient name correctly and there is only one correct name for it.

Topic		Replies	Views
Autoregression on non-linear model parameters in brms brms	16	1620	September 20, 2018
Auto-grouping option in brms' gp() brms gaussian-process	5	1152	September 29, 2019
Bym and besag models in brms brms	34	3310	September 17, 2019
Gaussian distribution for logit-transformed binomial response to add an auto-regressive structure brms bayesplot	17	2451	June 13, 2019
BRM function suddenly stopped running and throws a range of different errors Modeling	26	2267	October 7, 2020

Brms: one shared gp for temporally correlated errors in each of 250+ timeseries

Related topics