brms to run mixed-effects GAMs with different priors for each covariate

Operating System: macOS Mojave
brms version 2.10.0

I’m interested in running a mixed-effects GAM in a Bayesian framework so that I can put in various priors for each covariate. Based on the help file and blogs, I have found code that seems to work up until I start messing with priors. Basically I need to set lower and upper bounds for various covariates. I found how to do this on this site: https://github.com/paul-buerkner/brms/issues/86, but I still get an error. Below is my code for the model:

priors <- c(set_prior("normal(26, 2)", class = "b", ub = 32, nlpar = "etaTemp"),
            set_prior("normal(5, 1)",  class = "b", lb = 3.5, nlpar = "etaDO.2"),
            set_prior("normal(25, 3)", class = "b", lb = 17, nlpar = "etaSal"))

testmod <- brm(bf(Binomial ~ s(etaDO.2) + s(etaSal) + s(etaTemp) + offset(Effort) +
                    (1 | Year) + (1 | Month)),
               data = dat, family = binomial(),
               nonlinear = list(etaTemp ~ 0 + Temp, etaDO.2 ~ 0 + DO.2, etaSal ~ 0 + Sal),
               prior = priors, cores = 1, seed = 17,
               iter = 4000, warmup = 1000, thin = 10)
# can use control = list(adapt_delta = 0.99) to help the model if need be

I tried to do the eta addition you used but I still get an error:

Error: The following variables are missing in 'data':
'etaDO.2', 'etaSal', 'etaTemp'

Any help is greatly appreciated. Thanks!


There are multiple problems with your code. First, the nonlinear argument no longer exists as of brms 2.0. Second, you cannot pass non-linear parameters into spline functions. I am not exactly sure why and on what you need to set lower and upper boundaries, but if you want to restrict non-linear parameters then you can simply use transformations such as exp or inv_logit. I provide some examples in https://arxiv.org/abs/1905.09501
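As a minimal sketch of the transformation idea (not the poster’s actual model; the formula, parameter name bTemp, and prior are illustrative assumptions), a non-linear parameter can be kept inside fixed bounds by passing an unconstrained parameter through a scaled inv_logit in the current brms non-linear syntax:

```r
library(brms)

# Hypothetical sketch: constrain a non-linear effect to (0, 32) via a
# scaled inv_logit of an unconstrained parameter bTemp, rather than
# hard lb/ub bounds on the prior. `dat` is assumed to contain
# Binomial, Temp, and Effort as in the thread.
fit <- brm(
  bf(Binomial ~ 32 * inv_logit(bTemp) * Temp + offset(Effort),
     bTemp ~ 1, nl = TRUE),
  data = dat, family = binomial(),
  prior = prior(normal(0, 1), nlpar = "bTemp")
)
```

The transformation does the bounding, so the prior on bTemp itself can stay unconstrained.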

I see, okay, I didn’t realize that it no longer existed. I need to set bounds because I need to predict beyond the range of the covariates the model was fitted over, and I have other data telling me there should be bounds for each of the covariates. I’m surprised this doesn’t exist in the function, since this is a particular problem in ecological data and forecasting species habitat. I don’t understand how a transformation would solve this issue, but I’ll look into your paper. Thanks.

If you need to constrain the smooth functions (because it is those covariates you need to extrapolate beyond the data), take a look at the B-spline basis in mgcv (?b.spline after loading mgcv).

This is a very flexible basis. You can set the end points of the basis support out to where you need to extrapolate, whilst focusing the basis on the range of the data (by setting the inner boundary knots); the remaining knots (to reach k) are spread evenly between these inner knots unless otherwise specified. The idea is that the penalty needs to extend out beyond the support of the data, otherwise there is nothing in the data to constrain the smooth function and it explodes.

The example shows how to use it and its effect, using mgcv, but I assume that brms should be able to handle this basis too; you only need to get the specification of the basis correct, including the m argument.
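A self-contained mgcv sketch of the idea, using simulated data rather than the poster’s (the variable names and knot values are illustrative): the outer knots extend the penalty’s reach to the extrapolation range, while the inner knots bracket the observed data.

```r
library(mgcv)

set.seed(1)
# Simulated data on [0, 1]; suppose we want to extrapolate out to [-0.5, 1.5]
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# Outer knots (-0.5, 1.5) set where the penalty operates; inner knots
# (0, 1) bracket the data, concentrating the basis where there is support
m1 <- gam(y ~ s(x, bs = "bs", k = 10),
          knots = list(x = c(-0.5, 0, 1, 1.5)),
          method = "REML")

# Predictions beyond the data; the extended penalty keeps the smooth
# from behaving wildly in the extrapolation region
newd <- data.frame(x = seq(-0.5, 1.5, length.out = 50))
pred <- predict(m1, newdata = newd)
```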


Awesome, thanks for the advice. It looks like this might be the way I should go. I’ve looked at some of the instructions for the B-spline basis in mgcv, but I’m still confused about the syntax. My current model looks like this (without the B-spline):

mod <- gam(Binomial ~ s(DO.2) + s(Sal) + s(Temp, k = 14) + offset(Effort) +
             s(Year, bs = "re") + s(Month, bs = "re"),
           data = dat, family = "binomial", method = "REML")

However, I would like an upper bound for Temp of 32, where the probability at or above 32 drops to 0, and for DO.2 a lower bound at 3.5, where the probability at or below 3.5 drops to 0. From reading examples it looks like the m parameter inside the spline needs to be manipulated, but I still don’t really understand how to modify it to get what I described above. Do you have any other advice on this?

I don’t believe you can do that. You can only constrain what the spline contributes to your model when you extrapolate beyond the support of the data. What you can do is allow the penalty that pulls the function towards a null function (zero) to act beyond the support of the data. Without the penalty extending beyond the support of the data the behaviour of the spline can be extreme.

m is specified as a vector. You set this up by using the bs = "bs" basis and by passing a vector of 4 knots via a named list to the knots argument of gam. In order, those values are:

  1. the minimum temperature or DO that you want to extrapolate to,
  2. the minimum temperature or DO observed in the data,
  3. the maximum temperature or DO observed in the data,
  4. the maximum temperature or DO that you want to extrapolate to.

In this way, the two extreme knots (1 and 4) allow the penalty to operate over the range into which you will be extrapolating, whilst concentrating knots where there is data (the inner two knots, 2 and 3).

If you run the example in ?b.spline, you’ll see how this affects the behaviour of the spline when extrapolating.

With mgcv there aren’t other ways to control the behaviour of splines beyond the boundaries of the data from which the basis was formed.

@ucfagls, thanks for the response. I tried doing what you suggested, but based on the B-spline help page (https://rdrr.io/cran/mgcv/man/smooth.construct.bs.smooth.spec.html) it appears that the first value in m (m[1]) determines the spline order. If that’s the case, putting a temperature value in the m vector wouldn’t make sense.

For example, I tried to do what you suggested, but I don’t think the syntax for m is correct:

testmod <- gam(Binomial ~ s(DO.2) + s(Sal) + s(Temp, m = c(10, 14, 29, 32)) +
                 offset(Effort) + s(Year, bs = "re") + s(Month, bs = "re"),
               data = dat, family = "binomial", method = "ML")

The response over temperature was extremely wiggly when I did this, and I think it is because the values in m are large. Any idea why this would be?

Thanks,

Sorry @danc, brain fart. m controls settings for the basis penalty; knots for this basis controls how the thing extrapolates.

Here’s an example, noting that you can only do this penalty-extrapolation trick with the B-spline basis, so you need bs = "bs" in the s() call for Temp:

knots <- list(Temp = c(10, 14, 29, 32))
testmod <- gam(Binomial ~ s(DO.2) + s(Sal) + s(Temp, bs = "bs") +
                 offset(Effort) + s(Year, bs = "re") + s(Month, bs = "re"),
               data = dat, family = "binomial", method = "ML", knots = knots)

knots is supposed to be a named list, where the names should match the covariate names used in smooths. If you don’t supply anything for some covariates then the usual knot-placement code is used, which isn’t really relevant for your other smooths, as the TPRS bases for DO.2 and Sal have a knot at every unique data point (and then the basis gets truncated to give a low-rank basis).

The reason the curves in your model were so wiggly is that m = c(10, 14, 29, 32) put penalties on the 10th, 14th, 29th, and 32nd derivatives of the smooth, so there was nothing controlling wiggliness in the usual sense (i.e. curvature, the 2nd derivative).
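For contrast, a sketch of what m actually specifies for the B-spline basis (on simulated data, since the thread’s dat isn’t available): for bs = "bs", m[1] is the basis order (3 = cubic) and m[2] the penalty order (2 = the conventional curvature penalty).

```r
library(mgcv)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# A cubic B-spline (m[1] = 3) with a second-derivative penalty (m[2] = 2),
# which is the usual wiggliness penalty; extrapolation range is still set
# by the outer knots, not by m
m2 <- gam(y ~ s(x, bs = "bs", m = c(3, 2), k = 10),
          knots = list(x = c(-0.5, 0, 1, 1.5)),
          method = "REML")
```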

Thanks again for the help @ucfagls. It looks like I’m getting closer, but I’m still running into some issues, which I believe are syntax related. I followed what you suggested:

knots <- list(Temp = c(10, 14, 29, 32))
testmod <- gam(Binomial ~ s(DO.2) + s(Sal) + s(Temp, bs = "bs") + offset(Effort) +
                 s(Year, bs = "re") + s(Month, bs = "re"),
               data = dat, family = "binomial", method = "ML", knots = knots)

When I do this I get the warning message: "In smooth.construct.bs.smooth.spec(object, dk$data, dk$knots) : there is no information about some basis coefficients"

Is this warning occurring because I don’t supply anything for m or k in s(Temp)? I’m not sure what I would put for either of those values. I was also wondering: between the actual data values (between 14 and 29), would the smoother still act like the TPRS used for Sal and DO.2?

I think that warning happens when the middle pair of knots doesn’t quite include the data (perhaps due to some rounding etc. when creating the knots argument). The middle two knots should be supported by data, so they should be the minimum and maximum values taken by the data. I guess your data don’t quite reach 14 or 29?
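One defensive way to avoid that mismatch is to compute the inner knots from the observed data range rather than hard-coding them (a hypothetical illustration with simulated data standing in for the thread’s dat):

```r
# Build the knots list so the inner knots exactly bracket the observed
# data; simulated Temp values stand in for the real dataset here
dat <- data.frame(Temp = runif(100, min = 14.2, max = 28.7))
rng <- range(dat$Temp)

# Outer knots (10, 32) set the extrapolation range; inner knots come
# straight from the data, so no observation falls outside them
knots <- list(Temp = c(10, rng[1], rng[2], 32))
```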
