Splines in the event submodel of stan_jm()

I have trouble including a spline transformation to one of the variables in the ‘event’ submodel of a stan_jm() joint model.

No issues with splines in the longitudinal submodel as you can see with the model specification below (using sample data/notation from stan_jm vignette)

mod1 <- stan_jm(formulaLong = logBili ~ sex + trt + bSpline(year,degree=3,knots=4) + (bSpline(year,degree=3,knots=4) | id), dataLong = pbcLong, formulaEvent = survival::Surv(futimeYears, death) ~ sex + trt + age, dataEvent = pbcSurv, cores=parallel::detectCores(), time_var = "year", assoc = NULL, chains = 2, refresh = 20, iter=5000, seed = 12345)

However, the code below which attempts to use a b-spline on the age covariate in the event submodel gives as error.

mod2 <- stan_jm(formulaLong = logBili ~ sex + trt + bSpline(year,degree=3,knots=4) + (bSpline(year,degree=3,knots=4) | id), dataLong = pbcLong, formulaEvent = survival::Surv(futimeYears, death) ~ sex + trt + bSpline(age,degree = 3,knot=3), dataEvent = pbcSurv, cores=parallel::detectCores(), time_var = "year", assoc = NULL, chains = 2, refresh = 20, iter=5000, seed = 12345)

This is the error message:

Error in rcpp_bSpline_basis(x = xx, df = df, degree = degree, internal_knots = knots, : Internal knots must be set inside of boundary knots.

I have tried to use other spline packages or polynomial transformation, all give different types of errors (splines, splines2, rms:rcs(), poly() etc.). Is there any way to work around this? Thank you!

Hi @clairvoyant. Hmm, yeah sorry, this isn’t really something I had envisioned at the outset, and so probably didn’t think to implement it or check that it worked. But now that you bring it up, transformations inside the event model formula should have been something I considered, but unfortunately they weren’t.

The bSpline example seems to be throwing that error for some reason outside of stan_jm. For example just running:

splines2::bSpline(pbcSurv$age, degree = 3, knots = 3)

throws the same error about Internal knots must be set inside of boundary knots.

But you are right that stan_jm also fails for other spline functions.

For example, using splines::bs() I get an error about age not being found. Which is likely to be an error coming from stan_jm due to (re-)evaluating the formula somewhere without the correct evaluation environment. Most likely, I am trying to re-evaluate the model formula against model frame, which already has the evaluated spline basis terms for age rather than the age variable itself, hence why it complains about age not being found.

This is something that would probably take a bit of time to resolve, so I’m not sure I’d be able to fix/add it anytime soon.

One hacky way to get around this might be to calculate your spline basis terms outside of stan_jm, since those basis terms and/or their knot locations are not dependent on time (I say that because time – i.e . the time_var variable – is a very special covariate in stan_jm, that has special internal handling, whereas something like sex, trt, or age isn’t). So you can decide on your knot locations for age and build the basis functions outside of stan_jm, then pass those basis terms in as separate covariates. It might be a bit clunky later on when trying to generate predictions (since you’d have to calculate the basis terms for the age covariate in the predictions as well) but I think it would work - just be careful to use the same knot locations when generating the age basis terms in the predictions.

Something like:

library(rstanarm)
library(splines)

basis_terms <- splines::bs(pbcSurv$age, degree = 3, df = 4)

for (n in colnames(basis_terms)) {
  pbcSurv[,paste("age_basis", n, sep = "_")] <- basis_terms[,n]
}

mod <- stan_jm(
  formulaLong = logBili ~ sex + trt + bs(year, degree = 3, df = 4) + (1 | id), 
  dataLong = pbcLong, 
  formulaEvent = survival::Surv(futimeYears, death) ~ sex + trt + age_basis_1 + age_basis_2 + age_basis_3 + age_basis_4, 
  dataEvent = pbcSurv, 
  time_var = "year", 
  assoc = NULL, 
  cores = parallel::detectCores(), 
  chains = 2, 
  refresh = 20, 
  iter = 1000, 
  seed = 12345)

You may want to consider centering age before calculating the spline basis.

Also, you may want to explicitly specifying the internal knot locations (using the knots argument) rather than the df argument I’ve used in the code above, so that you know the exact knot locations that are being used. I noticed in your code you had the knots argument for bSpline… it might be worth noting that that argument is “knot locations” and not “number of knots” (I think), so specifying bSpline(age,degree = 3,knot=3) wouldn’t really make sense since 3 isn’t a good location for a knot in terms of the age distribution – you either want to specify df for the number of knots in default locations, or explicitly choose locations for the knots.

Anyway, hope that helps. Sorry this doesn’t just work out of the box!

Actually I realise now that the reason splines2::bSpline(pbcSurv$age, degree = 3, knots = 3) was failing with that Internal knots must be set inside of boundary knots error, is because of the last point I mentioned – the knots argument is saying an internal knot at age = 3, which doesn’t make sense because it is outside the min/max age range.

Hello @sambrilleman , thank you so much! Creating the spline basis outside and then bringing into the model via the data frame works like a charm! Also, thank you for the heads up about the ‘knot’ argument, I hadn’t realized that it was the knot locations until you mentioned it, thought it was the number of knots like in rms:rcs().

1 Like