Hierarchical brms with splines (limiting prediction below threshold)

Hi,
I have a dataset where a vehicle travels (horizontal axis) and a component starts to wear (vertical axis). I have read and searched for lots of examples, but any others are always welcome.

The population relates to multiple vehicles. Each vehicle is its own entity, thus I want it to have its individual effects influencing predictions.

I have created a generated dataset for 4 groups (entities). Some of these are very data poor. I have tried fitting a population and grouped smoothing with an individual intercept.

My questions are:
(1) The default of mgcv (used in brms?) is thin plate. Should this be set instead as ā€œcsā€?
(2) Is the brm model form structure correct?
(3) The vertical axis will respresent time. Hence I want to predict when this component will reach a threshold and provide a P10 and P90 value. The prediction currently is predicting below the value for each last data point (essentially in the past). Any advice on how to limit that predicting threshold in brms so it doesnā€™t predict below its highest ā€˜yā€™ value?

Any pearls of wisdom are greatly appreciated.
Steve

#--> Data Generation ####
d <- data.frame(                                             
  group   = c(rep(1,50), rep(2,50), rep(3,10), rep(4, 3)),
  x  = c(sample(seq(0,5,length.out=101), 50, replace=FALSE)
         , sample(seq(0,5,length.out=101), 50, replace=FALSE)
         , sample(seq(0,3,length.out=31), 10, replace=FALSE)
         , sample(seq(1,2,length.out=11), 3, replace=FALSE)
  ),
  y  = NA
)


for (ii in 1:dim(d)[1]){
  if (d$group[ii] == 1){
    d$y[ii] <-  2 + 0.2*d$x[ii]^(5/2) + rnorm(1,0,1.5)
  } else if (d$group[ii] == 2){
    d$y[ii] <-  2 + 0.25*d$x[ii]^(5/2) + rnorm(1,0,0.5)
  } else if (d$group[ii] == 3){
    d$y[ii] <-  1.7 + 0.3*d$x[ii]^(5/2) + rnorm(1,0,0.2)
  } else {
    d$y[ii] <-  2.5 + 0.2*d$x[ii]^(5/2) + rnorm(1,0,0.5)
  }
}

d$group <- d$group %>% as.factor()

# Scatter plot by group (AXIS FLIP / INVERSE)
ggplot(d, aes(x = y, y = x, color = as.factor(group))) +
  geom_point()


#--> BY FACTOR APPROACH ####
fit6i_test <- brm(bf(x ~ s(y) + s(y, by=group) + (1 | group))
                  , data = d
                  , iter = 1e3 , chains = 4 , cores = 4 , seed = 123 , silent=FALSE
                  , save_pars = save_pars(all=TRUE)
)

#--> Prediction based on model (range common) ####

## predict response for new data
dsize <- 60
y_seq <- data.frame(group = factor(c(rep(1,dsize/2), rep(2,dsize/2), rep(3,dsize/2), rep(4,dsize/2))),
                    y = rep(seq(min(d$y),22,length.out=dsize/2),4)
)

fitd <- fitted(fit6i_test, newdata = y_seq, probs = c(0.1, 0.9)) %>% as_tibble() %>% bind_cols(y_seq)
pred <- predict(fit6i_test, newdata = y_seq, probs = c(0.1, 0.9)) %>%as_tibble() %>% bind_cols(y_seq)


#Change column names for generic plot code below
names(fitd)[3] <- "Pmin"
names(fitd)[4] <- "Pmax"

names(pred)[3] <- "Pmin"
names(pred)[4] <- "Pmax"

p <- ggplot(data = d, aes(x = y, group=as.factor(group))) +
  geom_ribbon(data = pred,
              aes(ymin = Pmin, ymax = Pmax, colour=as.factor(group)),
              fill = "grey83", alpha = 0.5) +
  geom_smooth(data = fitd,
              aes(y = Estimate, ymin = Pmin, ymax = Pmax),
              stat = "identity",
              fill = "grey70", color = "black", alpha = 0.2, size = 1/4, linetype=as.numeric(fitd$group)) +
  geom_point(aes(y = x, col=as.factor(group)),
             shape = 1, size = 1.5, alpha = 0.8) +
  labs(subtitle = "Bayesian hierarchical Spline INVERSE, P10 & P90",
       y = "x") +
  #coord_cartesian(xlim = range(d$y),
  #                ylim = range(d$x)) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank()) +
  geom_vline(xintercept=18, linetype="dashed", 
             color = "red", size=1) #+
  #geom_hline(yintercept=d$x %>% max(), linetype=3, 
  #           color = "red", size=1)

p

  • Operating System: Linux
  • brms Version: 2.14.4

Thanks for providing the code and some simulated data. I am bumping this up so folks can take a look. There are a number of other Stan and brms time series posts around here that might be helpful too.

A great resource for heirarchical smooth models generally is here:

See Figure 4 in particular.

Maybe Iā€™m misreading your third question, but it sounds like you might want to explore shape-constrained splines? I donā€™t think these are supported in brmsā€™ implementation of mgcv, but could be wrong.

And have you tried nonlinear parametric curves? Maybe Iā€™m reading too much into your dummy data.

2 Likes

Just got your data and code up and running. The model runs however there are a bunch of errors that shouldnā€™t be ignored. May want to dial back the complexity of the model first to see whatā€™s going on here. I think with:

s(y, by=group)

and a small number of groups the model is not going to behave well. It also looks like the model could use some stronger priors?

So just rolling the model back a bit to

brm(bf(x ~ 1 + s(y, bs = "cs") + (1 | group))
                  , data = d
                  , iter = 2000 , chains = 4 , cores = 6)

Still throws tons of errors that point at the need for more iterations (but not some hugantic #) and stronger priors but

1 Like

Thanks. Good adviceā€¦ā€œless is moreā€.

Well I would start there! This isnā€™t my domain area*. And that model is making some bold assumptions about whatā€™s important.

Like if that model looks ok to start with, then version 2 should have some strong priors and make sure those priors are documented in the R script.

Once that is all good you can start adding complexity if the data and domain knowledge demand or support that.

  • when I see data like that I code up a Von Bertalanffy function in Stan.