Hi,
I have a dataset where a vehicle travels (horizontal axis) and a component starts to wear (vertical axis). I have read and searched for lots of examples, but any others are always welcome.
The population relates to multiple vehicles. Each vehicle is its own entity, thus I want it to have its individual effects influencing predictions.
I have created a generated dataset for 4 groups (entities). Some of these are very data poor. I have tried fitting a population and grouped smoothing with an individual intercept.
My questions are:
(1) The default of mgcv (used in brms?) is thin plate. Should this be set instead as ācsā?
(2) Is the brm model form structure correct?
(3) The vertical axis will respresent time. Hence I want to predict when this component will reach a threshold and provide a P10 and P90 value. The prediction currently is predicting below the value for each last data point (essentially in the past). Any advice on how to limit that predicting threshold in brms so it doesnāt predict below its highest āyā value?
Any pearls of wisdom are greatly appreciated.
Steve
#--> Data Generation ####
d <- data.frame(
group = c(rep(1,50), rep(2,50), rep(3,10), rep(4, 3)),
x = c(sample(seq(0,5,length.out=101), 50, replace=FALSE)
, sample(seq(0,5,length.out=101), 50, replace=FALSE)
, sample(seq(0,3,length.out=31), 10, replace=FALSE)
, sample(seq(1,2,length.out=11), 3, replace=FALSE)
),
y = NA
)
for (ii in 1:dim(d)[1]){
if (d$group[ii] == 1){
d$y[ii] <- 2 + 0.2*d$x[ii]^(5/2) + rnorm(1,0,1.5)
} else if (d$group[ii] == 2){
d$y[ii] <- 2 + 0.25*d$x[ii]^(5/2) + rnorm(1,0,0.5)
} else if (d$group[ii] == 3){
d$y[ii] <- 1.7 + 0.3*d$x[ii]^(5/2) + rnorm(1,0,0.2)
} else {
d$y[ii] <- 2.5 + 0.2*d$x[ii]^(5/2) + rnorm(1,0,0.5)
}
}
d$group <- d$group %>% as.factor()
# Scatter plot by group (AXIS FLIP / INVERSE)
ggplot(d, aes(x = y, y = x, color = as.factor(group))) +
geom_point()
#--> BY FACTOR APPROACH ####
fit6i_test <- brm(bf(x ~ s(y) + s(y, by=group) + (1 | group))
, data = d
, iter = 1e3 , chains = 4 , cores = 4 , seed = 123 , silent=FALSE
, save_pars = save_pars(all=TRUE)
)
#--> Prediction based on model (range common) ####
## predict response for new data
dsize <- 60
y_seq <- data.frame(group = factor(c(rep(1,dsize/2), rep(2,dsize/2), rep(3,dsize/2), rep(4,dsize/2))),
y = rep(seq(min(d$y),22,length.out=dsize/2),4)
)
fitd <- fitted(fit6i_test, newdata = y_seq, probs = c(0.1, 0.9)) %>% as_tibble() %>% bind_cols(y_seq)
pred <- predict(fit6i_test, newdata = y_seq, probs = c(0.1, 0.9)) %>%as_tibble() %>% bind_cols(y_seq)
#Change column names for generic plot code below
names(fitd)[3] <- "Pmin"
names(fitd)[4] <- "Pmax"
names(pred)[3] <- "Pmin"
names(pred)[4] <- "Pmax"
p <- ggplot(data = d, aes(x = y, group=as.factor(group))) +
geom_ribbon(data = pred,
aes(ymin = Pmin, ymax = Pmax, colour=as.factor(group)),
fill = "grey83", alpha = 0.5) +
geom_smooth(data = fitd,
aes(y = Estimate, ymin = Pmin, ymax = Pmax),
stat = "identity",
fill = "grey70", color = "black", alpha = 0.2, size = 1/4, linetype=as.numeric(fitd$group)) +
geom_point(aes(y = x, col=as.factor(group)),
shape = 1, size = 1.5, alpha = 0.8) +
labs(subtitle = "Bayesian hierarchical Spline INVERSE, P10 & P90",
y = "x") +
#coord_cartesian(xlim = range(d$y),
# ylim = range(d$x)) +
theme(text = element_text(family = "Times"),
panel.grid = element_blank()) +
geom_vline(xintercept=18, linetype="dashed",
color = "red", size=1) #+
#geom_hline(yintercept=d$x %>% max(), linetype=3,
# color = "red", size=1)
p
- Operating System: Linux
- brms Version: 2.14.4