Inspired by this post https://www.fromthebottomoftheheap.net/2014/05/15/identifying-periods-of-change-with-gams/, I’m trying to identify periods of change in a GAM model, using bayesian inference.
However, I find that my choice of epsilon (the delta difference between two points), is very sensitive to the resulting confidence intervals. If I choose a really small epsilon, which is what we should want and do, my intervals of the associated first derivative blow up. If this epsilon is bigger, the intervals are narrower.
I’m using the method of finite differences , where we estimate the trend at two points very close to each other, and calculate the difference between those two points to estimate the slope.
I’m clearly missing out on something while calculating the derivatives.
library(dplyr)
library(magrittr)
library(ggplot2)
library(brms)
library(mgcv)
set.seed(08242019)
dta <- mgcv::gamSim(eg = 4, n = 200, scale = 2)
mod <- brms::brm(
y ~ s(x2, by = fac), data = dta,
chains = 4,
cores = 4,
prior = c(
prior(student_t(3, 0, 1), class = sigma),
prior(student_t(3, 0, 1), class = sds),
prior(student_t(3, 0, 1), class = b),
prior(student_t(3, 0, 5), class = Intercept)
))
epsilon <- 0.1
first <- expand.grid(x2 = seq(0, 0.99, length = 4000),
fac = c('1', '2','3'))
second <- expand.grid(x2 = seq(0 + epsilon, 0.99 + epsilon, length = 4000),
fac = c('1', '2','3'))
## get predictions
first_preds <- posterior_predict(mod, newdata = first, summary = FALSE)
second_preds <- posterior_predict(mod, newdata = second, summary = FALSE)
## calcualte the difference of the predictions
diff <- (second_preds - first_preds)/epsilon
## using the posterior samples, we calculate the mean and lower and upper quantiles
mean_finite_diff <- apply(diff, MARGIN = 2,FUN = mean)
lower_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.025)
upper_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.975)
data.frame(mean_finite_diff) %>%
cbind(lower_finite_diff) %>%
cbind(upper_finite_diff) %>%
cbind(first) %>%
ggplot(.,aes(x2, mean_finite_diff, group = fac)) + geom_smooth(aes(color = fac), se = FALSE) + geom_ribbon(aes(ymin = lower_finite_diff,
ymax = upper_finite_diff), alpha = 0.05)
However, when you decrease the epsilon,
epsilon <- 0.0001
first <- expand.grid(x2 = seq(0, 0.99, length = 4000),
fac = c('1', '2','3'))
#second <- expand.grid(x2 = seq(0 + epsilon, 0.99 + epsilon, length = 4000),
# fac = c('1', '2','3'))
second <- first %>%
mutate(x2 = x2 + epsilon)
## get predictions
first_preds <- posterior_predict(mod, newdata = first, summary = FALSE)
second_preds <- posterior_predict(mod, newdata = second, summary = FALSE)
## calcualte the difference of the predictions
diff <- (second_preds - first_preds)/epsilon
## using the posterior samples, we calculate the mean and lower and upper quantiles
mean_finite_diff <- apply(diff, MARGIN = 2,FUN = mean)
lower_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.025)
upper_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.975)
## plot
data.frame(mean_finite_diff) %>%
cbind(lower_finite_diff) %>%
cbind(upper_finite_diff) %>%
cbind(first) %>%
ggplot(.,aes(x2, mean_finite_diff, group = fac)) + geom_smooth(aes(color = fac), se = FALSE) + geom_ribbon(aes(ymin = lower_finite_diff,
ymax = upper_finite_diff), alpha = 0.05)
Ideally we want a lower epsilon to get a better approximation of the derivative. But why do the uncertainty intervals blow up?