Calculate the first derivative and it's posterior distribution of an estimated spline trend

Inspired by this post https://www.fromthebottomoftheheap.net/2014/05/15/identifying-periods-of-change-with-gams/, I’m trying to identify periods of change in a GAM model, using bayesian inference.

However, I find that my choice of epsilon (the delta difference between two points), is very sensitive to the resulting confidence intervals. If I choose a really small epsilon, which is what we should want and do, my intervals of the associated first derivative blow up. If this epsilon is bigger, the intervals are narrower.

I’m using the method of finite differences , where we estimate the trend at two points very close to each other, and calculate the difference between those two points to estimate the slope.

I’m clearly missing out on something while calculating the derivatives.

library(dplyr)
library(magrittr)
library(ggplot2)
library(brms)
library(mgcv)

set.seed(08242019)
dta <- mgcv::gamSim(eg = 4, n = 200, scale = 2)


mod <- brms::brm(
  y ~ s(x2, by = fac), data = dta, 
  chains = 4,
    cores = 4,
    prior = c(
      prior(student_t(3, 0, 1), class = sigma),
      prior(student_t(3, 0, 1), class = sds),
      prior(student_t(3, 0, 1), class = b),
      prior(student_t(3, 0, 5), class = Intercept)
))


epsilon <- 0.1

first <- expand.grid(x2 = seq(0, 0.99, length = 4000),
                    fac = c('1', '2','3'))

second <- expand.grid(x2 = seq(0 + epsilon, 0.99 + epsilon, length = 4000),
                    fac = c('1', '2','3'))

## get predictions
first_preds <- posterior_predict(mod, newdata = first, summary = FALSE)
second_preds <- posterior_predict(mod, newdata = second, summary = FALSE)

## calcualte the difference of the predictions  
diff <- (second_preds - first_preds)/epsilon

## using the posterior samples, we calculate the mean and lower and upper quantiles  
mean_finite_diff <- apply(diff, MARGIN = 2,FUN = mean)
lower_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.025)
upper_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.975)

data.frame(mean_finite_diff) %>%
  cbind(lower_finite_diff) %>%
  cbind(upper_finite_diff) %>%
  cbind(first) %>%
  ggplot(.,aes(x2, mean_finite_diff, group  = fac)) + geom_smooth(aes(color = fac), se = FALSE) + geom_ribbon(aes(ymin = lower_finite_diff, 
                                                                                                                  ymax = upper_finite_diff), alpha = 0.05)

However, when you decrease the epsilon,

epsilon <- 0.0001

first <- expand.grid(x2 = seq(0, 0.99, length = 4000),
                    fac = c('1', '2','3'))

#second <- expand.grid(x2 = seq(0 + epsilon, 0.99 + epsilon, length = 4000),
 #                   fac = c('1', '2','3'))


second <- first %>%
          mutate(x2 = x2 + epsilon)

## get predictions
first_preds <- posterior_predict(mod, newdata = first, summary = FALSE)
second_preds <- posterior_predict(mod, newdata = second, summary = FALSE)

## calcualte the difference of the predictions  
diff <- (second_preds - first_preds)/epsilon

## using the posterior samples, we calculate the mean and lower and upper quantiles  
mean_finite_diff <- apply(diff, MARGIN = 2,FUN = mean)
lower_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.025)
upper_finite_diff <- apply(diff, MARGIN = 2,FUN = quantile, prob = 0.975)


## plot

data.frame(mean_finite_diff) %>%
  cbind(lower_finite_diff) %>%
  cbind(upper_finite_diff) %>%
  cbind(first) %>%
  ggplot(.,aes(x2, mean_finite_diff, group  = fac)) + geom_smooth(aes(color = fac), se = FALSE) + geom_ribbon(aes(ymin = lower_finite_diff, 
                                                                                                                  ymax = upper_finite_diff), alpha = 0.05)

Ideally we want a lower epsilon to get a better approximation of the derivative. But why do the uncertainty intervals blow up?

1 Like

I’m not a 100% sure why, but the solution is just to use posterior_linpred instead. I still don’t understand though, why the choice of epsilon would be sensitive to the sampling variance in posterior_predict.