Divergent transitions with splines

#1

Hi all. I’m encountering divergent transitions in what I thought was a fairly straight-forward model with b-splines. I know the standard recommendation is to increase adapt.delta until these go away, but I wanted to double-check that there wasn’t any issues with the formulation of the model itself. Any input would be greatly appreciated:

Here’s the model:

``````data {
// n_y: number of observations in y
int n_y ;
// y: vector of observations for y
vector[n_y] y ;
// x: values of x
vector[n_y] x ;
//num_splines: number of splines
int num_splines ;
//splines: matrix of splines
matrix[num_splines,n_y] splines ;
}
parameters {
// noise: measurement noise
real<lower=0> noise ;
// volatility: wiggliness of the functions
real<lower=0> volatility ;
//intercept: intercept of the function
real intercept ;
//slope: slope of the function
real slope ;
// splines_helper: helper variable for splines
row_vector[num_splines] splines_helper ;
}
transformed parameters{
// f: latent functions
vector[n_y] f ;
f = intercept + slope*x + //linear part
to_vector((volatility*cumulative_sum(splines_helper))*splines) ; //wiggly part (random-walk prior)
}
model {
intercept ~ normal(0, 1) ;
slope ~ normal(0, 1) ;
volatility ~ normal(0, 1) ;
noise ~ normal(0, 1) ;
splines_helper ~ normal(0,1);
y ~ normal(f,noise);
}
``````

And here’s R code to make and fit some fake data:

``````# load packages
library('splines')
library('tidyverse')
library('rstan')
rstan_options(auto_write = TRUE)

# n_x: number of unique samples on x-axis
n_x = 100

# prep a tibble with combination of x, conditions & reps
dat = tibble(
x = seq(-10,10,length.out=n_x)
)

# set random seed for reproducibility
set.seed(1)

# wiggly true function observed with noise
dat %>%
dplyr::mutate(
true = sin(x)*dnorm(x,5,8) + x/100 + 1
, obs =  true + rnorm(n(),0,.01)
) ->
dat

# show the condition functions
dat %>%
ggplot(
mapping = aes(
x = x
, y = true
)
)+
geom_line()

# show the noisy observations
dat %>%
ggplot(
mapping = aes(
x = x
, y = obs
)
)+
geom_line()

#create splines
num_knots = 10
knots = seq(min(dat\$x),max(dat\$x),length.out = num_knots)
spline_degree = 3
splines <- t(bs(dat\$x, knots=knots, degree=spline_degree, intercept = T))

# create the data list for stan
data_for_stan = list(
n_y = nrow(dat)
, y = dat\$obs
, x = dat\$x
, num_splines = nrow(splines)
, splines = splines
)

#compile
spline_mod = rstan::stan_model('spline_fit.stan')

# start the chains
post = rstan::sampling(
data = data_for_stan
, object = spline_mod
, chains = 4
, cores = 4
, iter = 2e3
)
#note warnings here

#check parameters
print(
post
, par = c(
'noise'
, 'volatility'
, 'intercept'
, 'slope'
)
)

#check pairs plot
pairs(
post
, pars = c(
'noise'
, 'volatility'
, 'intercept'
, 'slope'
)
)
``````

And here’s the pairs plot:

#2

The yellow dots means it exceeded the maximum treedepth. I can’t see if there are any red dots indicating a divergence.

#3

And that usually means you have regions of varying curvature and you’re trying to explore a relatiely flat part of the posterior with a small step size.

Having a cumulative sum very much links the `splines_helper` variables together very tightly.

The only efficiency improvement I see would be declaring `f` as a `row_vector` so you don’t need the extra copy in `to_vector`.

#4

Treedepth exceedence happens also with constant curvature but strong correlation as mass matrix adaptation uses diagonal matrix. See, e.g. “Gaussian linear model with adjustable priors” in https://rawgit.com/avehtari/BDA_R_demos/master/demos_rstan/rstan_demo.html with 2.6% iterations with saturated treedepth and “Gaussian linear model with standardized data” with 0 saturations, just because in the first case there is very strong correlation between alpha and beta.

#5

Very good point. It’s very flat along the diagonal of correlation, but very curved going off diagonal. And that’s true even if variables are both unit scale.

#6

I enjoyed reading the comments. Know I want to add my 2 cents:

You added the intercept twice. Here:

and here:

Removing one of them should result in better mixing.

Added later

You are fitting a random intercept model:
Do we need to put a hyper-prior on the variance of slope and intercept?
And since we see a linear dependency between slope and intercept: Do we need
a multivariate normal distribution on both parameters?

#7

Yes, I’m copying that from the Splines In Stan demo, where it serves to implement a random-walk prior, which in turn achieves a bias towards smoother functions and less sensitivity to the number of splines used.

#8

Thanks, good catch.

#9

Thanks for the correction. You’re right—the cumulative sum is just going into calculating the location parameter for the normal for `y`. The actual prior on `splines_helper` is independent by component.