Divergent transitions with splines


#1

Hi all. I’m encountering divergent transitions in what I thought was a fairly straightforward model with B-splines. I know the standard recommendation is to increase adapt_delta until these go away, but I wanted to double-check that there isn’t an issue with the formulation of the model itself. Any input would be greatly appreciated.
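
For reference, this is the sort of control tweak I mean, using the objects defined in the R code below (the specific values are just examples):

post = rstan::sampling(
    object = spline_mod
    , data = data_for_stan
    , chains = 4
    , cores = 4
    , iter = 2e3
    , control = list(adapt_delta = 0.99, max_treedepth = 12)
)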

Here’s the model:

data {
    // n_y: number of observations in y
    int n_y ;
    // y: vector of observations for y
    vector[n_y] y ;
    // x: values of x
    vector[n_y] x ;
    //num_splines: number of splines
    int num_splines ;
    //splines: matrix of splines
    matrix[num_splines,n_y] splines ;
}
parameters {
    // noise: measurement noise
    real<lower=0> noise ;
    // volatility: wiggliness of the functions
    real<lower=0> volatility ;
    //intercept: intercept of the function
    real intercept ;
    //slope: slope of the function
    real slope ;
    // splines_helper: helper variable for splines
    row_vector[num_splines] splines_helper ;
}
transformed parameters{
    // f: latent functions
    vector[n_y] f ;
    f = intercept + slope*x + //linear part
        to_vector((volatility*cumulative_sum(splines_helper))*splines) ; //wiggly part (random-walk prior)
}
model {
    intercept ~ normal(0, 1) ;
    slope ~ normal(0, 1) ;
    volatility ~ normal(0, 1) ;
    noise ~ normal(0, 1) ;
    splines_helper ~ normal(0,1);
    y ~ normal(f,noise);
}

And here’s R code to make and fit some fake data:

# load packages
library('splines')
library('tidyverse')
library('rstan')
rstan_options(auto_write = TRUE)

# n_x: number of unique samples on x-axis
n_x = 100

# prep a tibble with combination of x, conditions & reps
dat = tibble(
    x = seq(-10,10,length.out=n_x)
)

# set random seed for reproducibility
set.seed(1)

# wiggly true function observed with noise
dat %>%
    dplyr::mutate(
        true = sin(x)*dnorm(x,5,8) + x/100 + 1
        , obs =  true + rnorm(n(),0,.01)
    ) ->
    dat


# show the condition functions
dat %>%
    ggplot(
        mapping = aes(
            x = x
            , y = true
        )
    )+
    geom_line()

# show the noisy observations
dat %>%
    ggplot(
        mapping = aes(
            x = x
            , y = obs
        )
    )+
    geom_line()


#create splines
num_knots = 10
knots = seq(min(dat$x),max(dat$x),length.out = num_knots)
spline_degree = 3
splines <- t(bs(dat$x, knots=knots, degree=spline_degree, intercept = T))

# create the data list for stan
data_for_stan = list(
    n_y = nrow(dat)
    , y = dat$obs
    , x = dat$x
    , num_splines = nrow(splines)
    , splines = splines
)

#compile
spline_mod = rstan::stan_model('spline_fit.stan')

# start the chains
post = rstan::sampling(
    data = data_for_stan
    , object = spline_mod
    , chains = 4
    , cores = 4
    , iter = 2e3
)
#note warnings here

#check parameters
print(
    post
    , pars = c(
        'noise'
        , 'volatility'
        , 'intercept'
        , 'slope'
    )
)

#check pairs plot
pairs(
    post
    , pars = c(
        'noise'
        , 'volatility'
        , 'intercept'
        , 'slope'
    )
)

And here’s the pairs plot:

[pairs plot of noise, volatility, intercept, and slope]

#2

The yellow dots mean those iterations exceeded the maximum treedepth. I can’t tell whether there are any red dots, which would indicate divergences.
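
Rather than squinting at dot colours, you can also check numerically (assuming a reasonably recent rstan, and max_treedepth at its default of 10):

# summary of divergences, treedepth saturation, and E-BFMI
rstan::check_hmc_diagnostics(post)

# or pull the raw sampler diagnostics yourself
sampler_params = rstan::get_sampler_params(post, inc_warmup = FALSE)
sum(sapply(sampler_params, function(chain) sum(chain[, 'divergent__'])))       # divergences
sum(sapply(sampler_params, function(chain) sum(chain[, 'treedepth__'] >= 10))) # saturated treedepth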


#3

And that usually means you have regions of varying curvature and you’re trying to explore a relatively flat part of the posterior with a small step size.

The cumulative sum links the splines_helper variables together very tightly.

The only efficiency improvement I see would be declaring f as a row_vector so you don’t need the extra copy in to_vector.
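
Something like this (an untested sketch; note the transpose on x so everything stays a row_vector):

transformed parameters {
    // f as a row_vector: no to_vector() copy needed
    row_vector[n_y] f = intercept + slope * x'
        + (volatility * cumulative_sum(splines_helper)) * splines ;
}

y ~ normal(f, noise) is vectorized over row_vectors too, so the model block stays the same.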


#4

Treedepth exceedance also happens with constant curvature but strong correlation, because the mass matrix adaptation uses a diagonal matrix. See, e.g., “Gaussian linear model with adjustable priors” in https://rawgit.com/avehtari/BDA_R_demos/master/demos_rstan/rstan_demo.html, which has 2.6% of iterations with saturated treedepth, versus “Gaussian linear model with standardized data” with zero saturations, just because in the first case there is a very strong correlation between alpha and beta.
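
In this model the same idea would look something like the following (a rough sketch; your x is already centred at zero, so here it mainly rescales):

# standardize x before passing it to Stan; back-transform the
# intercept and slope afterwards if you need the original scale
data_for_stan$x = (dat$x - mean(dat$x)) / sd(dat$x)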


#5

Very good point. The posterior is very flat along the direction of correlation but very curved going off-diagonal. And that’s true even if both variables are on unit scale.


#6

I enjoyed reading the comments. Now I want to add my 2 cents:

You added the intercept twice: once as the intercept parameter in the Stan model, and once in the spline basis itself via intercept = T in the bs() call (a B-spline basis built with an intercept spans the constant function). Removing one of them should result in better mixing.
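
For example, the quickest fix on the R side, keeping the Stan model as-is (just a sketch):

# build the basis without its own intercept column,
# so the model's intercept parameter is the only intercept
splines <- t(bs(dat$x, knots = knots, degree = spline_degree, intercept = FALSE))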

Added later

You are fitting a random-intercept model. Do we need to put a hyper-prior on the variances of the slope and intercept? And since we see a linear dependency between slope and intercept, do we need a multivariate normal prior on both parameters?
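
If the latter, a sketch of what I mean (hypothetical parameter names; an LKJ prior on the correlation):

parameters {
    vector[2] coefs ;                 // coefs[1] = intercept, coefs[2] = slope
    cholesky_factor_corr[2] coef_L ;  // their correlation (Cholesky factor)
    vector<lower=0>[2] coef_sd ;      // hyper-prior scales
}
model {
    coef_sd ~ normal(0, 1) ;
    coef_L ~ lkj_corr_cholesky(2) ;
    coefs ~ multi_normal_cholesky(rep_vector(0, 2), diag_pre_multiply(coef_sd, coef_L)) ;
}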


#7

Yes, I’m copying that from the Splines In Stan demo, where it serves to implement a random-walk prior, which in turn biases the fit towards smoother functions and makes it less sensitive to the number of splines used.
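
For anyone following along, the cumulative_sum construction is just a non-centered version of this random walk (an equivalent centered sketch, untested):

parameters {
    row_vector[num_splines] spline_weights ; // the weights directly
}
model {
    spline_weights[1] ~ normal(0, volatility) ;
    spline_weights[2:num_splines] ~ normal(spline_weights[1:(num_splines - 1)], volatility) ;
}

with f then computed as intercept + slope*x + to_vector(spline_weights * splines).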


#8

Thanks, good catch.


#9

Thanks for the correction. You’re right: the cumulative sum just goes into computing the location parameter of the normal for y. The actual prior on splines_helper is independent across components.