B-spline model with one interior knot

Hi all! The following model is a simplified model that I am working on. I will summarise it briefly. The model is a quadratic B-spline model with one interior knot (and two exterior knots). The knot vector is (0, T/2,T), where T>0 is unknown and is a parameter of the model. Following some recursive relationships the quadratic spline model can be written as follows.

y(t) = \underbrace{ \big[a_1 + 4t/T(a_2-a_1)+t^2/T^2(4a_1-6a_2+2a_3)\big]}_\text{Polynomial 1}I(0\leq t < T/2) \\+\underbrace{\big[2(a_2-a_3)+t/T(8a_3-4a_2)+t^2/T^2(2a_2-6a_3)\big]}_\text{Polynomial 2}I(T/2\leq t < T) + \varepsilon, \hspace{0.2cm} \varepsilon \sim N(0,\sigma^2),

where a_1, a_2, a_3, T, and \sigma are the five model parameters. The deterministic part of y(t) is described by polynomial 1 between [0,T/2] and by polynomial 2 between [T/2,T].

We require a_1, a_2 and a_3 to be positive. In addition, T and \sigma are positive by definition.

I have specified prior distributions for the five model paramters and have simulated data from these prior distributions. Figure Two_datasets.pdf (28.0 KB) shows two plots of simulated data. The Figures show two simulated datasets from the above model. In both plots, the first blue line is at time T/2 (the interior knot) and the second blue line is at time T (the final knot). The first knot is placed at time 0. The red line represents the B-spline without noise.

I am having trouble with some of the datasets. The model appears to be multi-modal. Running the model on dataset 1 works fine. I run eight chains and all chains converge to the true values. I sample the starting values for each chain from the corresponding prior distributions (the same distributions used to generate the true parameter values). Figure Traceplots1.pdf (526.4 KB) shows the traceplots for dataset 1. The Figures below show dataset 1 and posterior simulations. The simulations represent the data well. The black dots are the datapoints, the blue line is the true deterministic part of y(t) (no noise) and the red lines are simulations (with noise) from the model using parameter values obtained from Stan.

The chains in dataset 2 converge to different values. Figure Traceplots2.pdf (505.3 KB) shows the traceplots for dataset 2. Fitting the model to this dataset comes with a warning.

Warning messages:
1: There were 21 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
2: Examine the pairs() plot to diagnose sampling problems

The Figures below show dataset 2 and posterior simulations. Posterior simulations from chains 1, 4, 5 and 6 fit the data well. Posterior simulations from chains 2, 3, 7 and 8 do not look too bad, but they do not capture the trend in the data. Perhaps Stan cannot distinguish between the trend and the noise?

I have read the blog post Identifying non-identifiability and the “Neural network: When ordering is not enough” section seems most similar. Each chain is stuck in a mode.

Is there something obvious that may be causing this model to be multi-modal? It is a simple B-spline model (only one interior knot). I imagine Stan has been used to fit B-splines more complex than this.

Perhaps the final knot and the interior knot (which is at the mid-point of 0 and the final knot) being unknown is the issue? Has anyone had an issue fitting B-splines with unknown knot positions before?

Does anyone have any other advice? Perhaps advice fitting models that find it hard to distinguish between trend and noise.

The Stan code is shown below. Noting that I simulated the true parameter values from the priors you can see in the Stan code and I also simulated from these priors to get the initial values for the chains. Note TT in the stan code below refers to T in the model and V is \sigma^2.

data {
  int<lower=1> M;
  vector[M] y;
  vector[M] TIME;
  int START;
  int END;

parameters {
  real<lower=0> a1;  
  real<lower=0> a2;
  real<lower=0> a3;
  real TT;
  real<lower=0> V;
model {
  for(j in START:END){
    int I1;
    int I2;
    real MU;
    if(TIME[j]>=0 && TIME[j]<(0.5*exp(TT))){
      I1 = 1;
    } else {
      I1 = 0;
    if(TIME[j]>=(0.5*exp(TT)) && TIME[j]<exp(TT)){
      I2 = 1;
    } else {
      I2 = 0;
      MU = (a1 + 4*TIME[j]/exp(TT)*(a2-a1)+TIME[j]^2/exp(TT)^2*(4*a1-6*a2 + 2*a3))*I1+
    (2*(a2-a3) + TIME[j]/exp(TT)*(8*a3-4*a2)+TIME[j]^2/exp(TT)^2*(2*a2-6*a3))*I2;
      y[j] ~ normal(MU,sqrt(V));
  a1 ~ normal(2,0.5);
  a2 ~ normal(1.5,0.75);
  a3 ~ normal(1.5,0.75);
  TT ~ normal(4.5,1);
  V ~ normal(0.05,0.005);

I simulated the data using the code below (which I think are the same priors in my Stan code, so I hoped Stan would recover the true values).

a1 <- data.frame(x = rnorm(N,2, 0.5)) %>% filter(x > 0) %>% pull(x)
a2 <-  data.frame(x = rnorm(N,1.5, 0.75)) %>% filter(x > 0) %>% pull(x)
a3 <- data.frame(x = rnorm(N,1.5, 0.75)) %>% filter(x > 0) %>% pull(x)
TT <- rnorm(N,4.5,1)
V <- data.frame(x = rnorm(N,0.05, 0.005)) %>% filter(x > 0) %>% pull(x)

I suspect the unknown knot positions is the issue – the “wiggliness” in the data might be equally well fit by one spline with its coefficients and a different spline with different coefficients and a different knot location. You may want to experiment with your prior on the knot location. Or just pre-specify knots at quantiles of the predictors, which I think is somewhat common and has some evidence for why it might not be such a bad thing to do, if you have enough data ([Generalized Additive Models]: Comment).

1 Like