Help with Hierarchical Changepoint Model in rstan: Divergent Transitions and BFMI Warnings

I am new to rstan and having difficulty specifying a hierarchical model for changepoint analysis in fish movement data. I would like assistance with model specification or reparameterization, as I am encountering the following warnings after running rstan::sampling():

1: There were 16 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 

I have already increased adapt_delta, but the warnings persist.

Background:

The goal of my analysis is to detect changepoints (or whether such points exist) in the positions (i.e. state) of river fish over time. My dataset consists of 8 observations per individual (N = 8) for 5 individuals (K = 5), with data collected over consecutive days. The data is structured as follows:

> data
$N
[1] 8

$K
[1] 5

$Y
 [1] 231.84559 234.53067 234.51740 233.56482 237.17536 229.79460
 [7] 228.59779 227.76857  93.28173  99.40232 104.81858 102.88056
[13] 103.60429  98.21964  95.98356  98.00442 609.43382 608.35827
[19] 593.83546 591.99762 610.21665 601.23710 605.45246 607.57787
[25] 604.33400 607.19565 600.49975 594.15830 604.08196 595.68660
[31] 602.18625 597.53365 779.93654 779.93654 779.93654 779.93654
[37] 779.93654 776.38965 779.93654 775.96977

$zero_Dist
[1] 231.84559  93.28173 609.43382 604.33400 779.93654

The variable Y represents the position, specifically the distance from the most downstream point in my research area.

Model Description:

I used Inverse Transform Sampling for generating samples from a Cauchy distribution and specified the prior for state_raw using a normal distribution with individual-specific standard deviations. Below is my current Stan code:

data {
  int<lower=0> N; // Number of observations per individual
  int<lower=0> K; // Number of individuals
  vector[N*K] Y;  // Vector containing the observed values
  vector[K] zero_Dist; // Initial position for each individual on the first day
}

parameters {
  real<lower=0, upper=20> s_v; // Observation error standard deviation
  real<lower=0, upper=20> s_w; // State error standard deviation
  vector<lower= -pi()/2, upper= pi()/2>[(N-1)*K] state_raw; // Raw state deviations
  vector<lower=0>[K] s_K; // Individual-level standard deviations for state_raw
  vector[K] s_zero_Dist_raw; // Initial state deviations for each individual
}

transformed parameters {
  vector<lower=0, upper=850>[K] state_zero; // Initial state for each individual
  vector<lower=0, upper=850>[N*K] state; // State vector for all individuals
  
  for (k in 1:K) {
    // Prior distribution for state_zero (reparameterization)
    state_zero[k] = zero_Dist[k] + 3 * s_zero_Dist_raw[k];  
    
    // State for the first day
    state[1 + (k-1)*N] = state_zero[k];
    
    // State for the second day and beyond
    for (i in 2:N) {
      state[i + (k-1)*N] = state[i-1 + (k-1)*N] + s_w * tan(state_raw[i-1 + (k-1)*(N-1)]); // Inverse Transform Sampling
    }
  }
}

model {
  // Priors
  s_v ~ normal(5, 3); 
  s_w ~ normal(5, 3);
  s_zero_Dist_raw ~ normal(0, 1);
  
  // Hierarchical model for state_raw
  for (k in 1:K) {
    for (i in 1:(N-1)) {
      state_raw[i + (k-1)*(N-1)] ~ normal(0, s_K[k]); // Hierarchical definition of state_raw for individual-specific variation
    }
  }
  
  // Observation model
  for (i in 1:(N*K)) {
    Y[i] ~ normal(state[i], s_v);
  }
}
"

I am also new to this forum and please tell me if more information is needed.
I would appreciate suggestions for reparameterization to address the divergent transitions and BFMI warnings, or any insights on improving the hierarchical structure of my model. Thank you!

How does the model behave when you specify a centered parameterization rather than building it hierarchically?

Thank you for the reply. I ran the following code without the hierarchical part:

data {
  int<lower=0> N; // Number of observations per individual
  int<lower=0> K; // Number of individuals
  vector[N*K] Y;  // Vector containing the observed values
  vector[K] zero_Dist; // Initial position for each individual on the first day
}

parameters {
  real<lower=0, upper=20> s_v; // Observation error standard deviation
  real<lower=0, upper=20> s_w; // State error standard deviation
  vector<lower= -pi()/2, upper= pi()/2>[(N-1)*K] state_raw; // Raw state deviations
  vector<lower=0>[K] s_K; // Individual-level standard deviations for state_raw
  vector[K] s_zero_Dist_raw; // Initial state deviations for each individual
}

transformed parameters {
  vector<lower=0, upper=850>[K] state_zero; // Initial state for each individual
  vector<lower=0, upper=850>[N*K] state; // State vector for all individuals
  
  for (k in 1:K) {
    // Prior distribution for state_zero (reparameterization)
    state_zero[k] = zero_Dist[k] + 3 * s_zero_Dist_raw[k];  
    
    // State for the first day
    state[1 + (k-1)*N] = state_zero[k];
    
    // State for the second day and beyond
    for (i in 2:N) {
      state[i + (k-1)*N] = state[i-1 + (k-1)*N] + s_w * tan(state_raw[i-1 + (k-1)*(N-1)]); // Inverse Transform Sampling
    }
  }
}

model {
  // Priors
  s_v ~ normal(5, 3); 
  s_w ~ normal(5, 3);
  s_zero_Dist_raw ~ normal(0, 1);
  
  // Observation model
  for (i in 1:(N*K)) {
    Y[i] ~ normal(state[i], s_v);
  }
}
"

I still get the same error although the number of divergences decreased from 6 to 4. Increasing adapt_delta from 0.9 to 0.99999 removes this warning, but is setting such a high value OK?
Thank you for your help!

I think one problem is that s_K does not have a prior. Your second model doesn’t even use s_K but still declares it in the parameters block!

Stan User guide has a chapter on change point models which may be helpful.

I take it your data looks like this:
Figure_1
By eye, it’s pretty close to a Gaussian random walk. If you do find change points, they may be sensitive to modeling assumptions.

By the way, indexing would be easier if you replace some of those vectors with matrices:

  matrix[N,K] Y;
  ..
  matrix<lower= -pi()/2, upper= pi()/2>[N-1,K] state_raw;
  ..
  matrix<lower=0, upper=850>[N,K] state;
  ..
      state[i,k] = state[i-1,k] + s_w * tan(state_raw[i-1,k)]);

By centered parameterization, Corey might have meant removing state_raw and writing the distribution directly on state. The exact distribution of state is quite complicated interpolation of normal and Cauchy but e.g. a Student T is qualitatively similar, so maybe something like:

data {
  int<lower=0> N; // Number of observations per individual
  int<lower=0> K; // Number of individuals
  matrix[N,K] Y;  // Vector containing the observed values
  vector[K] zero_Dist; // Initial position for each individual on the first day
}

parameters {
  real<lower=0, upper=20> s_v; // Observation error standard deviation
  real<lower=0, upper=20> s_w; // State error standard deviation
  vector<lower=0>[K] nu_K; // Individual-level degrees of freedom for state
  matrix<lower=0, upper=850>[N,K] state; // State vector for all individuals
}
model {
  // Priors
  s_v ~ normal(5, 3); 
  s_w ~ normal(5, 3);
  nu_K ~ inv_gamma(3,3);
  s_zero_Dist_raw ~ normal(0, 1);

  // Hierarchical model for state
  for (k in 1:K) {
    state[1,k] ~ normal(zero_Dist[k], 3);
    for (i in 2:N) {
      state[i,k] ~ student_t(nu_K[k], state[i-1,k], s_w);
    }
  }

  // Observation model
  for (i in 1:(N*K)) {
    Y[i] ~ normal(state[i], s_v);
  }
}
1 Like

Thank you so much for the advice! I wasn’t familiar with the matrix approach, and your suggestions greatly improved the readability of my code. While the data I uploaded the other day mostly follows a Gaussian random walk, I do have some data that seems to follow a Cauchy random walk, and I’d like to apply that when necessary.

The student-t method you recommended worked perfectly—thank you again!
The next step is to include month effects and I hope you will help me again!
Thank you very much.