Issues with poisson changepoint model

Adon_Rosen · December 5, 2024, 5:27pm

Hi, any and all Stan aficionados: I am having some trouble getting the following Poisson changepoint Stan model to run. The pairs plot seems to suggest that the slope and intercept parameters within a changepoint regime are perfectly collinear. I am quite a novice with these issues, so I was hoping to defer to someone with more expertise on how to tackle this problem.

The Stan code and the R code to simulate data that recreates my issue are below:

data {
  // Define variables in data
  // Number of observations (an integer)
  int<lower=0> N;
  real x[N];
  // Count outcome
  int<lower=0> y[N];
}

parameters {
  real a1;
  real a2;
  real b1;
  real b2;
  real<lower=0> fixedkp;
}

transformed parameters  {
  //
  real lp[N];

  for (j in 1:N) {
    if (x[j] < fixedkp) 
      lp[j] <- a1 + b1*x[j];
    else  
      lp[j] <- a2 + b2*x[j];  
  }

}

model {
  // Prior part of Bayesian inference
  // Flat prior for mu (no need to specify if non-informative)
  a1 ~ normal(0,3);
  a2 ~ normal(0,3);
  b1 ~ normal(0,3);
  b2 ~ normal(0,3);
  fixedkp ~ uniform(1,30);

  // Likelihood part of Bayesian inference
  y ~ poisson_log(lp);
}

library(rstan)

n1 <- 1000
n2 <- 1000
a1 <- 4
b1 <- -.2
a2 <- .5
b2 <- .0044
x1 <- sample(1:11, n1, replace=TRUE)
mu <- exp(a1 + b1 * x1 + rnorm(n1,.2,.3))
y1 <- rpois(n=n1, lambda=mu)
x2 <- sample(12:21, n2, replace=TRUE)
mu <- exp(a2 + b2 * x2 + rnorm(n2, .2,.3))
y2 <- rpois(n=n2, lambda=mu)
## Now make the data frame
out.dat = data.frame(y = c(y1, y2), x = c(x1, x2))
stan.data = list(
    N = nrow(out.dat),
    x = out.dat$x,
    y = out.dat$y
)
plot(out.dat$x, out.dat$y)
stanmonitor <- c("a1","a2","b1","b2","fixedkp")
mod <- stan(file="/home/arosen/Documents/pinesRep/scripts/stan_models/singleBasePoisCP.stan", 
    data = stan.data, cores=4,
    pars = stanmonitor, 
    iter=1000, warmup = 500, control = list(adapt_delta = 0.9, max_treedepth = 15))

js592 · December 5, 2024, 6:58pm

First, there are a few syntax errors in the Stan code you provided. The following compiles and runs with cmdstanr 0.8.1:

data {
  // Define variables in data
  // Number of observations (an integer)
  int<lower=0> N;
  vector[N] x;
  // Count outcome
  array[N] int<lower=0> y;
}

parameters {
  real a1;
  real a2;
  real b1;
  real b2;
  real<lower=0,upper=21> fixedkp;
}

transformed parameters  {
  //
  vector[N] lp;

  for (j in 1:N) {
    if (x[j] < fixedkp) 
      lp[j] = a1 + b1*x[j];
    else  
      lp[j] = a2 + b2*x[j];  
  }

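}

model {
  // priors on the intercepts and slopes, as in the original model
  a1 ~ normal(0,3);
  a2 ~ normal(0,3);
  b1 ~ normal(0,3);
  b2 ~ normal(0,3);
  // no explicit prior on fixedkp here: its declared bounds imply uniform(0, 21)

  // likelihood
  y ~ poisson_log(lp);
}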

Second, the normal noise added in constructing the mu term, e.g.:

mu <- exp(a1 + b1 * x1 + rnorm(n1,.2,.3))

does not appear in the Stan model, which will probably cause issues with recovering the true parameter values.
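To see why the missing noise term matters, here is a small Python sketch (standard library only; the helper names are hypothetical and not from the thread). For Z ~ Normal(m, s), E[exp(Z)] = exp(m + s^2/2), so noise with m = 0.2 and s = 0.3 raises the effective log mean by about 0.245 and also makes the counts overdispersed relative to a Poisson, which the Stan model as written cannot represent.

```python
import math
import random


def lognormal_mean_shift(noise_mean, noise_sd):
    """Shift in log E[Y | x] from adding Normal(noise_mean, noise_sd) noise on the log scale.

    Uses E[exp(Z)] = exp(m + s^2 / 2) for Z ~ Normal(m, s)."""
    return noise_mean + noise_sd ** 2 / 2


def poisson_draw(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for moderate rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_at_x(a, b, x, noise_mean, noise_sd, n, seed=1):
    """Mimic the R simulation at a single x: log-scale normal noise, then a Poisson draw."""
    rng = random.Random(seed)
    return [
        poisson_draw(math.exp(a + b * x + rng.gauss(noise_mean, noise_sd)), rng)
        for _ in range(n)
    ]


def mean_and_var(values):
    m = sum(values) / len(values)
    v = sum((val - m) ** 2 for val in values) / (len(values) - 1)
    return m, v


if __name__ == "__main__":
    a1, b1, x = 4.0, -0.2, 1
    m, v = mean_and_var(simulate_at_x(a1, b1, x, 0.2, 0.3, n=10000))
    expected = math.exp(a1 + b1 * x + lognormal_mean_shift(0.2, 0.3))
    # The sample mean tracks exp(a1 + b1*x + 0.245) rather than exp(a1 + b1*x),
    # and the variance is several times the mean (a Poisson would have them equal).
    print(f"mean {m:.1f} (expected {expected:.1f}), variance {v:.1f}")
```

In practice this means the fitted intercepts absorb the extra 0.245, and the overdispersion is left unmodeled; a negative binomial likelihood or an explicit observation-level random effect would be the usual ways to account for it.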

Finally, the model as currently specified is probably not identifiable under the uniform prior on the changepoint location. I was able to get things to work with a normal(10,3) prior on the changepoint, so you may want to think about whether your problem lends itself to at least a weakly informative prior. Alternatively, you may wish to investigate whether the mean function really needs to be discontinuous at the changepoint. If not, you might get better efficiency working with splines.
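As a rough illustration of the continuity point, this Python sketch (function names hypothetical) compares the discontinuous two-line log mean from the original model with a continuous "hinge" alternative, a one-knot linear spline log mu(x) = a + b x + c max(0, x - r). The two-line version jumps by (a2 - a1) + (b2 - b1) r at x = r, while the hinge only changes slope there.

```python
def log_mu_discontinuous(x, a1, b1, a2, b2, r):
    """Two separate lines that switch at r (the form used in the original model)."""
    return a1 + b1 * x if x < r else a2 + b2 * x


def log_mu_hinge(x, a, b, c, r):
    """Continuous one-knot linear spline: slope b below r, slope b + c above r."""
    return a + b * x + c * max(0.0, x - r)


def jump_at(f, r, eps=1e-9):
    """Approximate size of the jump in f at r, comparing values just either side."""
    return f(r + eps) - f(r - eps)


if __name__ == "__main__":
    r = 11.5

    def disc(x):
        return log_mu_discontinuous(x, 4.0, -0.2, 0.5, 0.0044, r)

    def hinge(x):
        # c = b2 - b1, so the slope above r matches b2 = 0.0044
        return log_mu_hinge(x, 4.0, -0.2, 0.2044, r)

    print("jump, two lines:", jump_at(disc, r))  # equals (a2 - a1) + (b2 - b1) * r
    print("jump, hinge:", jump_at(hinge, r))  # essentially zero
```

The hinge has one fewer free parameter because continuity ties the second intercept to the first; whether that restriction is acceptable depends on whether the real process can jump.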

Adon_Rosen · December 6, 2024, 3:33pm

Hi @js592, thanks for looking over my code. I couldn’t quite find where you changed my Stan code; could you provide some hints?

My concern wasn’t getting the Stan code to compile (although your tips did speed up the model slightly). The bigger issue is that the posterior distributions of the intercept and slope within a changepoint regime are highly correlated: the pairs plot of the output shows a strong correlation between the a1 and b1 parameters.

Is this the model identification issue you’re alluding to?

js592 · December 6, 2024, 6:39pm

Sure. The lp[j] <- a1 + b1*x[j] lines originally used <- instead of =; I also switched the real foo[N] notation to vector[N] foo, and int foo[N] to array[N] int foo. You might be using a version of RStan that doesn’t support the newer array syntax, in which case the old syntax isn’t necessarily a big issue.

With regard to identifiability, my comment was more that if you are estimating knots or changepoints (as opposed to fixing them a priori, which is probably more common), you probably want at least a weakly informative prior; otherwise, the model can be flexible enough that many different knot/changepoint locations fit the data well.

However, I suspect most of the issues you may have with this model arise from how the discontinuity is formulated. I have not explored this in depth myself, but I believe HMC will struggle because the likelihood itself becomes discontinuous in the changepoint (see e.g. Modeling Cutpoint for Noisy Covariate - #5 by icostley or https://statmodeling.stat.columbia.edu/2017/05/19/continuous-hinge-function-bayesian-modeling/).
As I said earlier, you may want to think about the physics of what you are trying to model and decide whether the true behavior really has an instantaneous jump or a very rapid but smooth transition. I think the latter should be much more amenable to HMC.
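A minimal Python sketch of why the hard switch is difficult for HMC (the data are made up and the helper names are hypothetical): with an instantaneous x < r switch, the Poisson log-likelihood is a step function of r, so its derivative with respect to r is zero everywhere except at the observed x values, where it is undefined. A logistic weight makes the log-likelihood differentiable in the transition location, so the gradient actually points somewhere.

```python
import math

# Made-up illustrative data: 21 integer x values, counts rounded from the
# two regimes used in the thread's simulation (switch between x = 11 and 12).
XS = list(range(1, 22))
YS = [round(math.exp(4.0 - 0.2 * x)) if x <= 11 else round(math.exp(0.5 + 0.0044 * x)) for x in XS]
PARAMS = (4.0, -0.2, 0.5, 0.0044)  # a1, b1, a2, b2 held fixed


def poisson_logpmf(y, eta):
    """log Poisson(y | exp(eta))."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def loglik_hard(r, a1, b1, a2, b2):
    """Log-likelihood with an instantaneous switch at r."""
    return sum(poisson_logpmf(y, a1 + b1 * x if x < r else a2 + b2 * x) for x, y in zip(XS, YS))


def loglik_smooth(m, a1, b1, a2, b2, steepness=5.0):
    """Log-likelihood with a logistic weight centred at m (w -> 1 below m)."""
    total = 0.0
    for x, y in zip(XS, YS):
        w = 1.0 / (1.0 + math.exp(steepness * (x - m)))
        total += poisson_logpmf(y, w * (a1 + b1 * x) + (1.0 - w) * (a2 + b2 * x))
    return total


def central_diff(f, t, h=1e-4):
    """Finite-difference estimate of df/dt, standing in for the gradient HMC uses."""
    return (f(t + h) - f(t - h)) / (2 * h)


if __name__ == "__main__":
    hard = central_diff(lambda r: loglik_hard(r, *PARAMS), 11.5)
    smooth = central_diff(lambda m: loglik_smooth(m, *PARAMS), 10.0)
    print("d loglik / d r, hard switch at r = 11.5:", hard)  # exactly zero: a flat step
    print("d loglik / d m, logistic at m = 10.0:", smooth)  # nonzero: points toward the switch
```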

js592 · December 6, 2024, 8:37pm

I wanted to add a bit more to my response, which I hope will clarify some of the points I raised above. One way to model a smooth transition between two data regimes is:
\log \mu(x) = w(x)f_1(x)+(1-w(x))f_2(x)
where f_1(x) = a_1 + b_1 x, f_2(x) = a_2 + b_2 x, and w(x) is some function taking values in [0,1]. Because our domain expertise tells us there should be only one changepoint with a clear transition, a natural choice for w(x) might be the inverse logit:
w(x) = \frac{1}{1+\exp(-(\alpha + \beta x))}
This saturates at 0 and 1, which allows a clear separation between the two regimes. The steepness (how sharp the transition is) is given by \beta, and the midpoint (where w(x) = 0.5) is -\alpha/\beta; with \beta < 0, w(x) decreases from 1 to 0 as x increases, so f_1 applies below the midpoint and f_2 above it. The Stan code below estimates a model where we fix \beta = -5 a priori and pass it as data (one could also estimate it with an informative prior to the same effect).

data {
  // Define variables in data
  // Number of observations (an integer)
  int<lower=0> N;
  vector[N] x;
  // Count outcome
  array[N] int<lower=0> y;
  real beta; //fixed gain of logistic changepoint
}

parameters {
  real a1;
  real a2;
  real b1;
  real b2;
  real<lower=-1*beta, upper =-21*beta>  alpha;

}

transformed parameters  {
  vector[N] w = inv_logit(alpha + beta*x);
}

model {
  // Prior part of Bayesian inference
  a1 ~ normal(0,3);
  a2 ~ normal(0,3);
  b1 ~ normal(0,3);
  b2 ~ normal(0,3);
  //implicitly, there is a uniform(1,21) prior on the changepoint (middle of logistic weighting curve)

  vector[N] lp;

  lp = w.*(a1 + b1*x) + (1-w).*(a2 + b2*x);

  // Likelihood part of Bayesian inference
  y ~ poisson_log(lp);
}
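To make this parameterization concrete, here is a Python sketch (not part of the original post; names are hypothetical) of w(x) = inv_logit(alpha + beta x) with the fixed beta = -5. It checks that the midpoint -alpha/beta is where w = 0.5, and that the declared bounds lower = -beta and upper = -21*beta correspond to midpoints at x = 1 and x = 21. Since the midpoint is linear in alpha, a uniform distribution on alpha is a uniform prior on the changepoint over [1, 21], as the comment in the model block says.

```python
import math


def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))


def weight(x, alpha, beta):
    """w(x): share of the first regime, a1 + b1*x, in the log mean."""
    return inv_logit(alpha + beta * x)


def midpoint(alpha, beta):
    """Location where w(x) = 0.5."""
    return -alpha / beta


def alpha_bounds(beta, x_min=1.0, x_max=21.0):
    """Bounds on alpha that keep the midpoint in [x_min, x_max]; assumes beta < 0."""
    if beta >= 0:
        raise ValueError("this parameterization assumes beta < 0")
    return -x_min * beta, -x_max * beta


if __name__ == "__main__":
    beta = -5.0
    lo, hi = alpha_bounds(beta)
    print("alpha bounds:", lo, hi)
    print("midpoints at those bounds:", midpoint(lo, beta), midpoint(hi, beta))
    alpha = 57.5  # puts the midpoint at x = 11.5
    for x in (10, 11, 12, 13):
        print(x, round(weight(x, alpha, beta), 3))
```

The raise on beta >= 0 mirrors the fact that with a positive beta the Stan bounds on alpha would have lower > upper.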

For me, this runs quite fast and has no divergent transitions or treedepth warnings even without tinkering with the adapt_delta setting. The parameter recovery is good:

As is the changepoint estimation (here I show the posterior mean and 500 draws):

One thing to note about that uniform prior, however: very rarely, with random initialization, a chain can get stuck at the boundary value of the changepoint. I think this corresponds to essentially no changepoint, and it may happen because the data are also reasonably well modeled by a single linear function for the log of the Poisson mean. I was able to solve this with a diffuse normal prior on \alpha that centers the changepoint in the middle of the interval of x values (e.g. something like N(55,5)).
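Because the midpoint m = -\alpha/\beta is linear in \alpha when \beta is fixed, a normal prior on \alpha maps directly to a normal prior on the changepoint location. A short derivation, assuming \beta = -5 as in the model above and Stan's (mean, standard deviation) parameterization of the normal:

```latex
m = -\frac{\alpha}{\beta} = \frac{\alpha}{5} \quad \text{for } \beta = -5,
\qquad
\alpha \sim \mathcal{N}(55, 5)
\;\Longrightarrow\;
m \sim \mathcal{N}\!\left(\tfrac{55}{5}, \tfrac{5}{5}\right) = \mathcal{N}(11, 1).
```

So N(55,5) on \alpha is N(11,1) on the changepoint, which puts about 95% of the prior mass between x = 9 and x = 13.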

For the collinearity, it still appears somewhat:

However, I suspect that this may be more a result of the assumed data generating process than an inherent degeneracy. It may be worth trying different assumed values of the slopes, intercepts, and changepoint to see whether the problem persists.
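One way to explore this (a sketch, not something from the thread; names are hypothetical) is to compute the large-sample correlation between the intercept and slope estimates within one regime, ignoring the changepoint. For a Poisson regression with log link, the Fisher information for (a, b) is the sum over observations of lam_i [[1, x_i], [x_i, x_i^2]] with lam_i = exp(a + b x_i), so the implied correlation is -S1 / sqrt(S0 S2), where Sk = sum lam_i x_i^k. With x between 1 and 11, far from 0, that correlation is strongly negative; centering x at its rate-weighted mean (a standard reparameterization, offered here as an aside) removes it.

```python
import math


def intercept_slope_corr(xs, a, b, center=0.0):
    """Large-sample correlation of the (intercept, slope) estimates in a Poisson
    log-link regression, from the inverse Fisher information.

    Information = sum_i lam_i * [[1, z_i], [z_i, z_i**2]] with z_i = x_i - center
    and lam_i = exp(a + b * x_i); inverting it gives corr = -S1 / sqrt(S0 * S2)."""
    s0 = s1 = s2 = 0.0
    for x in xs:
        lam = math.exp(a + b * x)  # the fitted mean does not depend on centering
        z = x - center
        s0 += lam
        s1 += lam * z
        s2 += lam * z * z
    return -s1 / math.sqrt(s0 * s2)


def weighted_mean(xs, a, b):
    """Mean of x weighted by the Poisson rates exp(a + b * x)."""
    weights = [math.exp(a + b * x) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


if __name__ == "__main__":
    xs = list(range(1, 12))  # first-regime x values, one observation per level
    a1, b1 = 4.0, -0.2
    print("raw x:", intercept_slope_corr(xs, a1, b1))
    print("x centered at 6:", intercept_slope_corr(xs, a1, b1, center=6.0))
    print("x centered at weighted mean:",
          intercept_slope_corr(xs, a1, b1, center=weighted_mean(xs, a1, b1)))
```

Rerunning it with other values of a, b, or the x range is a quick way to see how much of the a1/b1 correlation in the pairs plot comes from the design itself.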

PS: With the simulated x taking only integer values, you might run into issues where the data cannot inform where exactly in the interval [11,12] the change occurs.
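The PS can be checked directly with a small Python sketch (the function name is hypothetical). Under the x < r rule, every r in the half-open interval (x_k, x_{k+1}] between consecutive observed x levels assigns exactly the same observations to each regime, so the likelihood is identical across that interval and only the prior distinguishes points inside it.

```python
import bisect


def equivalent_interval(r, xs):
    """Return (lo, hi) such that every r' in (lo, hi] assigns the same observations
    to the first regime (x < r') as r does. None marks an unbounded side."""
    levels = sorted(set(xs))
    k = bisect.bisect_left(levels, r)  # number of distinct x levels strictly below r
    lo = levels[k - 1] if k > 0 else None
    hi = levels[k] if k < len(levels) else None
    return lo, hi


if __name__ == "__main__":
    xs = list(range(1, 22))  # integer x values as in the simulation
    for r in (11.01, 11.5, 12.0):
        print(r, equivalent_interval(r, xs), [x for x in xs if x < r][-1])
```

With the simulated data, the best the hard-switch model can do is place the change somewhere in (11, 12]; any finer resolution comes from the prior.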

Adon_Rosen · December 7, 2024, 3:52pm

Absolutely gorgeous! I very much appreciate your insight!

Adon_Rosen · January 29, 2025, 2:11pm

Hi @js592

I am trying to extend this model to a two-changepoint model. I thought I could do this using the same inv_logit formulation, but now with ordered inv_logit terms. So my model weights now become:

  vector[N] w1;
  vector[N] w2;
  vector[N] w3;
  w1 = inv_logit(beta + alpha[1]*x);
  w2 = inv_logit(beta + alpha[2]*x) - inv_logit(beta + alpha[1]*x);
  w3 = 1 - inv_logit(beta + alpha[2]*x);

and my model now becomes:

    lp[j] = w1[j]*(a1+b1*x[j])+
      w2[j]*(a2+b2*x[j])+
      w3[j]*(a3+b3*x[j]);

where the a’s are intercept terms and the b’s are slope terms. This model is returning nonsense, though: I can’t seem to estimate anything meaningful with this formulation. Would you happen to have ideas for how you would approach this problem?
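One way to sanity-check a three-regime weighting like this (a hypothetical sketch, not an approach anyone in the thread settled on) is to parameterize it directly by two ordered midpoints c1 < c2 and a shared steepness s > 0: w1 = 1 - inv_logit(s (x - c1)), w2 = inv_logit(s (x - c1)) - inv_logit(s (x - c2)), w3 = inv_logit(s (x - c2)). These weights are nonnegative and sum to one for every x, regime 1 dominates below c1, and regime 3 dominates above c2. In Stan, c1 < c2 could be enforced by declaring the midpoints as an ordered[2] parameter.

```python
import math


def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))


def three_regime_weights(x, c1, c2, s):
    """Weights for the regimes left of c1, between c1 and c2, and right of c2."""
    if not (c1 < c2 and s > 0):
        raise ValueError("requires c1 < c2 and s > 0")
    p1 = inv_logit(s * (x - c1))  # ~0 left of c1, ~1 right of c1
    p2 = inv_logit(s * (x - c2))  # ~0 left of c2, ~1 right of c2
    return 1.0 - p1, p1 - p2, p2


def log_mu(x, a, b, c1, c2, s):
    """Weighted combination of the three regime lines a[k] + b[k] * x."""
    w = three_regime_weights(x, c1, c2, s)
    return sum(wk * (ak + bk * x) for wk, ak, bk in zip(w, a, b))


if __name__ == "__main__":
    for x in (1.0, 10.5, 21.0):
        w = three_regime_weights(x, 7.0, 14.0, 5.0)
        print(x, [round(v, 3) for v in w])
```

Writing the weights in terms of midpoints keeps each parameter on the x scale, which makes priors (and the ordering constraint) easier to state than with separate intercepts and slopes inside each inv_logit.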

js592 · January 29, 2025, 5:52pm

I’m a bit busy at the moment, so I can’t dive too deep into this. My suggestion: if you want to model multiple switches between two regimes, you could stick with the original formula but replace the linear function (alpha + beta*x) inside the inv_logit with something that can “wiggle” up and down a bit more, maybe a linear spline with one or two knots or even a Gaussian process. I’d expect a lot more identifiability issues, though. On the other hand, if you want to model more than two regimes, I would look into a hidden Markov model or something similar.
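To illustrate the first suggestion (a hypothetical sketch; the knot location and the extra slope gamma are made-up names), replacing the linear argument alpha + beta*x of the inv_logit with a one-knot linear spline lets w(x) switch away from one regime and then back, which a single logistic curve cannot do.

```python
import math


def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))


def spline_weight(x, alpha, beta, gamma, knot):
    """w(x) = inv_logit(alpha + beta*x + gamma*max(0, x - knot))."""
    return inv_logit(alpha + beta * x + gamma * max(0.0, x - knot))


def count_switches(ws, threshold=0.5):
    """Number of times a sequence of weights crosses the threshold."""
    above = [w > threshold for w in ws]
    return sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)


if __name__ == "__main__":
    grid = [i + 0.5 for i in range(21)]  # 0.5, 1.5, ..., 20.5
    # Argument is 25 - 5x below the knot and 5x - 75 above it: crosses zero at 5 and 15.
    wiggly = [spline_weight(x, 25.0, -5.0, 10.0, 10.0) for x in grid]
    plain = [spline_weight(x, 25.0, -5.0, 0.0, 10.0) for x in grid]
    print("switches with spline argument:", count_switches(wiggly))
    print("switches with linear argument:", count_switches(plain))
```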

Adon_Rosen · January 29, 2025, 6:25pm

got it, thanks @js592

Bob_Carpenter · February 4, 2025, 5:32pm

I discussed how to extend to two change points in the User’s Guide, at the end of the discussion of change-point models. In short, you need to loop over both change points to marginalize them out, and you can assume they are ordered.
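The marginalization can be sketched in Python (a translation of the idea, not the User’s Guide code; the function names, fixed rates, and uniform prior over ordered pairs are assumptions): enumerate every ordered pair of change points s1 < s2, add up the log-likelihood of the three segments, and combine the pairs with log-sum-exp. In a Stan model the rates would be parameters and this sum would sit inside the model block.

```python
import math


def poisson_logpmf(y, rate):
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pair_log_liks(counts, rates):
    """Log-likelihood for every ordered pair (s1, s2) with 0 <= s1 < s2 < T.

    Time t is in segment 1 if t < s1, segment 2 if s1 <= t < s2, else segment 3."""
    T = len(counts)
    out = {}
    for s1 in range(T):
        for s2 in range(s1 + 1, T):
            out[(s1, s2)] = sum(
                poisson_logpmf(y, rates[0 if t < s1 else (1 if t < s2 else 2)])
                for t, y in enumerate(counts)
            )
    return out


def two_changepoint_marginal(counts, rates):
    """log p(counts | rates) with both change points summed out under a uniform prior."""
    lls = pair_log_liks(counts, rates)
    return log_sum_exp(list(lls.values())) - math.log(len(lls))


if __name__ == "__main__":
    counts = [1] * 5 + [20] * 5 + [5] * 5
    lls = pair_log_liks(counts, (1.0, 20.0, 5.0))
    print("most likely pair:", max(lls, key=lls.get))
    print("marginal log-likelihood:", two_changepoint_marginal(counts, (1.0, 20.0, 5.0)))
```

This naive version evaluates every observation for every pair, so its cost grows quickly with T; cumulative sums over the per-segment log-likelihoods are the usual way to cut that down.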

Adon_Rosen · February 6, 2025, 2:57pm

Hi Bob,

Thanks for this. I had been trying to follow your example from the tutorials, but my problem isn’t a time series.

I have cross-sectional data and want to see whether, and where, the relationship between my two variables changes.

So while in your worked example you use the following code:

  vector[T] lp;
  lp = rep_vector(log_unif, T);
  for (s in 1:T) {
    for (t in 1:T) {
      lp[s] = lp[s] + poisson_lpmf(D[t] | t < s ? e : l);
    }
  }

I need to translate this so that T indexes the magnitude of my x predictor rather than time. I would have multiple counts observed at every level of T, with one D for every participant in my study.
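A sketch of that translation, in Python rather than Stan and with hypothetical names: the candidate change points become the distinct observed x levels, each participant contributes one count at their own x, and the marginalization loops over candidate levels instead of time points. Each regime uses the log-linear mean from this thread (a_k + b_k x), and the prior over candidate levels is uniform; both are assumptions.

```python
import math


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def changepoint_log_posterior(xs, ys, a, b):
    """Normalized log posterior over candidate change points, one per distinct x level.

    Observation i is in regime 1 if xs[i] < level, otherwise regime 2;
    regime k has log mean a[k] + b[k] * x. Uniform prior over levels."""
    levels = sorted(set(xs))
    log_liks = []
    for level in levels:
        total = 0.0
        for x, y in zip(xs, ys):
            k = 0 if x < level else 1
            eta = a[k] + b[k] * x
            total += y * eta - math.exp(eta) - math.lgamma(y + 1)
        log_liks.append(total)
    norm = log_sum_exp(log_liks)  # the constant uniform prior cancels here
    return {level: ll - norm for level, ll in zip(levels, log_liks)}


if __name__ == "__main__":
    # Made-up data: three participants at each x level, switch at x = 12.
    xs = [x for x in range(1, 22) for _ in range(3)]
    ys = [round(math.exp(4 - 0.2 * x)) if x < 12 else round(math.exp(0.5 + 0.0044 * x)) for x in xs]
    post = changepoint_log_posterior(xs, ys, (4.0, 0.5), (-0.2, 0.0044))
    print("posterior mode:", max(post, key=post.get))
```

Note that with level = min(x) every observation falls in regime 2; whether that "no change" case should be allowed is a modeling choice.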

Does this clarify my issue?

Thanks,
Adon

Bob_Carpenter · February 11, 2025, 10:40pm

Not enough for me to understand the model. Can you write it down in math?

I’m confused by the original model. Unless beta is negative, real<lower=-1*beta, upper =-21*beta> will have no support, so I’d suggest an <upper=0> constraint on the beta data point. (Also, -1 * beta can simply be written -beta.)

But the real issue is that I don’t see the change point in what you called a change-point model. Change point models involve two sets of parameters, one to use on one side of the change point and one to use on the other. It looks instead like you’re building some kind of mixture estimator of lp that’s averaging the two components rather than changing from one to the other at some point. (In Stan, we tend to use lp for log probabilities, so I wouldn’t use that name for this variable.)

Also, you can write this on one line:

vector[N] lp = w.*(a1 + b1*x) + (1-w).*(a2 + b2*x);

Adon_Rosen · February 12, 2025, 4:37pm

Right, apologies for any confusion.

Let y_1, ..., y_n and x_1, ..., x_n be the paired observational data, where y is the DV and x is the IV. The goal of the analysis is to identify a changepoint location r, within the range of x, that separates the observations into two groups (those with x < r and those with x \geq r), each with a distinct data generating process. For this analysis, the data generating process is theorized to be a generalized linear model with a Poisson likelihood and log link. So for x values less than r the model would be \log E(Y \mid x) = \alpha_1 + \beta_1 x,
and for x values greater than or equal to r it would be \log E(Y \mid x) = \alpha_2 + \beta_2 x.

More succinctly:

\log E(Y \mid x) = \begin{cases} \alpha_1 + \beta_1 x & \text{if } x < r \\ \alpha_2 + \beta_2 x & \text{if } x \geq r \end{cases}

So I need to estimate five parameters: the two intercepts (\alpha_1, \alpha_2), the two slopes (\beta_1, \beta_2), and the changepoint location r.

Bob_Carpenter · February 12, 2025, 5:58pm

That’s very similar to the example in the User’s Guide. You can fit this the same way as the example:

data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0> y;
}

parameters { 
  vector[2] alpha;
  vector[2] beta;
  // r is compared with x, so its bounds come from x (not y)
  real<lower=min(x), upper=max(x)> r;  // implicit uniform distribution
}

model {
  alpha ~ ...;   beta ~ ...;  // priors
  vector[N] mu;  // log of the Poisson mean
  for (n in 1:N) {
    mu[n] = x[n] < r 
        ? alpha[1] + beta[1] * x[n]
        : alpha[2] + beta[2] * x[n];
  }
  y ~ poisson_log(mu);
}

Adon_Rosen · February 12, 2025, 6:40pm

Thanks, @Bob_Carpenter – I was hoping my problem was trivial!

I got this model up and running, again many thanks!

Adon_Rosen · February 13, 2025, 1:21am

@Bob_Carpenter, would I still need to worry about iterating over all possible change points as you do in the example, with nested for loops like these?

  for (s in 1:T) {
    for (t in 1:T) {
      lp[s] = lp[s] + poisson_lpmf(D[t] | t < s ? e : l);
    }
  }

Bob_Carpenter · February 14, 2025, 9:09pm

Not if you treat the change point as continuous. It might not be well identified if it’s just cutting between discrete points, but it should run and do the right thing for the change point model even if the change point itself isn’t well identified. This might show up as low ESS for the change point variable.
