When I run this model, all my transitions are divergent. I’m trying to figure out what a “non-centered” model would be here.
This model is a Poisson process, in which we have multiple Poisson counts with rate \int_l^r f(x)\,dx = F(r) - F(l), for many different intervals [l, r]. These intervals are also unknown: we only know that they partition the unit interval and that their lengths can be assumed to follow a Dirichlet(1, …, 1) prior.
I’m using a positive_ordered to model the cumulative rate function F. Would it be better to instead use real<lower=0> f[] to represent the rate directly? Is one of these parametrizations more likely to cause problems with divergent transitions?
functions {
int real_to_int(real x) {
int i = 0;
while(i < x) {
i += 1;
}
return i;
}
// Should we vectorize this and have it done all at once, i.e. real[] x?
real interpolate(vector F, real x, int T) {
// i and i+1
int i;
real res;
real flo = fmin(floor(x*T), T - 1); // possible values: {0, ..., T-1}; clamped so x == 1 (the last bucket) stays in range
real lambda = x*T - flo;
i = real_to_int(flo) + 1;
res = (1 - lambda) * F[i] + lambda * F[i+1];
return(res);
}
}
data {
int<lower=0> N; // number of X obs, yielding N+1 buckets
int<lower=0> T; // number of line segments (granularity of Gaussian Process)
int<lower=0> Y[N+1]; // Number of Y samples in bucket k
}
transformed data{
int N_Y = sum(Y);
real empty_r[0];
int empty_i[0];
}
parameters {
simplex[N+1] intervals; //dirichlet
positive_ordered[T+1] F; // Cumulative density ratio.
real<lower=0> sigma; //how quickly the log-density ratio can vary
}
transformed parameters {
real<lower=0> dF[T]; // density ratio
real ddF[T-1]; // 1st deriv of density ratio
for (t in 1:T)
dF[t] = F[t+1] - F[t];
for (t in 1:(T-1))
ddF[t] = dF[t+1] - dF[t];
}
// ToDo: change this to a non-centered implementation?
model {
print("BEFORE: target = ", target());
{
real l = 0; // left end of interval
real r; // right end of interval
real rate; // Poisson rate
real F_l = 0; // F(0) = 0
real F_r;
print("rate = ", rate);
intervals ~ dirichlet(rep_vector(1.0, N+1));
for (k in 1:(N+1)) { // for each bucket
r = l + intervals[k];
F_r = interpolate(F, r, T); //interpolate
rate = F_r - F_l; // the Poisson rate is the integral
Y[k] ~ poisson(N_Y * rate);
l = r;
F_l = F_r;
}
/////////////////////
// Smoothing prior
for (t in 1:(T-1)) { // for each section of the function approximation
dF[t+1] ~ lognormal(dF[t], sigma);
}
// Set F(0) = 0.1. Doing this here avoids having to use 'min' above.
F[1] ~ normal(0.1, 1e-9);
}
}
There’s no difference: under the hood, positive_ordered is implemented essentially as an array of lower-bounded reals (positive increments that get summed).
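For instance, these blocks (a sketch with names I made up, reusing T from your data block) give essentially the same object as positive_ordered[T+1] F:

data {
  int<lower=1> T;
}
parameters {
  real<lower=0> F1;            // F[1]
  vector<lower=0>[T] dF;       // positive increments F[t+1] - F[t]
}
transformed parameters {
  vector[T+1] F;
  F[1] = F1;
  F[2:(T+1)] = F1 + cumulative_sum(dF);   // nondecreasing by construction
}

Either way HMC works on log-transformed positive increments, so I wouldn't expect one form to be noticeably better for divergences than the other.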
That F[1] ~ normal(0.1, 1e-9) is rather extreme and could be causing numerical problems. Maybe try 1e-3 instead.
You don’t have a prior on sigma so I don’t think the “smoothing prior” does much smoothing. And since you don’t know how large the intervals are I don’t think F is really identifiable from the data.
Just to make sure: in lognormal(dF[t], sigma), the first argument is the meanlog, i.e. the mean of log(dF[t+1]), not the mean of dF[t+1] itself. So I don't think that line does what you intended; to center dF[t+1] around dF[t] you would need something like lognormal(log(dF[t]), sigma).
PS: I would do the smoothing on the log scale and then exponentiate, rather than going through the lognormal distribution. You can then use a non-centered parametrization, as sketched below.
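Something like this, as a rough sketch: a random walk on log_dF driven by standard-normal innovations, reusing T from your data block (z and log_dF1 are names I made up).

parameters {
  real log_dF1;                 // log density ratio of the first segment
  vector[T-1] z;                // standard-normal innovations
  real<lower=0> sigma;          // smoothness scale
}
transformed parameters {
  vector[T] log_dF;
  vector[T] dF;
  log_dF[1] = log_dF1;
  for (t in 2:T)
    log_dF[t] = log_dF[t-1] + sigma * z[t-1];   // random walk on the log scale
  dF = exp(log_dF);                             // density ratio, positive by construction
}
model {
  z ~ normal(0, 1);             // non-centered: implies log_dF[t] ~ normal(log_dF[t-1], sigma)
  sigma ~ exponential(1);
}

F can then be rebuilt as a cumulative sum of dF (with whatever anchor you like for F[1]), and because the sampling statements are on the raw parameters, no Jacobian adjustment is needed.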
The important part of the message is outside of the declaration block: the assignments to transformed parameters must happen inside the transformed parameters block.
DIAGNOSTIC(S) FROM PARSER:
Info:
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
log_dF_increment ~ normal(...)
It sounds like the autodiff doesn’t compute the Jacobian for me…
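For reference, the pattern the parser is asking for is a manual target += with the log absolute Jacobian determinant. A minimal, self-contained sketch for the simplest case, a prior stated directly on log(dF) (not the full log_dF_increment transform, where the adjustment would be less straightforward):

data {
  int<lower=1> T;
}
parameters {
  vector<lower=0>[T] dF;
}
model {
  // prior stated on a non-linear transform of a parameter...
  log(dF) ~ normal(0, 1);
  // ...so the log absolute Jacobian determinant is added by hand:
  // d log(dF[t]) / d dF[t] = 1 / dF[t], hence log|J| = -sum(log(dF))
  target += -sum(log(dF));
}

A non-centered version, with sampling statements only on raw parameters, sidesteps the need for any adjustment.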
My transitions are still divergent. Here’s my updated code:
functions {
int real_to_int(real x) {
int i = 0;
while(i < x) {
i += 1;
}
return i;
}
// Should we vectorize this and have it done all at once, i.e. real[] x?
real interpolate(vector F, real x, int T) {
// i and i+1
int i;
real res;
real flo = fmin(floor(x*T), T - 1); // possible values: {0, ..., T-1}; clamped so x == 1 (the last bucket) stays in range
real lambda = x*T - flo;
i = real_to_int(flo) + 1;
res = (1 - lambda) * F[i] + lambda * F[i+1];
print("res = ", res);
return(res);
}
}
data {
int<lower=0> N; // number of X obs, yielding N+1 buckets
int<lower=0> T; // number of line segments (granularity of Gaussian Process)
int<lower=0> Y[N+1]; // Number of Y samples in bucket k
}
transformed data{
int N_Y = sum(Y);
}
parameters {
simplex[N+1] intervals; //dirichlet
positive_ordered[T+1] F; // Cumulative density ratio.
real<lower=0> sigma; //how quickly the log-density ratio can vary
}
transformed parameters {
real<lower=0> dF[T]; // density ratio
real log_dF[T]; // log of density ratio
real log_dF_increment[T-1]; // 1st deriv of log density ratio
real ddF[T-1]; // 1st deriv of density ratio
for (t in 1:T) {
dF[t] = F[t+1] - F[t];
log_dF[t] = log(dF[t]);
}
for (t in 1:(T-1)) {
ddF[t] = dF[t+1] - dF[t];
log_dF_increment[t] = log_dF[t+1] - log_dF[t];
}
}
// ToDo: change this to a non-centered implementation?
model {
print("BEFORE: target = ", target());
{
real l = 0; // left end of interval
real r; // right end of interval
real rate; // Poisson rate
real F_l = 0; // F(0) = 0
real F_r;
print("rate = ", rate);
intervals ~ dirichlet(rep_vector(1.0, N+1));
for (k in 1:(N+1)) { // for each bucket
r = l + intervals[k];
F_r = interpolate(F, r, T); //interpolate
rate = F_r - F_l; // the Poisson rate is the integral
print("rate = ", rate);
Y[k] ~ poisson(N_Y * rate);
l = r;
F_l = F_r;
}
/////////////////////
// Smoothness prior
// The adjacent differences of logs are defined as log_dF_increment in 'transformed parameters'
for (t in 1:(T-1)) { // for each section of the function approximation
log_dF_increment[t] ~ normal(0, sigma);
}
sigma ~ exponential(1);
// Softly anchor F[1]. Doing this here avoids having to use 'min' above.
F[1] ~ normal(1, 1e-3);
}
}
Apparently the most suspicious part of my model is the piecewise-linear approximation of F. This might be particularly bad since the model refers to (a discrete version of) its 2nd derivative, namely log_dF_increment.
I think it would be better to use piecewise quadratic, piecewise cubic, or splines. The tricky part is enforcing a monotonicity constraint so that my integral still works efficiently (one way to get the monotonicity is sketched below).
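One possible way to get that monotonicity (a sketch with my own names, and with the basis matrix precomputed on a fixed grid just to keep it self-contained): make the spline coefficients positive_ordered, since a B-spline whose coefficients are nondecreasing is itself nondecreasing.

data {
  int<lower=1> G;                // grid points, in increasing order of x
  int<lower=1> n_basis;          // number of basis functions
  matrix[G, n_basis] basis;      // B-spline basis evaluated at the grid points
}
parameters {
  positive_ordered[n_basis] c;   // positive, nondecreasing spline coefficients
}
transformed parameters {
  // The derivative of a B-spline is a nonnegative combination of the
  // coefficient differences, so nondecreasing c gives a nondecreasing F
  // and hence F(r) - F(l) >= 0.
  vector[G] F_grid = basis * c;
}
model {
  // priors on c (e.g. a random walk on its log increments) and the
  // Poisson likelihood would go here
}

When the evaluation points are parameters, as in my problem, the basis row has to be computed inside the model instead of passed as data.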
The B-splines code for Stan isn’t general enough for my problem: it assumes that the set of points X for which the spline is calculated is part of the data. In my problem those points are parameters, so the spline needs to be rebuilt at every iteration… I hope this won’t be hopelessly slow.
There is an obvious inefficiency in the recursive implementation whenever the order k > 2: each lower-order basis function gets recomputed several times. A bottom-up algorithm would be better… or memoization.
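A sketch of what a bottom-up evaluation could look like, for a single point x (which may be a parameter). The function name, the "order K / extended knot vector of length n + K" convention, and the half-open-interval convention (so x equal to the last knot needs special handling) are my own choices:

functions {
  // All n B-spline basis functions of order K at x, built bottom-up so each
  // lower-order value is computed exactly once (no recursion).
  // t is the extended knot vector of length n + K.
  vector bspline_basis(real x, vector t, int K) {
    int n = num_elements(t) - K;                       // number of basis functions
    vector[n + K - 1] B = rep_vector(0, n + K - 1);
    for (i in 1:(n + K - 1))
      B[i] = (t[i] <= x && x < t[i + 1]) ? 1 : 0;      // order-1 (indicator) basis
    for (k in 2:K) {
      for (i in 1:(n + K - k)) {
        real a = (t[i + k - 1] > t[i]) ? (x - t[i]) / (t[i + k - 1] - t[i]) : 0;
        real b = (t[i + k] > t[i + 1]) ? (t[i + k] - x) / (t[i + k] - t[i + 1]) : 0;
        B[i] = a * B[i] + b * B[i + 1];                // B[i + 1] still holds the order k-1 value
      }
    }
    return B[1:n];
  }
}

Since every lower-order value is computed once per call, memoization has nothing left to save; for cubics (K = 4) this is only a few dozen operations per evaluation, so rebuilding the basis at parameter-dependent points every iteration shouldn't be hopeless.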