How to fit a simple poisson process

I’m trying to use stan to fit a simple poisson process model, and, maybe, to simulate data as well. Starting with the fitting.

In base R, I’d simply write something like:

 Y<- (cumsum(rpois(102,lambda=5)))
  x<-cumsum(c(0,rep(1,100)))
  T <- length(Y)
  plot(stepfun(x,Y),xlim = c(0,100), do.points = F)

This would return a plot like, with the event at time t on the x axis, and the number of counts Y at each t:

It’s not clear to me how to write the cumulative_sum in stan in such a way that we could either simulate or estimate our lambda parameter. A simple attempt to fit might look like this, provided the data above (this fails):

data {
    int<lower=0> T; // number of events
    int Y[T]; // the observed counts at event time T
}
parameters {
  real<lower=-10,upper=10> lambda;
}
model {
  // Priors
  lambda ~ normal(5, 0.5); // our lambda estimate
  
  for (t in 1:T) {
    Y ~ cumulative_sum(poisson(lambda));
  }
}

This doesn’t work, however. The cumulative_sum and poisson variables don’t play nice together. I’ve tried setting it up with target += and poisson_lpmf instead of poisson, but wasn’t able to suss out the correct syntax.

Any advice appreciated. Similar to this question which wasn’t followed-up by the original poster.

The poisson function does not generate Poisson distributed random variables. Rather it is sort of an alias for the log-PMF of the Poisson distribution. Also, you need to restrict lambda to be positive and probably should not restrict lambda to be less than 10. You could do something like

data {
  int<lower = 0> T;
  int<lower = 0> Y[T];
}
parameters {
  real<lower = 0> lambda;
}
model {
  vector[T] lambda_vec;
  lambda_vec[1] = lambda;
  for (t in 2:T) lambda_vec[t] = lambda_vec[t - 1] + lambda;
  target += normal_lpdf(lambda | 5, 0.5);
  target += poisson_lpmf(Y | lambda_vec);
}
2 Likes

Thanks, @bgoodri, that worked really well. Is also really helpful in understanding the lpmf functions a bit better, need to spend some time on that.

After I posted, I also tried fitting the model this way, which seemed to return nearly the same estimates. Not sure if it has any side effects by constituting lambda how it does.

    data {
        int<lower=0> T; // number of events
        int Y[T]; // the observed counts at event time T
        int ages; // number of ages
    }
    parameters {
      real <lower=0>lambda; // will estimate lambda
    }
    model {
      // Priors
      lambda ~ normal(5, 0.5); // our lambda prior
      
      for (t in 1:T) {
        Y ~ poisson(lambda * t);
      }
    }

That way is fine, but slower because it has to call poisson_lpmf T times instead of once, and hence has to do T memory (de)allocations instead of one.

1 Like

Although you could instead do

for (t in 1:T) lambda_vec[t] = t * lambda;

which would reduce the amount of autodiff I think.

1 Like

@bgoodri

Experimenting more I realized neither model is really doing a very good job of estimating lambda.

For the data simulated with a lambda of 3, with your stan code, I’m getting:

                mean se_mean   sd     2.5%     25%     50%     75%   97.5%   n_eff Rhat
lambda          2.86    0.00  0.02    2.82    2.85    2.86    2.88    2.91   934    1

For mine, I’m getting a sd of 0.00, same 50%. This is with a cumulative sum of 285 at t=100. It’s not unusual for rpois draws to be between 280 and 320 cumulatively by t=100, but obviously the CI is fairly small here.

I realized my model doesn’t make sense because it doesn’t assume the sums are correlated but instead just takes the poisson for t*lambda at any given t. That’s not really a poisson process so much as taking the poisson for lambda * t.

In my case, the model is badly overfit. With yours, it seem slightly overfit. I don’t understand well enough what the poisson_lpfm function is doing but it seems like what I really need to do is to somehow indicate more strongly to stan that the sums are cumulative poisson draws.

Something like:

    data {
  int<lower = 0> T;
  int<lower = 0> Y[T];
}
parameters {
  real<lower = 0> lambda;
}
model {
  target += normal_lpdf(lambda | 5, 0.5);
  Y[1] ~ poisson(lambda);
  for (t in 2:T) {
    Y[t] ~ poisson(lambda) + Y[t - 1]; // some way of indicating every t is an additional poisson added to the previous draws
  }
}

For larger initial lambda, the confidence intervals are still very small, about +/- 0.03 at two standard deviations, even when the estimates are off by 10% or so.

For example, if I don’t sum the rpois function… and just simulate 100 poisson draws, then use stan to estimate, the confidence intervals look fine:

              mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
lambda        2.76    0.01 0.17    2.44    2.65    2.76    2.87    3.11   870 1.01
1 Like

I ended up here, which seems to do the trick. I’d like to just have a way of indicating cum_sum… but the diff also works. Thank you, poisson memoryless property…

Data generation:

  Y<- cumsum(rpois(100,lambda=3))
  T<- length(Y)

Stan code

data {
  int<lower=0> T; // number of events
  int Y[T]; // the observed counts at event time T
}
parameters {
  real<lower=0> lambda;
}
model {
 // diff container
  int Yt[T];
  // Prior
  lambda ~ gamma(5, 1);
  
  Yt[1] = Y[1];
  for (t in 2:T) Yt[t] = Y[t] - Y[t - 1]; // in the absence of cumsum calculate the diff
  target += poisson_lpmf(Yt | lambda);
}
1 Like