Zero-Truncated Poisson Lognormal Parameter Estimation

Hello Stan users. I started using Stan yesterday, and I still can’t figure out how to fit a zero-truncated Poisson-lognormal model. I will show the approach I have used. First, I simulated some data in R:

set.seed(213211)
N <- 1000
lambdas <- rlnorm(N,-1,3)
site.data <- rpois(N,lambdas) # Generates 1000 samples.


The \lambda_i are drawn from a lognormal distribution with a mean of -1 and a standard deviation of 3 on the log scale, and I expect Stan’s estimates to be close to these values. I first fitted the model with no truncation, and it estimated the parameters pretty well (\bar{\mu} \approx -1 and \bar{\sigma} \approx 2.8). I then removed all zeroes from the sample and added what I think should be the truncation term to the Stan code. Here is how I remove the zeroes from my sample in R:

site.data <- site.data[site.data > 0]


Here is my Stan code:

data {
  int<lower = 1> N;
  int<lower = 0> y[N]; // Change the lower bound accordingly.
}

parameters {
  real mu;
  real<lower = 0> sigma;
  vector<lower = 0>[N] lambda;
}

model {
  sigma ~ cauchy(0, 5);
  mu ~ cauchy(0, 3);
  lambda ~ lognormal(mu, sigma);

  for (n in 1:N) {
    // Remove comments to test the code.
    // No truncation.
    // target += poisson_lpmf(y[n] | lambda[n]);

    // I have tried using the following expressions to truncate (one at a time)...
    // y[n] ~ poisson(lambda[n]) T[1, ];
    // target += poisson_lpmf(y[n] | lambda[n]) - log1m_exp(poisson_lpmf(0 | lambda[n]));
    // target += poisson_lpmf(y[n] | lambda[n]) - log1m_exp(-lambda[n]);
    // target += poisson_lpmf(y[n] | lambda[n]) - poisson_lccdf(0 | lambda[n]);
  }
}


Using the code above and the zero-truncated data, I get \bar{\mu} \approx 1.24 and \bar{\sigma} \approx 2.08. This is far from the parameters I used to generate the data. Where am I going wrong? Thanks a lot for the help!


The model’s OK for what you’ve said you’re trying to do. I’d recommend the T[1, ] version as simplest.
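A side note on the commented-out attempts: the three `target +=` expressions and the `T[1, ]` form all encode the same zero-truncated density, Poisson(y | \lambda) / (1 - e^{-\lambda}), since poisson_lpmf(0 | \lambda) = -\lambda and poisson_lccdf(0 | \lambda) = log(1 - e^{-\lambda}). Here is a quick standalone check (in Python rather than R, just for illustration):

```python
import math

def poisson_lpmf(y, lam):
    # log Poisson pmf: y * log(lam) - lam - log(y!)
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def log1m_exp(x):
    # numerically stable log(1 - exp(x)) for x < 0, like Stan's log1m_exp
    return math.log1p(-math.exp(x))

lam, y = 2.5, 3
a = poisson_lpmf(y, lam) - log1m_exp(poisson_lpmf(0, lam))
b = poisson_lpmf(y, lam) - log1m_exp(-lam)
# -expm1(-lam) = 1 - e^{-lam}, i.e. exp(poisson_lccdf(0 | lam))
c = poisson_lpmf(y, lam) - math.log(-math.expm1(-lam))
print(max(abs(a - b), abs(a - c)))  # agree to floating-point precision

# The truncated pmf sums to 1 over y = 1, 2, ...
total = sum(math.exp(poisson_lpmf(k, lam) - log1m_exp(-lam)) for k in range(1, 200))
print(total)
```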

The first problem you have is that you’re not generating data according to your model. You can’t just generate the lambda[n], then the y[n], then throw away the y[n] == 0 as that produces bias by removing small lambdas. The simplest way to generate data according to your model is with what’s known as rejection sampling:

y <- rep(0, N)
for (n in 1:N)
  while (y[n] == 0) y[n] <- rpois(1, lambda[n])
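To make the bias concrete, here is a standalone sketch (in Python, just for illustration, with a textbook Knuth Poisson sampler): discarding the (lambda, y) pairs with y == 0 conditions lambda on y > 0, which shifts the retained lambdas upward, while rejection sampling keeps every lambda and only redraws y.

```python
import math
import random

random.seed(7)

def rpois(lam):
    # Knuth's multiplication method; fine for the moderate lambdas used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

N = 20000
lam = [random.lognormvariate(0.0, 1.0) for _ in range(N)]  # true log-mean is 0

# Wrong: discard the (lambda, y) pairs with y == 0.
# The survivors are biased toward large lambda, since P(y > 0) = 1 - e^{-lambda}.
kept = [math.log(l) for l in lam if rpois(l) > 0]

# Right: rejection sampling redraws y until it is nonzero, so every lambda stays.
all_log = [math.log(l) for l in lam]

print(sum(kept) / len(kept))        # noticeably above 0
print(sum(all_log) / len(all_log))  # close to the true log-mean of 0
```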


The second problem you have is that rlnorm(N, -1, 3) is very broad, especially if you’re truncating zeros.

Let’s look at 20 values:

> lambda <- rlnorm(20, -1, 3)
> lambda
[1]   0.03012604   0.12846612 395.49138336  24.69518887   3.21629447
[6]   0.16190519   7.13806034 124.01442275   0.06989056  51.45593682
[11]   4.88033527   0.71835095   0.20476892 695.88611462   0.22365563
[16]   0.05643252   3.19557522   2.36739879   0.13947740   0.27208591


Those range in scale from roughly 0.01 to 1000. The really tiny ones will generate the value 1 with high probability, so their posterior estimates will all be nearly identical: they have the same data and the same prior.
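To put a number on that (a standalone Python check, just for illustration): under zero truncation, the probability that a tiny-lambda site reports exactly 1 approaches one.

```python
import math

def trunc_p1(lam):
    # P(y = 1 | y > 0) for Poisson(lam): lam * e^{-lam} / (1 - e^{-lam})
    return lam * math.exp(-lam) / -math.expm1(-lam)

for lam in (0.01, 0.03, 0.2, 1.0):
    print(f"lambda = {lam:4.2f}: P(y = 1 | y > 0) = {trunc_p1(lam):.3f}")
```

A site with lambda around 0.03 reports y = 1 more than 98% of the time.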

Finally, you’re going to run into trouble visually inspecting calibration here, because the variance of the lognormal-plus-Poisson is very high. But if you increase N, lower the variance, and shift the lognormal a bit, you’ll be able to recover the parameters with which you simulated the data with relatively high precision. Here are the R script and Stan code.

N <- 500
mu <- 0
sigma <- 1
lambda <- rlnorm(N, mu, sigma)
y <- rep(0, N)
for (n in 1:N) {
  while (y[n] == 0) {
    y[n] <- rpois(1, lambda[n])
  }
}
data <- list(N = N, y = y)

library('cmdstanr')
mod <- cmdstan_model('zero-trunc-pois.stan')
fit <- mod$sample(data = data, parallel_chains = 4)

data {
  int<lower=0> N;
  int<lower=1> y[N];
}
parameters {
  real mu;
  real<lower = 0> sigma;
  vector<lower = 0>[N] lambda;
}
model {
  sigma ~ cauchy(0, 5);
  mu ~ cauchy(0, 3);
  lambda ~ lognormal(mu, sigma);

  for (n in 1:N) {
    y[n] ~ poisson(lambda[n]) T[1, ];
  }
}


This produces

> print(fit, c("mu", "sigma"))
variable  mean median   sd  mad    q5  q95 rhat ess_bulk ess_tail
mu    -0.02  -0.02 0.06 0.06 -0.12 0.07 1.00      462      889
sigma  1.01   1.01 0.05 0.05  0.93 1.10 1.00      398      904


Thank you very much for this!