# Zero-truncated poisson model very slow

I’m trying to fit a zero-truncated poisson model with about 200,000 data points, and the model fitting is really slow, on the order of about an hour. This was surprising to me because when I fit a lognormal model with similar amounts of data, it took less than a minute.

Is there anything I can do to speed this process up? I can’t share my data, but I will share Python code to simulate the problem.

from scipy.stats import poisson
import pystan

l = 1.5
poisson_draws = poisson.rvs(l, size=150000)
poisson_draws = poisson_draws[poisson_draws > 0]

stan_data = stan_data = {
"n": len(poisson_draws),
"y": poisson_draws,
}
fit = model.sampling(data=stan_data, iter=4000, warmup=1000)

data {
int<lower=1> n;
int<lower=1> y[n];
}

parameters {
real<lower=0> lambda;
}

model {
lambda ~ gamma(20, 20);

for (a in 1:n) {
y[a] ~ poisson(lambda) T[1,];  // truncated poisson
}
}

generated quantities {
real zero_truncated_mean;
zero_truncated_mean_a = (lambda * exp(lambda)) / (exp(lambda) - 1);
}


1 Like

This might be due to numerical difficulties, depending on the magnitude of the counts. One thing you can do is “compress” the counts. By this I mean you’d count how many times each value occurs in the sample, using, for instance table() in R or the equivalent in Python.
Your program would look something like this:

data {
int<lower=1> n;
int<lower=1> y_value[n];
int<lower=1> y_freq[n];
}

parameters {
real<lower=0> lambda;
}

model {
lambda ~ gamma(20, 20);

for (a in 1:n) {
target += y_freq[a] * ( poisson_lpmf(y_value[a] | lambda) - log1m_exp(-lambda));  // truncated poisson
}
}


Notice that n now will be a much smaller number than the “original” n.

4 Likes

This worked great, thank you! How does the model part work here? I’m new to Stan and I’m not sure what target is or how poisson_lpmf is different from poisson. Sorry in advance for the basic question!

1 Like

No worries. About why you multiply by y_freq this is just because the likelihood appears y_freq times in the expression, like so:

L(Y_i \mid \lambda) = p(y_i)^{f_i}

where f_i is the frequency and p(y) := \operatorname{Pr}(Y=y \mid \lambda).

Hey Max, this is great! Quick question, would you mind clarifying the - log1m_exp(-lambda) for me? Does that implement the zero truncation or something else?