I’m trying to fit a zero-truncated poisson model with about 200,000 data points, and the model fitting is really slow, on the order of about an hour. This was surprising to me because when I fit a lognormal model with similar amounts of data, it took less than a minute.

Is there anything I can do to speed this process up? I can’t share my data, but I will share Python code to simulate the problem.

```
from scipy.stats import poisson
import pystan
l = 1.5
poisson_draws = poisson.rvs(l, size=150000)
poisson_draws = poisson_draws[poisson_draws > 0]
stan_data = stan_data = {
"n": len(poisson_draws),
"y": poisson_draws,
}
fit = model.sampling(data=stan_data, iter=4000, warmup=1000)
```

```
data {
int<lower=1> n;
int<lower=1> y[n];
}
parameters {
real<lower=0> lambda;
}
model {
lambda ~ gamma(20, 20);
for (a in 1:n) {
y[a] ~ poisson(lambda) T[1,]; // truncated poisson
}
}
generated quantities {
real zero_truncated_mean;
zero_truncated_mean_a = (lambda * exp(lambda)) / (exp(lambda) - 1);
}
```

1 Like

This might be due to numerical difficulties, depending on the magnitude of the counts. One thing you can do is “compress” the counts. By this I mean you’d count how many times each value occurs in the sample, using, for instance `table()`

in R or the equivalent in Python.

Your program would look something like this:

```
data {
int<lower=1> n;
int<lower=1> y_value[n];
int<lower=1> y_freq[n];
}
parameters {
real<lower=0> lambda;
}
model {
lambda ~ gamma(20, 20);
for (a in 1:n) {
target += y_freq[a] * ( poisson_lpmf(y_value[a] | lambda) - log1m_exp(-lambda)); // truncated poisson
}
}
```

Notice that `n`

now will be a much smaller number than the “original” `n`

.

4 Likes

This worked great, thank you! How does the model part work here? I’m new to Stan and I’m not sure what `target`

is or how `poisson_lpmf`

is different from poisson. Sorry in advance for the basic question!

1 Like

No worries. About why you multiply by `y_freq`

this is just because the likelihood appears `y_freq`

times in the expression, like so:

L(Y_i \mid \lambda) = p(y_i)^{f_i}

where f_i is the frequency and p(y) := \operatorname{Pr}(Y=y \mid \lambda).

Take a look at this StackOverFlow answer by @Bob_Carpenter and see if it answers your questions.

Hey Max, this is great! Quick question, would you mind clarifying the `- log1m_exp(-lambda)`

for me? Does that implement the zero truncation or something else?

Thanks!

That should account for the truncation at zero, which necessitates dividing by 1-\operatorname{Pr}(Y\leq 0) = 1 - e^{-\lambda}. Please do check my maths.

2 Likes

Neat, that’s a nice approach. Thanks!