Latent Poisson Regression - Possible in Stan?

Does anyone know how to estimate a latent Poisson regression in Stan?

I am trying to estimate a latent Poisson regression on count data with under-reporting. I have coded an estimator in R following Winkelmann (1996), but the estimator doesn’t seem very efficient.

I’d like to run the model in Stan to check, but it doesn’t seem possible because Stan does not support integer-valued parameters.

Here’s what I think would be the correct code if it were feasible with HMC:

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> y[N];
  matrix[N, K] x;
}

parameters {
  //vector[N] ystar;  // Type error
  int<lower=0> ystar[N];  // Parameters cannot be integers
  vector<lower=0, upper=1>[N] p;  // probabilities, so bounded to [0, 1]
  real alpha;
  vector[K] beta;
}

model {
  ystar ~ poisson_log_glm(x, alpha, beta);
  p ~ beta(y + 1, ystar - y + 1);
  y ~ binomial(ystar, p);
}

Does anyone have a trick to get around this limitation?


This looks similar to what in ecology is often called an N-mixture model; that framing might help you find relevant implementations and more details. The trick you need is to marginalise (sum) over all possible values of ystar, from y to infinity, weighting each value by its probability under the Poisson likelihood. In practice you will need to assign an upper bound above which you believe the probability of ystar is negligible.

The likelihood for a single data point looks like the following, with an upper bound of 10000 for demonstration (the appropriate bound will depend on your data):

int ub = 10000;  // upper bound for the marginalisation; must be large enough for your data
vector[ub - y + 1] lp;

for (ystar in y:ub) {
  // here y and x are a single observation's count and covariate row (y[n] and x[n])
  lp[ystar - y + 1] = binomial_lpmf(y | ystar, p)
                      + poisson_log_lpmf(ystar | alpha + x * beta);
}
target += log_sum_exp(lp);

You can find an example implementation of an N-mixture model like this here: Calculating abundance from marginalized N-mixture model

Edit: for more information on marginalisation, the Stan User's Guide is very good: Latent Discrete Parameters, as is @mbjoseph’s post about occupancy models, which has a great beginner’s guide to the mathematics: Maxwell B. Joseph: A step-by-step guide to marginalizing over discrete parameters for ecologists using Stan


You need to find the unconditional distribution of y by integrating out ystar. Fortunately, this is a known result: y is Poisson distributed with rate p * q, where q = exp(alpha + x * beta).

I have seen this result in this paper (eqn. 4), which is also about under-reported Poisson counts.
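For reference, the marginalization is a short calculation (the standard binomial thinning result, writing q for the Poisson rate):

P(y) = \sum_{n=y}^{\infty} \binom{n}{y} p^y (1-p)^{n-y} \, \frac{q^n e^{-q}}{n!}
     = \frac{(pq)^y e^{-q}}{y!} \sum_{n=y}^{\infty} \frac{\left((1-p)q\right)^{n-y}}{(n-y)!}
     = \frac{(pq)^y e^{-pq}}{y!},

which is exactly the Poisson(p * q) pmf.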

Therefore, the following code should work:

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> y[N];
  matrix[N, K] x;
}

parameters {
  real<lower=0, upper=1> p;
  real alpha;
  vector[K] beta;
}

transformed parameters {
  vector[N] q = exp(alpha + x * beta);
}

model {
  y ~ poisson(p * q);

  p ~ beta(1, 1);  // flat prior for p; ideally you have something more informative

  alpha ~ std_normal();
  beta ~ std_normal();
}

I am writing this code without any real or simulated data to test-run it, so there may be syntax errors. You are encouraged to share a sample of your data so I can test-run it, but I think this code (ensuring proper syntax) should be what you need.
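In the meantime, here is a minimal sketch of a simulation program you could use to generate fake data for test-running; all parameter values and distributions below are arbitrary choices for illustration, not recommendations:

data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
}
generated quantities {
  // arbitrary illustrative values for the "true" parameters
  real alpha = normal_rng(0, 1);
  vector[K] beta;
  real p = beta_rng(2, 2);
  array[N] int ystar;
  array[N] int y;
  for (k in 1:K) {
    beta[k] = normal_rng(0, 1);
  }
  for (n in 1:N) {
    ystar[n] = poisson_log_rng(alpha + x[n] * beta);  // latent true count
    y[n] = binomial_rng(ystar[n], p);                 // observed, under-reported count
  }
}

Since this program has no parameters block, you would run it with the fixed_param sampler to draw simulated datasets.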

EDIT: I just realized that in your original post you declare p as a vector. In that case, this code should work:

data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> y[N];
  matrix[N, K] x;
}

parameters {
  vector<lower=0, upper=1>[N] p;
  real alpha;
  vector[K] beta;
  real<lower=0> a;
  real<lower=0> b;
}

transformed parameters {
  vector[N] q = exp(alpha + x * beta);
}

model {
  y ~ poisson(p .* q);

  p ~ beta(a, b);

  alpha ~ std_normal();
  beta ~ std_normal();

  a ~ exponential(1);
  b ~ exponential(1);
}

However, in my experience this type of mixture distribution is inherently difficult to estimate in Stan. You certainly want to use some informative priors. And when working with the beta distribution, you should consider the reparameterization suggested here.
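For example, a minimal sketch of the mean/concentration parameterization using Stan's beta_proportion density; the hyperpriors on mu and kappa here are placeholders, and only the prior fragment is shown:

parameters {
  vector<lower=0, upper=1>[N] p;
  real<lower=0, upper=1> mu;  // mean of the beta distribution
  real<lower=0> kappa;        // concentration (larger = less heterogeneity)
}
model {
  p ~ beta_proportion(mu, kappa);  // same as beta(mu * kappa, (1 - mu) * kappa)
  mu ~ beta(2, 2);                 // placeholder hyperpriors
  kappa ~ exponential(0.1);
}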


@sonicking, okay that makes sense. I can just drop ystar from the model and estimate directly from the product of p and q.

I’m interested in analyzing ystar, though. I think I should be able to use generated quantities to infer ystar from the observed y and the posterior for p. Does that sound right?

Thanks for your help. I just started picking up Stan.

@prototaxites, I figured doing something like that would get prohibitive as N gets large.

Have you found an upper bound for N in practice?

Thank you for the pointers on N-mixture models. I’ll take a look.

You can…but I am not exactly sure why you need to simulate ystar when you already know its distribution, which is Poisson(q). If anything, it is important to remember that in your setup q will differ across people as long as their x values differ. Therefore, the expectation of ystar for any single person is just that person's q. Of course, since alpha and beta are random variables, so is q, but you will get all the values of q from the transformed parameters block.

As @sonicking says, a binomial distribution y ~ binomial(n, theta) whose number of trials is Poisson distributed, n ~ Poisson(lambda), marginalizes to a Poisson distribution y ~ Poisson(theta * lambda).

However, the likelihood, whether written in two steps as y ~ binomial(n, theta); n ~ Poisson(lambda) or in one step as y ~ Poisson(theta * lambda), contains absolutely no information about theta or lambda individually, only about their product. You will not get inference about theta or lambda except via the prior, and therefore you will not get inference about n either except via the prior (and the knowledge that n must be at least as large as y). The intuition here is that if you observe y = 10, you have nothing other than the prior to distinguish between, say, lambda = 10; theta = 1 versus lambda = 100; theta = 0.1.

@pwsheppard your original code contains the data-dependent prior on theta (your p): p ~ beta(y + 1, ystar - y + 1);. I don’t understand this choice of prior, but there may be domains where it is reasonable for some reason and gives meaningful inference about theta and n.

The well-known degeneracy of this likelihood, i.e. the impossibility of separately identifying theta and lambda, is the raison d'être for the N-mixture model, which solves the problem by collecting multiple data points Y (a vector) that are all binomial-Poisson mixtures sharing a specific realization of the Poisson model. That is, int n ~ poisson(lambda); array[S] int Y ~ binomial(n, theta);, where all S elements of Y share a single value of n.

The intuition here is that, for example, the observation Y = [10, 10, 10, 10, 10] is overwhelmingly more likely if lambda = 10, theta = 1 than if lambda = 100, theta = 0.1.

Although the elements of Y are all marginally Poisson distributed, their joint distribution is obviously not independent, and so we cannot simply write the N-mixture likelihood as Y ~ poisson(theta * lambda). There are several techniques for writing this likelihood. A closed form exists that avoids the need to truncate the sum in a marginalization, but the closed form is based on heinous functions that themselves can be computed only via numerical approximation and would be difficult to implement in Stan. Alternatively, you can just marginalize as @prototaxites suggests, choosing the upper bound of the marginalization to be large enough so that the probability mass that gets truncated is negligible. @maxbiostat has worked on a robust technique to adaptively choose the upper truncation bound to ensure that the result is accurate to within a desired tolerance. Any of these marginalization techniques can be helped by the fact that the terms in the sum can be efficiently computed via recursion.
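To make the marginalization route concrete, here is a minimal sketch for a single site with S replicate counts sharing one latent count n; the truncation bound ub is an assumption you would have to choose for your data:

data {
  int<lower=1> S;
  array[S] int<lower=0> y;
  int<lower=0> ub;  // truncation bound; must satisfy ub >= max(y) and leave negligible tail mass
}
parameters {
  real<lower=0> lambda;
  real<lower=0, upper=1> theta;
}
model {
  int y_max = max(y);
  vector[ub - y_max + 1] lp;
  for (n in y_max:ub) {
    // vectorized binomial_lpmf sums the log-likelihood over all S replicates
    lp[n - y_max + 1] = poisson_lpmf(n | lambda)
                        + binomial_lpmf(y | n, theta);
  }
  target += log_sum_exp(lp);
}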

Hope some of this helps!

2 Likes

That’s the distribution of ystar conditional on q alone. Presumably @pwsheppard is interested in the posterior distribution conditional on all the data, and including the specific y refines the distribution. In particular, ystar cannot be less than y, whereas the Poisson allows for zero.

The model

ystar ~ poisson(q);
y ~ binomial(ystar, p);

can be marginalized in Stan thusly:

model {
  y ~ poisson(p .* q);
}
generated quantities {
  array[N] int ystar;
  for (n in 1:N) {
    ystar[n] = y[n] + poisson_rng((1 - p[n]) * q[n]);
  }
}
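The generated quantities line works because, conditional on y and the parameters, the unreported remainder ystar - y is itself Poisson:

P(ystar = n \mid y) \propto \binom{n}{y} p^y (1-p)^{n-y} \, \frac{q^n e^{-q}}{n!} \propto \frac{\left((1-p)q\right)^{n-y}}{(n-y)!}, \qquad n \ge y,

which is the Poisson((1 - p) q) pmf shifted by y.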

But as @jsocolar points out, this model cannot really estimate p so these predictions are going to be all over the place.

That distribution is identical to the posterior of p conditional on y and ystar assuming a flat prior, so most likely it’s a best-effort guess at the distribution of p. However, using analytic posteriors like that as priors is incorrect: conditioning on the “downstream” variable y is not appropriate here, so the intended model seems to be p ~ beta(1, 1).


Thank you for the explanation; I have learned a lot.

My understanding is that with cross-sectional heterogeneity in both p (the binomial parameter) and the rate (the Poisson parameter), it is possible to identify them. In the original setup of this thread, p is beta distributed (checks one box) and the rate is heterogeneous because of x (checks the other box), so I thought that everything could be identified. But I was not confident about this.


In general this can be true, though correct inference can be very sensitive to small model misspecifications. In this particular case, though, even if the model is perfectly specified, the non-identifiability persists, because we can add a constant to the intercept in the regression for the Poisson rate (thereby multiplying all of the rates by a constant) and divide all of the binomial probabilities p by the same constant. So this likelihood remains degenerate. If we were instead predicting p as a logit-linear combination of covariates, then we could not divide all the predicted probabilities p by a constant, because doing so would yield something that is no longer logit-linear. So if we really trust the specification (i.e. the log-linearity of lambda and the logit-linearity of p), then the model is no longer completely degenerate. My advice in general is not to trust the resulting inference absent strong a priori grounds to think that the model specification is extremely close to reality.
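Concretely, for any constant c the likelihood contribution of each observation is unchanged:

p_i \, e^{\alpha + x_i^\top \beta} = \left(p_i e^{-c}\right) e^{(\alpha + c) + x_i^\top \beta},

so any shift in the intercept can be absorbed into the p's (as long as the rescaled p's stay in (0, 1)).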
