Hierarchical model with a poisson distributed parameter

Hi all, I am trying to build an hierarchical model in which the observed outcome dA, detection of new records of group A, is binomial distributed: dA \sim Binomial(yearly\_detections, P).

P is probability of success which is equivalent to the proportion of unrecorded individuals of group A to all the unrecorded individuals (group A + group B): P = \frac{unrecorded_A}{unrecorded_A + unrecorded_B}.

Where unrecorded is simply the difference between the total number of individuals in a group and the cumulative sum of number of yearly records.

Total number of individuals from group B is fixed, but the number of individuals from group A increases with t according to \lambda = exp(\alpha + \beta * t) and has a Poisson distribution y \sim Poisson(\lambda) .

Now, if I ignore the Poisson distribution part and just use \lambda to calculate unrecorded_A, the model “works” but I get this message a lot:

Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: binomial_lpmf: Probability parameter[150] is -0.00198042, but must be in the interval [0, 1].
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

But given that y is a Poisson distributed variable around the mean, I think I can avoid getting negative P such as here, but I can’t figure out how, since I can’t define an integer parameter y as far as I understand.

Here is the code:

data{
  int <lower = 1> N;  // number of rows in the data
  int <lower = 1> B_total; // assumed number of ind in group B
  array[N] int <lower = 0> dA; // observed number of yearly records A
  array[N] int <lower = 0> dB; // observed number of yearly records B
  vector[N] t;
}

transformed data {
  array[N] int <lower = 0> unrecorded_B;
  array[N] int <lower = 0> yearly_detections;
  array[N] int <lower = 0> recorded_A;  // cumulative recorded A
  array[N] int <lower = 0> recorded_B; // cumulative recorded B
  
  recorded_A = cumulative_sum(dA);
  recorded_B = cumulative_sum(dB);
  
  for (i in 1:N){
    unrecorded_B[i] = B_total - recorded_B[i];
    yearly_detections[i] = dA[i] + dB[i]; 
  }
  
}

parameters {
  real alpha;
  real beta;
}

transformed parameters {
  vector[N] rate = alpha + beta * t;
}

model{
  
  vector[N] y;
  vector[N] unrecorded_A;
  vector[N] P;
  
  //priors
  alpha ~ normal(0, 0.01);
  beta  ~ normal(0, 0.001);
  
  y = exp(rate);
  for (i in 1:N){
    unrecorded_A[i] = cumulative_sum(y)[i] - recorded_A[i];
    P[i] = unrecorded_A[i]/(unrecorded_A[i] + unrecorded_B[i]);
  }
  dA ~ binomial(yearly_detections, P);
}


Ideally I would like to replace y = exp(rate); with the more accurate y ~ poisson_log(rate), but I have no idea how to get it to work.
Is it at all possible?

I keep going back to what I believe are the relevant chapters in the documentation, but I may be missing something entirely. Any guidance will be much appreciated.

If you have data x \sim Binomial(N, p) and N \sim Poisson(\lambda), then you can marginalize over N by using x \sim Poisson(\lambda p)

2 Likes

Yeah, but in my situation the p parameter is derived from the poisson:

Y \sim Poisson(\lambda)

P_t = \frac{\sum_{t = 1}^{n}Y_t - recordedA_t}{\sum_{t = 1}^{n}Y_t - recordedA_t + unrecordedB_t}

x \sim Binom(n, P)

So, if I understand correctly - and thanks again for directing me in that (hopefully right) direction - what I need to do is marginalize over Y (find probability of x given \lambda) somehow define a joint probability P(n,P|Y) and sum over all possible Y \in \{0,1,2,...,\infty\}?

\sum_{y = 0}^{\infty}P(n,P|Y)*P(Y)

This is sadly where I’m clueless as to how to proceed.

1 Like

I think before jumping into calculations, it is important to understand the data-generating process a little better. Here’s what I have understood from your model, in my own notation:

\begin{align} Y_t &\sim \operatorname{Poisson}(\lambda_t),\\ W &= \sum_{t=1}^n Y_t,\\ \theta &= \frac{W -r_A}{W - r_A + r_B},\\ X & \sim \operatorname{Binomial}(n, \theta). \end{align}

Now, one could possibly try and workout the pmf of \theta and thus the marginal of X, but first a few clarifications are in order: just specifying \lambda_t = \exp(\alpha + \beta t) is not enough to guarantee that Y_{t+1} \geq Y_t; but maybe that (strict cumsum) is not what you want and you really want the Y_t to be independent, just have increasing mean. In that case, W is Poisson with rate

\mu = \sum_{t=1}^n \exp(\alpha + \beta t) = \frac{\exp(\alpha + \beta)[\exp(\beta n)-1]}{\exp(\beta)-1},

and the pmf of \theta might be tractable.

1 Like

Your understanding is correct.

If I understand your clarification correctly, the difference between

\begin{align} Y_t &\sim \operatorname{Poisson}(\lambda_t),\\ W &= \sum_{t=1}^n Y_t,\\ \end{align}

and:

\begin{align} \mu &= \sum_{t = 1}^{n}exp(\alpha + \beta t)\\ W_t &\sim \operatorname{Poisson}(\mu),\\ \end{align}

, is the independence of W_t when \mu is being cumulatively summed. Correct?

That does sound a lot more like something I would want - Thanks!

Now, how do I deal with the pmf?

I wonder if there’s a way to somehow redefine \theta so I can sample from a distribution limited to [0,1], or maybe go about and using 1- \theta instead…

We’re inching towards understanding here, but not quite there yet. If the Y_t are independent from one another, then their sum will be a Poisson distribution whose rate is the sum of their rates. In your second display, it does not make much sense to write W_t, because \mu does not depend on t; instead it sums over all of the t=1, 2, \ldots, n and hence W should not be indexed by t.
So the two statements you list are equivalent under independence.

So the question remains: are the numbers of recorded group A indiividuals strictly increasing over the years? Or are they expected to rise each year but might not? Modelling the Y_t independently will make things easier mathematically, but I’m afraid it might violate basic premisses of the data-generating process.

With all that said,

I don’t know yet. It might be a simple calculation to get the pmf of \theta, p_T(\theta), it might not. If we can at least write it down, then we can safely apply truncation to get p_X(x) = \sum_{w \in \Theta} \binom{n}{x}w^x (1-w)^{n-x} p_T(w).

Sorry for the late reply.

Now that I had some rest, I realized that (using your notations throughout) W is strictly defined as W_{t+1} \geq W_t.

Yearly number of individuals of group A (Y_t) are independent with rate \lambda_t, but are not necessarily increasing.

Yearly number of recorded individuals of group A (r_A) is strictly defined as r_{A_{t+1}} \geq r_{A_{t}} since it is the accumulation of the non-negative X.

to summarize:
Y_t are independent: Y_{t+1} can be greater, equal or lower than Y_{t}.
W is the accumulation of Y_t: W_{t+1} \geq W_t.
X are independent, X_{t+1} can be greater, equal or lower than X_{t}.
r_A is the accumulation of X: r_{A_{t+1}} \geq r_{A_{t}}.

All are non-negatives (can be zero tough).

I hope that makes things clearer.

If it is of any use, I’ve written the code in JAGS, where sampling a parameter from a poisson distribution is possible:

model {

  # priors
      b0 ~  dnorm(0, 1/0.1);         
      b1 ~  dnorm(0.01, 1/0.001);         
  
  # likelihood

    
    for (i in 1:N){
        Y[i] ~ dpois(lambda[i]);
        lambda[i] = exp(b0 +  b1 * i); 
      }

    cumulative_y[1] = Y[1];
    for(i in 2:N) {
        cumulative_y[i] <- cumulative_y[i-1] + Y[i];
    }

    for (i in 1:N){
        proportion[i] = cumulative_y[i]/(cumulative_y[i] + (unknown_A[i]));
        dA[i] ~ dbin(proportion[i], yearly_detections[i]);
        }
      }

yearly_detections[i] is equal to dA[i] + dB[i].

The model seems to be running smoothly, and converges to reasonable values.

Also, I realized that

\begin{align} \theta = \frac{unrecorded_a}{(unrecorded_a + unrecorded_b)} \end{align}

is the mean of a beta distribution, so basically I think I can ignore the poisson sampling and instead sample from this beta:

\begin{align} shape1= \sum_{t = 1}^{n}(\exp(\alpha + \beta t) - dA \\ shape2 = unrecorded_b\\ \theta \sim \operatorname{Beta}(shape1 , shape2)\\ dA \sim \operatorname{Binomial}(yearly\_recordings, \theta) \end{align}

So I rewrote the model as:

data{
  int <lower = 1> N;  // number of rows in the data
  int <lower = 1> B_total; // assumed number of ind in group B
  array[N] int <lower = 0> dA; // observed number of yearly records A
  array[N] int <lower = 0> dB; // observed number of yearly records B
  vector[N] t; // time
  real alpha_mu; //prior for alpha
  real beta_mu; // prior for betas
}

transformed data {
  array[N] int <lower = 0> unrecorded_B;
  array[N] int <lower = 0> yearly_detections;
  array[N] int <lower = 0> recorded_A;  // cumulative recorded A
  array[N] int <lower = 0> recorded_B; // cumulative recorded B
  
  recorded_A = cumulative_sum(dA);
  recorded_B = cumulative_sum(dB);
  
  for (i in 1:N){
    unrecorded_B[i] = B_total - recorded_B[i];
    yearly_detections[i] = dA[i] + dB[i]; 
  }
  
}

parameters {
  real alpha;
  real beta;
}

transformed parameters {
  vector<lower = 0>[N] unrecorded_A;
  vector[N] rate = exp(alpha + beta * t);
  
  for (i in 1:N){
    unrecorded_A[i] = cumulative_sum(rate)[i] - recorded_A[i];
  }
  
}

model{
  
  vector[N] theta;
  
  //theta  ~ beta(unrecorded_A, unrecorded_B);
  //dA ~ binomial(yearly_detections, theta);
 // I just realized  beta_binomial is a thing:
   dA ~ beta_binomial(yearly_detections, unrecorded_A, unrecorded_B);
  
  

  //priors
  alpha  ~ normal(alpha_mu, 1);
  beta  ~ normal(beta_mu , 0.01);
}

But I keep getting these errors:

Chain 1 Exception: beta_lpdf: Random variable[1] is nan, but must be in the interval [0, 1] Chain 4 Exception: stan_model_namespace::log_prob: unrecorded_A[8] is -0.89013, but must be greater than or equal to 0.000000

Here is an example data:

data_stan <- list(
B_total = 800, 
N = 150,
dA = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 1, 2, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 2, 1, 1, 4, 0, 1, 1, 0, 1, 1, 2, 1, 0, 0, 4, 2, 2, 2, 2, 1, 1, 1, 6, 0, 4, 0, 0, 1, 3, 1, 1, 4, 0, 1, 1, 3, 2, 2, 1, 0, 2, 1, 2, 0, 2, 3, 1, 2, 1, 1, 2, 0, 1, 2, 2, 4, 3, 3, 7, 1, 5, 4, 1, 2, 4, 3, 2, 6, 4, 3, 4, 3, 9, 9, 8, 2, 2, 1, 0, 6, 8, 1, 4, 3, 9, 1, 6, 4, 4, 1, 7, 2, 3, 13, 3, 2, 6, 2, 2, 3, 6, 3, 6, 2, 4, 1, 10, 5, 1, 2, 7, 6, 4),
dB = c(6, 3, 6, 2, 5, 4, 5, 3, 1, 5, 1, 3, 3, 2, 5, 10, 7, 4, 7, 3, 2, 2, 1, 7, 3, 2, 4, 5, 4, 5, 8, 4, 5, 5, 7, 2, 7, 1, 5, 2, 8, 4, 2, 0, 4, 9, 2, 7, 6, 6, 3, 4, 4, 2, 2, 9, 2, 3, 1, 6, 4, 4, 3, 3, 7, 1, 3, 0, 5, 4, 1, 2, 2, 3, 3, 5, 3, 3, 2, 3, 3, 3, 1, 2, 3, 4, 1, 2, 3, 2, 1, 0, 3, 0, 3, 5, 1, 1, 1, 0, 1, 2, 2, 1, 3, 3, 2, 1, 1, 2, 1, 3, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 2, 0, 1, 0, 0, 2, 1, 2, 1, 1, 1, 2, 1, 1, 0, 2, 0, 0, 2, 0, 0, 0, 1, 1, 0, 0, 1, 0),
t = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150),
alpha_mu = -0.7490987, 
beta_mu = 0.01702397)

It looks like your priors are putting unrecorded_A in a bad position. Time point 8 is the first time that recorded_A is above zero, so that’s where the issue is.

Based on the the quantiles of your priors for alpha and beta, there’s only a very small range where cumsum(rate)[8] - 1 is between 0 and 1 at time point 8 (the purple part of this graph).

1 Like

That’s a nice direction, but I replicated the graph you showed and it seems like you didn’t take the exponent of \alpha + \beta t for your calculations?

Also, I want unrecorded_A to be >= 0.

the following code produces this graph for me:

library(tidyverse)

a <- tibble(quantile_a = seq(0.01,0.99,0.01),
            a = qnorm(p = quantile_a, mean = -0.7490987, sd =  1))

b <- tibble(quantile_b = seq(0.01,0.99,0.01),
                 b = qnorm(p = quantile_b, mean = 0.01702397, sd =  0.01))


a |> 
  mutate(b = list(b)) |> 
  unnest(b) |> 
  rowwise() |> 
  mutate(result = cumsum(exp(a + b * 1:8))[8] - 1,
         res = case_when(
           result > 0 & result < 1 ~ "(0,1)",
           result <= 0 ~ "<=0",
           result >= 1 ~ ">=1"
         ),
         res = factor(res, levels = c("(0,1)", "<=0", ">=1"))
         ) |>
  ggplot()+
  aes(x = quantile_a, y = quantile_b, fill = res) +
  geom_tile(width = 0.0115, height = 0.0115)+
  scale_fill_viridis_d() +
  theme_classic() +
  labs(fill = "Unrecorded_A", x = "Quantile of \U0251", y = "Quantile of \U03b2")+
  coord_equal()

Anyway, now that I’m checking the prior distribution, both in the above method (thanks!) and by actually plotting the distribution, showed that the for \alpha, the mean of the prior distribution was off, and the sd was too big for both \alpha and \beta. So I narrowed them down and done some fixes.

Live update:

The model now runs and often produces the \alpha and \beta I used to simulate the data - but it keeps throwing those messages… I guess they can’t be avoided unless I make my priors really narrow?

I’m coming late to this discussion. I’m referring back to the JAGS code posted earlier, specifically this fragment:

    cumulative_y[1] = Y[1];
    for(i in 2:N) {
        cumulative_y[i] <- cumulative_y[i-1] + Y[i];
    }

    for (i in 1:N){
        proportion[i] = cumulative_y[i]/(cumulative_y[i] + (unknown_A[i]));
        dA[i] ~ dbin(proportion[i], yearly_detections[i]);
        }
      }

I take it that Y[i] is an integer, meaning that cumulative_y[i] is also an integer. proportion[i], of course, is not an integer, since it’s in [0,1]. I haven’t used JAGS in quite a while, but if I recall correctly the N in dbin(p, N) is not restricted to integer values. That means that the JAGS code may estimate unknown_A[i] to be non-integer.

If I’ve got that right, it should be possible to match the JAGS result closely by defining the unknown_A[i] analog in Stan to be real rather than int.

1 Like