# Double inflated models in Stan

Say you want to adapt a typical zero inflated poisson model such as:

model {
  for (n in 1:N) {
    if (y[n] == 0) {
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + poisson_lpmf(y[n] | lambda));
    } else {
      target += bernoulli_lpmf(0 | theta)
                  + poisson_lpmf(y[n] | lambda);
    }
  }
}
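Before changing anything, it may help to confirm numerically that the standard zero-inflated pmf normalizes. A quick check of my own (in Python rather than Stan or R; the theta and lambda values are arbitrary):

```python
# Numerical check (not from the thread): the zero-inflated Poisson pmf
# implied by the model above should sum to 1 over y = 0, 1, 2, ...
import math

def pois_pmf(y, lam):
    # Poisson pmf computed in log space to avoid overflow for large y
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

theta, lam = 0.3, 2.5  # arbitrary test values

def zip_pmf(y):
    p = (1 - theta) * pois_pmf(y, lam)  # Poisson branch, prob 1 - theta
    if y == 0:
        p += theta                      # inflation branch lands on zero
    return p

total = sum(zip_pmf(y) for y in range(1000))
```

The total comes out to 1 (up to the negligible Poisson tail beyond 1000), which is the property the double-inflated variant below needs to preserve.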


to instead have two inflated values, say 0 and 7.

Is the adaptation as simple as the code below?

model {
  for (n in 1:N) {
    if (y[n] == 0) {
      target += log_sum_exp(bernoulli_lpmf(1 | theta0),
                            bernoulli_lpmf(0 | theta0)
                              + bernoulli_lpmf(0 | theta7)
                              + poisson_lpmf(y[n] | lambda));
    } else if (y[n] == 7) {
      target += log_sum_exp(bernoulli_lpmf(1 | theta7),
                            bernoulli_lpmf(0 | theta7)
                              + bernoulli_lpmf(0 | theta0)
                              + poisson_lpmf(y[n] | lambda));
    } else {
      target += bernoulli_lpmf(0 | theta0)
                  + bernoulli_lpmf(0 | theta7)
                  + poisson_lpmf(y[n] | lambda);
    }
  }
}


I’ve tried running the model and it seems to converge OK, but when I recreate the model in R and simulate from it, I get an even higher frequency of the inflated values than occurs in the dataset the model was fit to.

I’m also not quite sure how to write the generated quantities block to create the predictions in Stan. Something like:

generated quantities {
  int y_test[N];
  for (i in 1:N) {
    int zero = bernoulli_rng(theta0[i]);
    int seven = bernoulli_rng(theta7[i]);
    y_test[i] = poisson_rng(lambda[i]);
    if (zero == 1) {
      y_test[i] = 0;
    } else if (seven == 1) {
      y_test[i] = 7;
    }
  }
}


It’s this last part I’m really not sure about, especially what should happen when both zero and seven equal 1.

Any input would be much appreciated.

theta0 <- 0.1
theta7 <- 0.2
lambda <- 1

(theta0 + (1 - theta0) * (1 - theta7) * dpois(0, lambda)
 + theta7 + (1 - theta0) * (1 - theta7) * dpois(7, lambda)
 + sum((1 - theta0) * (1 - theta7) * dpois(c(1:6, 8:1000), lambda)))


The sum is greater than 1, so the math needs reworking.


Thanks Andre, breaking it down like that makes sense. I can’t figure out, though, the source of the overlap that pushes the total probability above 1. Any ideas?

That is a nice brain teaser!

You have to deflate each inflation probability by the other(s).

(theta0 * (1 - theta7) + (1 - theta0) * (1 - theta7) * dpois(0, lambda)
 + theta7 * (1 - theta0) + (1 - theta0) * (1 - theta7) * dpois(7, lambda)
 + sum((1 - theta0) * (1 - theta7) * dpois(c(1:6, 8:1000), lambda)))

Sums to almost 1; the small remaining mass, theta0 * theta7, is the case where both indicators fire at once.

Which means the _rng function needs rethinking as well.
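One possible way to rethink the _rng step (a sketch of my own, using a stick-breaking split rather than two independent Bernoulli draws, so the "both indicators fire" case never arises):

```python
# Sketch only (my construction, not the thread's final answer): make one
# sequential draw over the three mixture components. With probabilities
# theta0, (1 - theta0) * theta7, and the remainder, the split is a
# proper simplex and exactly one component fires per draw.
import math
import random

def poisson_draw(lam, rng):
    # Knuth's inversion-by-multiplication sampler; fine for small lambda
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def double_inflated_rng(theta0, theta7, lam, rng):
    u = rng.random()
    p_zero = theta0                   # inflate at 0
    p_seven = (1 - theta0) * theta7   # inflate at 7, given no 0-inflation
    if u < p_zero:
        return 0
    if u < p_zero + p_seven:
        return 7
    return poisson_draw(lam, rng)     # remaining (1-theta0)(1-theta7) mass

rng = random.Random(0)
draws = [double_inflated_rng(0.1, 0.2, 1.0, rng) for _ in range(100_000)]
frac_zero = draws.count(0) / len(draws)    # ~ 0.1 + 0.72 * dpois(0, 1)
frac_seven = draws.count(7) / len(draws)   # ~ 0.18 + 0.72 * dpois(7, 1)
```

In a Stan generated quantities block the analogue would be a single categorical draw over the three components, calling poisson_rng only when the third component is selected.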


Inflation models are mixture models. With only two components the mixture probabilities are given by a single value, which is conveniently modeled with bernoulli_lpmf. For more than two components you need a simplex. You can either construct the simplex conditionally using a stick-breaking process or just use the simplex type in Stan.

parameters {
  simplex[3] theta; // theta[1]: zero inflation, theta[2]: seven inflation, theta[3]: Poisson
}
model {
  // don't forget priors

  for (n in 1:N) {
    if (y[n] == 0) {
      target += log_sum_exp(log(theta[1]), log(theta[3]) + poisson_lpmf(y[n] | lambda));
    } else if (y[n] == 7) {
      target += log_sum_exp(log(theta[3]) + poisson_lpmf(y[n] | lambda), log(theta[2]));
    } else {
      target += log(theta[3]) + poisson_lpmf(y[n] | lambda);
    }
  }
}
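To see why the simplex version is coherent, here is a numerical check of my own (in Python, 0-indexed, so theta[0] inflates 0, theta[1] inflates 7, and theta[2] weights the Poisson) that the three-component pmf sums to 1:

```python
# Check (not from the thread) that the simplex-mixture pmf normalizes
# for an arbitrary simplex theta and rate lam.
import math

def pois_pmf(y, lam):
    # Poisson pmf computed in log space to avoid overflow for large y
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

theta = [0.1, 0.2, 0.7]  # any simplex works
lam = 1.0

def pmf(y):
    p = theta[2] * pois_pmf(y, lam)  # Poisson component
    if y == 0:
        p += theta[0]                # point mass at 0
    if y == 7:
        p += theta[1]                # point mass at 7
    return p

total = sum(pmf(y) for y in range(1000))
```

Unlike the two-independent-Bernoulli version, there is no leftover "both inflations fire" mass here, so the total is exactly 1 up to the negligible Poisson tail.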


Thanks Michael, I’m now struggling to figure out how to put predictors on each probability of the simplex. For the zero-inflated model, I did something like:

model {
  // linear predictor on zero inflation
  vector[N] theta_linpred;
  for (n in 1:N) {
    theta_linpred[n] = beta0 + beta1 * x1[n];
  }
  for (n in 1:N) {
    if (y[n] == 0) {
      target += log_sum_exp(bernoulli_logit_lpmf(1 | theta_linpred[n]),
                            bernoulli_logit_lpmf(0 | theta_linpred[n])
                              + poisson_lpmf(y[n] | lambda));
    } else {
      target += bernoulli_logit_lpmf(0 | theta_linpred[n])
                  + poisson_lpmf(y[n] | lambda);
    }
  }
}


But I can’t yet figure out how to do a similar thing for each part of the simplex variable. The main error I currently get is "Cannot assign to variable outside of declaration block" when I try to assign some kind of linear predictor to each element of the simplex variable.

Thanks Andre, yeah that makes sense!

You have to declare variables at the top of the block in which you assign them. So change the model to:

model {
  vector[N] theta_linpred = beta0 + beta1 * x1;


and remove any other declaration of theta_linpred elsewhere. The loop has been removed; this is clearer and faster.

You can’t add predictors to a simplex as easily when you have multiple components. Instead you have to utilize the stick-breaking construction of the simplex and model the conditional probabilities. For example, let’s say that we have three classes, A, B, and C. We’d model

logit_prob_A = alpha_A  + beta_A * x;
logit_prob_B_given_A = alpha_B  + beta_B * x;
logit_prob_C_given_A_B = alpha_C  + beta_C * x;


then

prob_A = inv_logit(logit_prob_A);
prob_B = inv_logit(logit_prob_B_given_A) * prob_A;
prob_C = inv_logit(logit_prob_C_given_A_B) * prob_B;


For three components there are six different decompositions; which is appropriate depends on which conditional probabilities are most interpretable in your application.


Thanks Michael, but why is prob_B, for instance, conditional on prob_A happening? I was thinking that prob_B was conditional on prob_A not happening. Or in my example: given that 0 hasn’t happened, what’s the probability of 7 happening?

I’ve tried to translate your solution into Stan code, I’m just still really struggling. Are you proposing anything like what’s written below?

model {
  // priors...

  // predictors on probabilities of each model component
  vector[N] logit_prob_0 = alpha_0 + beta_0 * x;
  vector[N] logit_prob_7_given_0 = alpha_7 + beta_7 * x;
  vector[N] logit_prob_NA_given_7_0 = alpha_NA + beta_NA * x;
  // determining conditional probabilities of using each model
  vector[N] prob_0 = inv_logit(logit_prob_0);
  vector[N] prob_7 = inv_logit(logit_prob_7_given_0) .* prob_0;
  vector[N] prob_NA = inv_logit(logit_prob_NA_given_7_0) .* prob_7;
  // mixture model
  for (i in 1:N) {
    if (y[i] == 0) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_0[i]),
                            bernoulli_lpmf(0 | prob_0[i])
                              + poisson_lpmf(y[i] | lambda));
    } else if (y[i] == 7) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_7[i]),
                            bernoulli_lpmf(0 | prob_7[i])
                              + poisson_lpmf(y[i] | lambda));
    } else {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_NA[i]),
                            bernoulli_lpmf(0 | prob_NA[i])
                              + poisson_lpmf(y[i] | lambda));
    }
  }
}


I was a little sloppy with the distinction between densities and probabilities in my example. As pure probabilities you’d have

\begin{align*}
\mathbb{P}[A = 1] &= p_{A} \\
\mathbb{P}[B = 1] &= \mathbb{P}[B = 1 \mid A = 0] \, \mathbb{P}[A = 0] \\
&= p_{B = 1 \mid A = 0} \, (1 - p_{A}) \\
\mathbb{P}[C = 1] &= \mathbb{P}[C = 1 \mid A = 0, B = 0] \, \mathbb{P}[A = 0, B = 0] \\
&= \mathbb{P}[C = 1 \mid A = 0, B = 0] \, \mathbb{P}[B = 0 \mid A = 0] \, \mathbb{P}[A = 0] \\
&= p_{C = 1 \mid A = 0, B = 0} \, (1 - p_{B = 1 \mid A = 0}) \, (1 - p_{A})
\end{align*}

As I mentioned above, however, there are six different ways of decomposing a three-component distribution into conditionals, and hence six different choices for the three necessary parameters that could depend on predictors through a logistic function; I chose this one arbitrarily just to demonstrate. Which is most appropriate depends on the interpretation of the three components in your mixture and on how the predictors’ influence fits within that interpretation.
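Plugging in numbers (my own, chosen to echo theta0 = 0.1 and theta7 = 0.2 from earlier in the thread) shows the corrected decomposition forms a simplex:

```python
# Numeric check (not from the thread) of the corrected stick-breaking
# decomposition with only three classes, so C takes all remaining mass.
p_A = 0.1             # P[A = 1]
p_B_given_notA = 0.2  # P[B = 1 | A = 0]

P_A = p_A
P_B = p_B_given_notA * (1 - p_A)        # = 0.18
P_C = (1 - p_B_given_notA) * (1 - p_A)  # = 0.72, the leftover stick
```

The three marginal probabilities are 0.1, 0.18, and 0.72, and they sum to 1 for any values of p_A and p_B_given_notA in (0, 1).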


Thanks Michael, I understand your algebra and, I think, the logic behind the stick-breaking process, but I’m struggling to translate it into a Stan model. I’ve looked for examples online and in the manual but can’t find anything that uses predictors.

This is roughly what I’ve been trying but the posterior predictive checks seem wrong:

model {
  // predictors on probabilities of each model component
  vector[N] logit_prob_A = alpha_A + beta_A * x;
  vector[N] logit_prob_B = alpha_B + beta_B * x;
  vector[N] logit_prob_C = alpha_C + beta_C * x;
  // determining conditional probabilities of using each model
  vector[N] prob_A = inv_logit(logit_prob_A);
  vector[N] prob_B = inv_logit(logit_prob_B) .* (1 - prob_A);
  vector[N] prob_C = inv_logit(logit_prob_C) .* (1 - prob_B);
  // mixture model
  for (i in 1:N) {
    if (y[i] == A) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_A[i]),
                            bernoulli_lpmf(0 | prob_A[i])
                              + poisson_lpmf(y[i] | lambda));
    } else if (y[i] == B) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_B[i]),
                            bernoulli_lpmf(0 | prob_B[i])
                              + poisson_lpmf(y[i] | lambda));
    } else {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_C[i]),
                            bernoulli_lpmf(0 | prob_C[i])
                              + poisson_lpmf(y[i] | lambda));
    }
  }
}


Any further guidance on what I might be doing wrong would be much appreciated.

Not

prob_C = inv_logit(logit_prob_C) .* (1 - prob_B);

but

prob_C = inv_logit(logit_prob_C) .* (1 - inv_logit(logit_prob_B)) .* (1 - prob_A);

and these should really be implemented using the log_inv_logit and log1m_inv_logit functions.

But more importantly you have no priors here so poor posterior predictive checks are not necessarily surprising even if the observation model is correct.

lambda <- 1
logit_prob_A <- rnorm(1)
logit_prob_B <- rnorm(1)
logit_prob_C <- rnorm(1)

prob_A <- inv_logit(logit_prob_A)
prob_B <- inv_logit(logit_prob_B) * (1 - prob_A)
prob_C <- inv_logit(logit_prob_C) * (1 - inv_logit(logit_prob_B)) * (1 - prob_A)

(prob_A + (1 - prob_A) * dpois(0, lambda)
 + prob_B + (1 - prob_B) * dpois(7, lambda)
 + sum(prob_C + (1 - prob_C) * dpois(c(1:6, 8:1000), lambda)))


Sums to something other than 1 almost every time.
There is still a flaw, at least in prob_C, since prob_C covers the remaining elements.
You may put another distribution on the inflation part of prob_C, which must sum to 1 on
\mathbb{N} \smallsetminus \left\{7 \right\}.

If prob_A, prob_B, prob_C form a simplex then the three probabilities should sum up to 1.
I’m wondering: why not prob_C = 1 - prob_A - prob_B?
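That identity does in fact hold: in the stick-breaking construction the leftover stick (1 - inv_logit(logit_prob_B)) * (1 - prob_A) equals 1 - prob_A - prob_B exactly, which is why a third free logit is one parameter too many when there are only three components. A quick check of my own (in Python):

```python
# Check (not from the thread): the leftover stick equals
# 1 - prob_A - prob_B for arbitrary logits, so prob_C needs no
# free parameter of its own in a three-component mixture.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
for _ in range(1000):
    a, b = random.gauss(0, 2), random.gauss(0, 2)
    prob_A = sigmoid(a)
    prob_B = sigmoid(b) * (1 - prob_A)
    leftover = (1 - sigmoid(b)) * (1 - prob_A)
    assert abs(leftover - (1 - prob_A - prob_B)) < 1e-12
```

The algebra is one line: (1 - sigmoid(b)) * (1 - prob_A) = (1 - prob_A) - sigmoid(b) * (1 - prob_A) = 1 - prob_A - prob_B.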


Ok, thanks Michael, I’ve updated my code to include those functions you mentioned, but it seems to be giving me far too much inflation on the inflated values (about 5x). I’ve also put what I consider reasonable priors on the mixture predictors, which the model moves away from in order to generate this surplus inflation. Here is a distillation of my Stan code:

model {
  // predictors on probabilities of each model component
  vector[N] logit_prob_A = beta_A * x; // includes intercept
  vector[N] logit_prob_B = beta_B * x; // includes intercept
  vector[N] logit_prob_C = beta_C * x; // includes intercept

  // determining conditional probabilities of using each model
  vector[N] prob_A = log_inv_logit(logit_prob_A);
  vector[N] prob_B = log_inv_logit(logit_prob_B) + log1m_inv_logit(logit_prob_A);
  vector[N] prob_C = log_inv_logit(logit_prob_C) + log1m_inv_logit(logit_prob_B) + log1m_inv_logit(logit_prob_A);

  // likelihood
  for (i in 1:N) {
    if (y[i] == 0) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_A[i]),
                            bernoulli_lpmf(0 | prob_A[i])
                              + poisson_lpmf(y[i] | lambda[i]));
    } else if (y[i] == 7) {
      target += log_sum_exp(bernoulli_lpmf(1 | prob_B[i]),
                            bernoulli_lpmf(0 | prob_B[i])
                              + poisson_lpmf(y[i] | lambda[i]));
    } else {
      target += bernoulli_lpmf(1 | prob_C[i])
                  + poisson_lpmf(y[i] | lambda[i]);
    }
  }
}


The model seems to converge OK and the coefficients are not wildly unreasonable. However, when I extract the coefficients in R and try to recreate the model, I get significant over-inflation. Maybe this is where my error is. Again I have tried to distill the code down to the relevant area, specific to the mixture component of the model:

# Model inputs
StanModCoef <- rstan::extract(mixmod, pars = c("beta_A", "beta_B", "beta_C", "beta_lambda"))
Num_Beta <- length(StanModCoef[["beta_A"]][1, ]) # how many predictors (same across all beta A, B, C, lambda)
SimsPerPt <- 50 # sims per data point
Sims <- matrix(NA, nrow = nrow(Data), ncol = SimsPerPt) # store sim data
s <- sample(1:iter, SimsPerPt) # determine index of samples to draw

# Sim predictions
for (i in 1:nrow(Data)) {
  # Determine log odds for each model component for each sample
  logit_prob_A <- rowSums(sapply(1:Num_Beta, function(x) StanModCoef$beta_A[s, x] * A_modmatrix[i, x]))
  logit_prob_B <- rowSums(sapply(1:Num_Beta, function(x) StanModCoef$beta_B[s, x] * B_modmatrix[i, x]))
  logit_prob_C <- rowSums(sapply(1:Num_Beta, function(x) StanModCoef$beta_C[s, x] * C_modmatrix[i, x]))
  # Determine probability of using each model component
  prob_A <- inv.logit(logit_prob_A)
  prob_B <- inv.logit(logit_prob_B) * (1 - prob_A)
  prob_C <- inv.logit(logit_prob_C) * (1 - inv.logit(logit_prob_B)) * (1 - prob_A)
  # Simulate which model component is used for each sample
  probs <- cbind(prob_A, prob_B, prob_C)
  mod_comp <- apply(probs, 1, function(y) sample(x = c("A", "B", "C"), prob = y, size = 1))
  # Simulate Poisson part of model
  lambda_log <- rowSums(sapply(1:Num_Beta, function(x) StanModCoef$beta_lambda[s, x] * Lambda_modmatrix[i, x]))
  simpois <- mapply(rpois, n = 1, lambda = exp(lambda_log))

  # Combine models, subbing in inflated figures for components A and B
  mod_comp[mod_comp == "A"] <- 0
  mod_comp[mod_comp == "B"] <- 7
  mod_comp[mod_comp == "C"] <- simpois[mod_comp == "C"]

  Sims[i, ] <- as.numeric(mod_comp)
}


Any comments on where I might be going wrong would obviously be much appreciated. I feel like I’m nearly there, but maybe not.

Ok, I think I’ve figured out the problem: I wasn’t exponentiating the log probabilities before passing them to the bernoulli_lpmf function.

If I instead do the following, the simulations seem more appropriate:

// determining conditional probabilities of using each model
vector[N] prob_A = exp(log_inv_logit(logit_prob_A));
vector[N] prob_B = exp(log_inv_logit(logit_prob_B) + log1m_inv_logit(logit_prob_A));
vector[N] prob_C = exp(log_inv_logit(logit_prob_C) + log1m_inv_logit(logit_prob_B) + log1m_inv_logit(logit_prob_A));


I feel like that negates the benefit of using the log_inv_logit and log1m_inv_logit functions, however; please let me know if there’s a superior alternative.
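For context, the log-scale functions mainly protect against underflow in the tails; a small illustration of my own (Python stand-ins for Stan’s log_inv_logit, not Stan itself):

```python
# Illustration (not from the thread): the naive log(inv_logit(x)) breaks
# down for extreme x, while the log-scale version stays finite.
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_inv_logit(x):
    # log(1 / (1 + exp(-x))) = -log1p(exp(-x)), computed stably
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

naive = math.log(inv_logit(-40))   # still fine at x = -40
stable = log_inv_logit(-40)        # agrees with the naive value
# At x = -800 the naive path fails outright: math.exp(800) overflows
# inside inv_logit, while log_inv_logit(-800) is simply about -800.
far_tail = log_inv_logit(-800)
```

So keeping everything on the log scale and adding log probabilities directly into log_sum_exp terms (instead of exponentiating and round-tripping through bernoulli_lpmf) is the usual way to retain that benefit.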

prob_A + prob_B + prob_C must equal one, right? In your example it doesn’t, but I think that’s because you can’t take random uncorrelated draws for the logit_prob of each component. The Stan model determines these logit_probs with the implied constraint that, once they are inv.logit-ified and conditioned, they sum to 1.

Regarding the last equation: even if prob_A + prob_B + prob_C does equal one, I don’t think that expression will ever sum to 1. In fact it will always sum to something greater than 1, since you are combining all the probability distributions of the mixture model.

I believe this is what needs to sum to 1:

(prob_A + prob_B + prob_C * sum(dpois(c(0:1000), lambda)))

And I appreciate that looks obvious when you already assume prob_A + prob_B + prob_C = 1 but hopefully it illustrates what I think is going on in the mixture model. If you wanted to attach a distribution to every component of the mixture model you could perhaps do something like this:

(prob_A * dbinom(1, prob = 1, size = 1) + prob_B * dbinom(1, prob = 1, size = 1) + prob_C * sum(dpois(c(0:1000), lambda)))

where the binomial distributions just occur with 100% probability on the inflated figures (0 and 7).
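A quick numerical check of my own (in Python rather than R) of the normalization claim above, treating the two inflation components as point masses at 0 and 7:

```python
# Check (not from the thread): with prob_A + prob_B + prob_C = 1 and the
# inflation components as point masses, the mixture's total mass is 1.
import math

def pois_pmf(y, lam):
    # Poisson pmf computed in log space to avoid overflow for large y
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

prob_A, prob_B, prob_C = 0.1, 0.18, 0.72  # any simplex works
lam = 1.0

total = (prob_A * 1.0  # point mass at y = 0, as in dbinom(1, prob = 1, size = 1)
         + prob_B * 1.0  # point mass at y = 7
         + prob_C * sum(pois_pmf(y, lam) for y in range(1000)))
```

Each mixture weight multiplies a distribution that itself sums to 1, so the weighted combination sums to 1 as well; that is the sense in which the earlier per-component sums could exceed 1.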

This is just my sense of it and I could easily have it wrong, interested in your thoughts.