Mixture Bayesian Poisson Regression Model

Hello researchers,

I am trying to fit a two-component mixture Poisson regression model, where the mean of the first cluster is greater than that of the second. I have tried the Stan code below, but I am not sure how the independent variables should be integrated into the model (I could not find any related Stan material on this), or where the truncation option (T[0,]) should go. Finally, I want to set mu[2] = mu[1] + theta, where theta is any positive number, but I don't know how to integrate that into my code. Would you please give me some hints/corrections or point me to related materials?

data {
  int<lower=0> N;             // number of data points
  int<lower=0> y[N];          // observations
  int<lower=1> K;             // number of mixture components
  int<lower=0> p;             // number of covariates

  // Covariates
  int<lower=0> x1[N];
  real<lower=0> x2[N];
  real<lower=0> x3[N];
  real<lower=0> x4[N];
  real<lower=0> x5[N];
}

transformed data {
  real<lower=0> Amci;
  Amci = mean(mci);
}

parameters {
  real<lower=0> theta;
  real beta[p];
  simplex[2] pi;              // mixing proportions
}


transformed parameters {
  real lp[N];
  ordered[2] mu;              // locations of mixture components

  for (i in 1:N) {
    lp[i] = beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i] + beta[5]*x4[i];

    for (k in 1:K) {
      mu[1] = exp(lp[i]);
      mu[2] = mu[1] + theta;
    }
  }
}

model {
  // Priors
  beta[1] ~ normal(0, 1.0E-3);
  beta[2] ~ normal(0, 1.0E-3);
  beta[3] ~ normal(0, 1.0E-3);
  beta[4] ~ normal(0, 1.0E-3);
  beta[5] ~ normal(0, 1.0E-3);
  theta ~ normal(0, 1.0E-6) T[0,];

  pi ~ beta(Amci, Amci);

  // Likelihood
  for (i in 1:N) {
    target += log_sum_exp(log(pi) + poisson_lpmf(y[i] | mu[1]),
                          log1m(pi) + poisson_lpmf(y[i] | mu[2]));
  }
}

Hi,
inspecting your code, it looks roughly OK to me. I am not sure what problem you are describing. Some minor notes:

  • You can use triple backticks (```) to mark the beginning and end of your code in the post to have it formatted in monospace for better readability.
  • Your priors for beta are very narrow; consider standardizing x so that you can use normal(0, 1).
  • The prior for theta is also very narrow, so mu[1] will be almost equal to mu[2]. Is that intentional?
  • It could make more sense to have mu[2] = exp(lp + theta), i.e. theta would then be on the same (log) scale as beta.
  • For improved efficiency, you could code x and beta as matrix[N,5] x; and vector[5] beta; - then you'd be able to use matrix multiplication and have vector[N] lp; with lp = x * beta; (or maybe matrix[5,N] x; I always mess up which coordinate is which :-) ) - see the sketch below.
  • The truncation in theta ~ normal(0,1.0E-6)T[0,]; is unnecessary. When you give bounds in parameter declarations, truncation of distributions is performed automatically by Stan.
  • Instead of simplex[2] pi you probably want real<lower=0, upper=1> pi. It's surprising that your code compiles this way.
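To illustrate the matrix version, here is a minimal sketch (the names and layout are mine; it assumes the intercept is kept as a separate alpha rather than as beta[1]):

data {
  int<lower=0> N;
  int<lower=0> y[N];
  matrix[N, 5] x;                   // one column per covariate
}
parameters {
  real alpha;                       // intercept
  vector[5] beta;                   // slopes
  real<lower=0> theta;              // positive shift on the log scale
  real<lower=0, upper=1> pi;        // mixing proportion
}
transformed parameters {
  vector[N] lp = alpha + x * beta;  // linear predictor for all N observations at once
  vector[N] mu1 = exp(lp);
  vector[N] mu2 = exp(lp + theta);  // theta on the log scale, so mu2 > mu1 elementwise
}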

Hope that helps.

Rather than doing the transform and log_sum_exp yourself, just declare mu as positive_ordered. It does the same thing under the hood as what you're doing.

Also, there’s a log_mix function that’ll simplify the likelihood calculation.
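For example, the two ideas together would look roughly like this (a sketch; the per-observation loop and y[i] indexing are my assumptions):

parameters {
  positive_ordered[2] mu;           // mu[1] < mu[2] enforced by the type
  real<lower=0, upper=1> pi;        // mixing proportion
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[1]),
                      poisson_lpmf(y[i] | mu[2]));
}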

Neither of these will change the results you get, though. The tight priors on beta are indeed suspect to me, as are the ones on theta. Did you realize Stan uses the scale (standard deviation) as a parameter, not the precision? We have a wiki on prior recommendations and also a lot of advice in the user's guide.

Thank you Martinmodrak and Bob for your comments. I tried to incorporate them, but I still face some problems.
I declared mu as positive_ordered and removed my own transformation, but it gives the same error. Therefore, I tried to modify my previous code as follows.

data {
  int<lower=0> N;             // number of data points
  int<lower=1> y[N];          // observations, truncated at 0
  int<lower=0> p;             // number of covariates

  // Covariates
  int<lower=0> x1[N];
  real<lower=0> x2[N];
  real<lower=0> x3[N];
}


parameters {
  real<lower=0> ci;                  // any positive number added to mu[1] to get mu[2]
  real beta[p];
  real<lower=0, upper=1> pi;         // mixing proportion
}

transformed parameters {
  real lp[N];
  positive_ordered[2] mu;            // locations of mixture components

  for (i in 1:N) {
    lp[i] = beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i];
    mu[1] = exp(lp[i]);
  }
  mu[2] = mu[1] * ci;
}


model {
  // Priors
  beta[1] ~ normal(0, 1.0E1);
  beta[2] ~ normal(0, 1.0E1);
  beta[3] ~ normal(0, 1.0E1);
  beta[4] ~ normal(0, 1.0E1);

  ci ~ normal(0, 1.0E1);

  pi ~ beta(Am, Am);

  // Likelihood
  for (i in 1:N) {
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[1]),
                      poisson_lpmf(y[i] | mu[2]));
  }
}

When I run the above code, I get the following error, and the process has not finished yet. Would you please give me some hints or modifications?

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model3098364d464b_20869da05759798735f6ab7ab1a349fd_namespace::write_array: mu is not a valid ordered vector. The element at 2 is 0, but should be greater than the previous element, 0  (in 'model3098364d464b_20869da05759798735f6ab7ab1a349fd' at line 40)

Chain 1: 
Chain 1: Gradient evaluation took 6.045 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 60450 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

I think that should be mu[2] = mu[1] + ci;
An alternative to your code, on the log scale, would be:

 target += log_mix(pi,
                   poisson_log_lpmf(y[i] | l_mu[1]),
                   poisson_log_lpmf(y[i] | log_sum_exp(l_mu[1], l_ci)));

where l_mu[1], l_ci \in \mathbb{R}.
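A minimal sketch of declarations that would match this snippet (using a scalar l_mu1 instead of l_mu[1]; the loop and the rest are my assumptions):

parameters {
  real l_mu1;                       // log of the first component's rate
  real l_ci;                        // log of the positive shift ci
  real<lower=0, upper=1> pi;        // mixing proportion
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_log_lpmf(y[i] | l_mu1),
                      poisson_log_lpmf(y[i] | log_sum_exp(l_mu1, l_ci)));
}

Here log_sum_exp(l_mu1, l_ci) equals log(exp(l_mu1) + exp(l_ci)), i.e. log(mu[1] + ci), so the second component's rate stays above the first without any hard constraint.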

Dear Andre.pfeuffer,
thank you for your help. I incorporated your comment and got the following error.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

The error messages point to line 73 (the target += statement) and line 40 (the declaration of l_mu[1], which I declared as a real parameter).

But when I change only the expression mu[2] = mu[1]*ci to mu[2] = mu[1]+ci, I get the following output.

data {
  int<lower=0> N;             // number of data points
  int<lower=1> y[N];          // observations, truncated at 0
  int<lower=0> p;             // number of covariates

  // Covariates
  int<lower=0> x1[N];
  real<lower=0> x2[N];
  real<lower=0> x3[N];
}

parameters {
  real<lower=0> ci;                  // any positive number added to mu[1] to get mu[2]
  real beta[p];
  real<lower=0, upper=1> pi;         // mixing proportion
}

transformed parameters {
  real lp[N];
  positive_ordered[2] mu;            // locations of mixture components

  for (i in 1:N) {
    lp[i] = beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i];
    mu[1] = exp(lp[i]);
  }
  mu[2] = mu[1] + ci;
}

model {
  // Priors
  beta[1] ~ normal(0, 1.0E1);
  beta[2] ~ normal(0, 1.0E1);
  beta[3] ~ normal(0, 1.0E1);
  beta[4] ~ normal(0, 1.0E1);

  ci ~ normal(0, 1.0E1);
  pi ~ beta(Am, Am);

  // Likelihood
  for (i in 1:N) {
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[1]),
                      poisson_lpmf(y[i] | mu[2]));
  }
}

SAMPLING FOR MODEL '4730afcc6face2800fd0d2208581e24a' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3.083 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 30830 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

Stan works on the unconstrained space anyway, so the two solutions may give the same result.

Did you put priors on the parameters?
The exp inside the Poisson can overflow; the sampling itself should normally be OK.

Continuing with your first modelling approach:

for (i in 1:N) {
  lp[i] = beta[1] + beta[2]*di[i] + beta[3]*gcc[i] + beta[4]*ma[i] + beta[5]*en[i];
  mu[1] = exp(lp[i]);
}
mu[2] = mu[1] + ci;
}

What does the loop do? There are two options here:
If you have only two mu, then the loop is not required. You can then restrict mu by

vector<lower=0>[2] mu;

and omit the loop.

Or do you have two vectors of size N, one value per observation? Then you can define mu as

vector<lower=0>[N] mu[2];

with

mu[1] = exp(beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x3 + beta[5]*x4); and
mu[2] = mu[1] + c;

BTW, if you define the x's as a matrix X, you can write mu[1] = exp(X * beta);
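A sketch of that per-observation version (assuming the first column of X is all ones, so beta[1] acts as the intercept):

data {
  int<lower=0> N;
  int<lower=1> y[N];
  matrix[N, 5] X;                   // column 1 all ones (intercept)
}
parameters {
  vector[5] beta;
  real<lower=0> ci;                 // positive shift between components
  real<lower=0, upper=1> pi;
}
transformed parameters {
  vector[N] mu1 = exp(X * beta);    // one rate per observation
  vector[N] mu2 = mu1 + ci;         // second component shifted up by ci
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu1[i]),
                      poisson_lpmf(y[i] | mu2[i]));
}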

From your intro post:

Then mu1 = mu2 + c, with c > 0? But your model says the opposite.

Since you add only a constant, you may calculate the log_mix formula by hand and see if you benefit from some simplifications.
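For instance (my own expansion, not from the thread), factoring the first Poisson term out of the mixture, with \mu_1 the first rate and c the shift:

p(y) = pi \cdot Poisson(y | \mu_1) + (1 - pi) \cdot Poisson(y | \mu_1 + c) = \frac{\mu_1^y e^{-\mu_1}}{y!} \left[ pi + (1 - pi) e^{-c} (1 + c/\mu_1)^y \right]

so on the log scale the second term in the bracket only adds \log(1 - pi) - c + y \log(1 + c/\mu_1) on top of the first component's log density.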

If you want to find out which signal is true/false for each observation, then shouldn't there be indicator variables for each observation?

In my model I included a latent variable (which will have a Bernoulli distribution) that indicates the category of each observation, but the Stan manual treats this in a different way (as far as I understand), so I tried to write the code that way. Therefore: (1) How can I integrate the effect of the indicator variables? (2) How can I incorporate the latent variable that allocates the category of the dependent variable (classifying an observation as false or true)?

I'm not sure a Poisson mixture is appropriate here, since I assume a mixture is something that comes from different sources. (I hope the math guys here forgive me.)
I'd first study the literature.
Out of the blue: Aki's regularized horseshoe and rebar are great tools for sparse signals.
So, if you have that constant c, you may define indicator variables as

vector<lower=0, upper=1>[N] signal;

e_mu2[i] = exp(mu1[i]) + signal[i] * c;

and the implementation of the signals depends on the expected signal-to-noise ratio.
If you know you only have a few “errors”, then go with the regularized horseshoe/rebar.
Otherwise, maybe a Bernoulli distribution suits better. I'm no expert in this field.

Could anyone comment on my model specification and R code?

My model is a mixture of zero-truncated Poissons; can I specify the truncation similarly to the truncated normal above?

int<lower=1> y[N];      // zero truncated
 target += log_mix(pi,
                   poisson_lpmf(y[i] | mu[1]),
                   poisson_lpmf(y[i] | mu[2]));   // don't we put T[1,] here to indicate the truncation?

This is different from the case I was speaking about, because there theta was a parameter with bounds in its declaration, which is a special case where Stan does the truncation for you. Here the left-hand side (y) is not a parameter. Also, the T[1,] syntax is AFAIK only allowed in sampling statements, so it won't work here. IMHO you need to truncate manually.

You need the pmf to sum to 1. So for the single Poisson case, when y > 0, you get P(y | \mu, y > 0) = \frac{Poisson(y | \mu)}{1 - Poisson(0 | \mu)}. Moving to the log scale and adding the mixture, you get:

target += log_mix(pi,
                  poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                  poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));

Hope that helps.


Thank you. I get it!

I have incorporated your comment and got the following output and error:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model1c386ed860c_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 6.693 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 66930 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

It has started to warm up, but the estimated time is 66930 seconds. I think that is too slow to finish (FYI: I am using a big data set).
I have also checked with a small data set. The time was short and the first chain ran correctly, but it failed in the second chain. Please look at the following output.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 0.004 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 40 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 1500 [  0%]  (Warmup)
Chain 1: Iteration:  150 / 1500 [ 10%]  (Warmup)
Chain 1: Iteration:  300 / 1500 [ 20%]  (Warmup)
Chain 1: Iteration:  450 / 1500 [ 30%]  (Warmup)
Chain 1: Iteration:  501 / 1500 [ 33%]  (Sampling)
Chain 1: Iteration:  650 / 1500 [ 43%]  (Sampling)
Chain 1: Iteration:  800 / 1500 [ 53%]  (Sampling)
Chain 1: Iteration:  950 / 1500 [ 63%]  (Sampling)
Chain 1: Iteration: 1100 / 1500 [ 73%]  (Sampling)
Chain 1: Iteration: 1250 / 1500 [ 83%]  (Sampling)
Chain 1: Iteration: 1400 / 1500 [ 93%]  (Sampling)
Chain 1: Iteration: 1500 / 1500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 1699 seconds (Warm-up)
Chain 1:                4111.72 seconds (Sampling)
Chain 1:                5810.71 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'f073e219c071f3ff312e50cabd855967' NOW (CHAIN 2).
Chain 2: Rejecting initial value:
Chain 2:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 2:   Stan can't start sampling from this initial value.
Chain 2: Rejecting initial value:
Chain 2:   Error evaluating the log probability at the initial value.
Chain 2: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)

Dear Bob_Carpenter,
I have declared mu as positive_ordered and I got the following error (after 24 hours).

Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: validate transformed params: mu is not a valid ordered vector. 
The element at 2 is 0, but should be greater than the previous element, 0  (in 'model15b93039b2e6_b1b7046f166a91eba9965648600041d6' at line 42)

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

Is it possible to solve this problem by specifying the initial values (I am still using the defaults)?

I don't understand what you want to correspond to what.

You have a two-dimensional parameter mu and a one-dimensional parameter pi. You're looking at draws from the posterior. There's not a different mu or pi for each y in the model.

Did you realize that Stan uses the standard deviation to parameterize the normal? Also, you can collapse all those statements into

beta ~ normal(0, 1e-3);

It is the probability of the first group, and the second will have 1 - pi.
I need the corresponding pi for each observation; how can we get such a value?
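One common way to get a per-observation membership probability is to apply Bayes' rule to the two components in generated quantities (a sketch, assuming the scalar mu[1], mu[2], and pi from the model above; prob1[i] is then the posterior probability that y[i] came from the first component):

generated quantities {
  vector[N] prob1;                  // P(observation i belongs to component 1 | y[i])
  for (i in 1:N) {
    real l1 = log(pi) + poisson_lpmf(y[i] | mu[1]);
    real l2 = log1m(pi) + poisson_lpmf(y[i] | mu[2]);
    prob1[i] = exp(l1 - log_sum_exp(l1, l2));  // normalize the two component weights
  }
}

Averaging prob1[i] over the posterior draws then gives the per-observation classification probability you are asking about.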