Two-Component Mixture Bayesian Poisson Regression Model


#1

Hello researchers,

I am trying to fit a two-component mixture Poisson regression model, where the mean of the first cluster is greater than that of the second. I have tried the Stan code below, but I am not sure how the independent variables are integrated into the model (I could not find any related Stan material on this), nor where the truncation option (T[0,]) should go. Finally, I want to set mu[2] = mu[1] + theta, where theta is any positive number, but I don't know how to integrate that into my code. Would you please give me some hints, corrections, or related material on my code?


data {

int<lower=0> N;   // number of data points
int<lower=0> y[N]; // observations
int<lower=1> K;   // number of mixture components
int<lower=0> p;   // number of covariates

// Covariates

int<lower=0> x1[N];
real<lower=0> x2[N];
real<lower=0> x3[N];
real<lower=0> x4[N];
real<lower=0> x5[N];

}

transformed data {

real<lower=0> Amci;
Amci = mean(x5);   // mean of the fifth covariate (called mci in the later posts)

}

parameters {
real<lower=0> theta;
real beta[p];

simplex[2] pi; // mixing proportions

}

transformed parameters{

real lp[N];
ordered[2] mu; // locations of mixture components

for (i in 1:N) {

  lp[i] = beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i] + beta[5]*x4[i];

  mu[1] = exp(lp[i]);
  mu[2] = mu[1] + theta;

}

}

model {

// Priors
beta[1] ~ normal(0,1.0E-3);
beta[2] ~ normal(0,1.0E-3);
beta[3] ~ normal(0,1.0E-3);
beta[4] ~ normal(0,1.0E-3);
beta[5] ~ normal(0,1.0E-3);
theta ~ normal(0,1.0E-6)T[0,];

pi ~ beta(Amci,Amci);

// Likelihood

for (i in 1:N) {

  target += log_sum_exp(log(pi) + poisson_lpmf(y[i] | mu[1]),
            log1m(pi) + poisson_lpmf(y[i] | mu[2]));

}

}


#2

Hi,
inspecting your code, it looks roughly OK to me. I am not sure what problem you are describing. Some minor notes:

  • You can use triple backticks (```) to mark the beginning and end of your code in the post, so it is formatted in monospace for better readability
  • Your priors for beta are very narrow; consider standardizing x so that you can use normal(0, 1)
  • The prior for theta is also very narrow, so mu[1] will be almost equal to mu[2]. Is that intentional?
  • It could make more sense to have mu[2] = exp(lp + theta), i.e. theta would be on the same (log) scale as beta.
  • For improved efficiency, you could code x and beta as matrix[5,N] x; and vector[5] beta; then you'd be able to use matrix multiplication and have vector[N] lp; with lp = beta * x; (or maybe matrix[N,5] x; I always mess up which coordinate is which :-) see the sketch after this list for a working orientation)
  • The truncation in theta ~ normal(0,1.0E-6)T[0,]; is unnecessary. When you give bounds in parameter declarations, truncation of distributions is performed automatically by Stan.
  • Instead of simplex[2] pi you probably want to have real<lower=0, upper=1> pi. It's surprising that your code compiles this way.
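
For the matrix formulation, a minimal sketch (assuming the covariates are stacked row-wise, one row per observation; add a column of ones to x if you want beta[1] to act as the intercept):

data {
  int<lower=0> N;
  int<lower=0> p;            // 5 covariates here
  matrix[N, p] x;            // row i holds the covariates of observation i
  int<lower=0> y[N];
}
parameters {
  vector[p] beta;
}
transformed parameters {
  vector[N] lp = x * beta;   // matrix[N,p] times vector[p] gives vector[N]
}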

Hope that helps.


#3

Rather than doing the transform and log_sum_exp yourself, just declare mu as positive_ordered. It does the same thing under the hood as what you're doing.

Also, there’s a log_mix function that’ll simplify the likelihood calculation.

Neither of these will change the results you get, though. The tight priors on beta are indeed suspect to me, as are the ones on theta. Did you realize Stan uses the scale (standard deviation) as the second parameter of normal, not the precision? We have a wiki on prior recommendations and also a lot of advice in the user's guide.
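
A minimal sketch of both suggestions combined, showing only the parts that change (pi as a single mixing weight, per the advice above):

parameters {
  positive_ordered[2] mu;      // mu[1] < mu[2], both positive
  real<lower=0, upper=1> pi;   // mixing proportion
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[1]),
                      poisson_lpmf(y[i] | mu[2]));
}

Here log_mix(pi, lp1, lp2) is exactly log_sum_exp(log(pi) + lp1, log1m(pi) + lp2).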


#4

Thank you, Martinmodrak and Bob, for your brief comments. I tried to take them into account, but I still face some problems.
I declared mu as positive_ordered and removed my own transformation, but it gives the same error. Therefore, I tried to modify my previous code as follows.

data {

 int<lower = 0> N;           // number of data points
 int<lower=1> y[N];         // observations, truncated at 0
 int<lower=0> p;            // number of covariates

   // Covariates 

 int <lower=0> di[N];  
 real <lower=0> en[N];
 real<lower=0> gcc[N];
 real<lower=0> ma[N];
 real <lower=0> mci[N];
 
 
}

transformed data {

real <lower=0> Amci;
Amci=mean(mci);

}

parameters {
 real<lower=0> ci; // any positive number added to mu[1] to get mu[2]
 real beta[p];
 
 real <lower=0, upper=1> pi;        // mixing proportions

}

transformed parameters{

real lp[N];
positive_ordered[2] mu;        // locations of mixture components

for(i in 1:N){  

 lp[i] = beta[1] + beta[2]*di[i] + beta[3]*gcc[i] + beta[4]*ma[i] + beta[5]*en[i];
 
 mu[1] = exp(lp[i]);
 
 }
 mu[2] = mu[1]*ci;

 }


model {
 
  // Priors
 beta[1] ~ normal(0,1.0E1);
 beta[2] ~ normal(0,1.0E1);
 beta[3] ~ normal(0,1.0E1);
 beta[4] ~ normal(0,1.0E1);
 beta[5] ~ normal(0,1.0E1);
 ci ~ normal(0,1.0E1);

 pi ~ beta(mci,Amci);

   // Likelihood

 for (i in 1:N) {

     target += log_mix(pi,
                       poisson_lpmf(y[i] | mu[1]),
                       poisson_lpmf(y[i] | mu[2]));


}

}

When I run the above code, I get the following error, and the process has still not finished. Therefore, would you please give me some hints or modifications?

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model3098364d464b_20869da05759798735f6ab7ab1a349fd_namespace::write_array: mu is not a valid ordered vector. The element at 2 is 0, but should be greater than the previous element, 0  (in 'model3098364d464b_20869da05759798735f6ab7ab1a349fd' at line 40)

Chain 1: 
Chain 1: Gradient evaluation took 6.045 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 60450 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

#5

I think that should be mu[2] = mu[1] + ci;
An alternative to your code on the log scale would be:

 target += log_mix(pi,
                   poisson_log_lpmf(y | l_mu[1]),
                   poisson_log_lpmf(y | log_sum_exp(l_mu[1], l_ci)));

where l_mu[1] and l_ci are unconstrained reals (l_mu[1], l_ci ∈ ℝ).
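
This works because log_sum_exp(l_mu[1], l_ci) = log(exp(l_mu[1]) + exp(l_ci)) = log(mu[1] + ci), so the second component keeps the mean mu[1] + ci while everything stays on the log scale. A sketch of the declarations and per-observation loop this assumes (l_mu1 stands in for l_mu[1]; the names are illustrative, not from the original model):

parameters {
  real<lower=0, upper=1> pi;   // mixing proportion
  real l_mu1;                  // log of mu[1], unconstrained
  real l_ci;                   // log of ci, unconstrained
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_log_lpmf(y[i] | l_mu1),
                      poisson_log_lpmf(y[i] | log_sum_exp(l_mu1, l_ci)));
}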


#6

Dear Andre.pfeuffer,
thank you for your help. I applied your suggestion and got the following error.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but m
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

The error messages point to line 73 (the target += statement) and line 40 (the declaration of l_mu[1], which I declared as a real parameter).


#7

But when I change only the expression mu[2] = mu[1]*ci to mu[2] = mu[1] + ci, I get the following output.

data {
 int<lower = 0> N;           // number of data points
 int<lower=1> y[N];         // observations, truncated at 0
 int<lower=0> p;            // number of covariates
   // Covariates 
 int <lower=0> di[N];  
 real <lower=0> en[N];
 real<lower=0> gcc[N];
 real<lower=0> ma[N];
 real <lower=0> mci[N];
}

transformed data {
real <lower=0> Amci;
Amci=mean(mci);
}
parameters {
 real<lower=0> ci; // any positive number added to mu[1] to get mu[2]
 real beta[p];
 real <lower=0, upper=1> pi;        // mixing proportions
}
transformed parameters{
real lp[N];
positive_ordered[2] mu;        // locations of mixture components
for(i in 1:N){  
 lp[i] = beta[1] + beta[2]*di[i] + beta[3]*gcc[i] + beta[4]*ma[i] + beta[5]*en[i];
 mu[1] = exp(lp[i]);
 }
 mu[2] = mu[1]+ci;
 }

model {
// Priors
 beta[1] ~ normal(0,1.0E1);
 beta[2] ~ normal(0,1.0E1);
 beta[3] ~ normal(0,1.0E1);
 beta[4] ~ normal(0,1.0E1);
 beta[5] ~ normal(0,1.0E1);
 ci ~ normal(0,1.0E1);
 pi ~ beta(mci,Amci);

   // Likelihood
 for (i in 1:N) {
     target += log_mix(pi,
                       poisson_lpmf(y[i] | mu[1]),
                       poisson_lpmf(y[i] | mu[2]));
}
}

SAMPLING FOR MODEL '4730afcc6face2800fd0d2208581e24a' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3.083 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 30830 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

#8

Stan works on the unconstrained space anyway, so the two solutions may give the same result.

Did you put priors on the parameters?
The exp inside the Poisson rate can overflow. The sampler should normally be fine, though.

continuing with your first modelling approach:

for(i in 1:N){
lp[i] = beta[1] + beta[2]*di[i] + beta[3]*gcc[i] + beta[4]*ma[i] + beta[5]*en[i];
mu[1] = exp(lp[i]);
}
mu[2] = mu[1]+ci;
}

What does the loop do? There are two options here:
If you have only two mu, then the loop is not required. You can restrict mu with
vector<lower=0>[2] mu;
and omit the loop.

Or do you have two vectors of size N, one value per observation? Then you can define mu as

vector<lower=0>[N] mu[2];

with
mu[1] = exp(beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x3 + beta[5]*x4); and
mu[2] = mu[1] + c;

BTW, if you define the covariates as a matrix X, you can write mu[1] = exp(X * beta); (a fuller sketch of this second option follows below).
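
A minimal sketch of that second option, assuming the covariates are collected into matrix[N, p] X (with a leading column of ones so beta[1] acts as the intercept) and using ci as the model's name for the constant c above:

transformed parameters {
  vector<lower=0>[N] mu[2];
  mu[1] = exp(X * beta);   // elementwise exp of the linear predictor; beta is vector[p]
  mu[2] = mu[1] + ci;      // shift every element by the same positive constant
}
model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[1][i]),
                      poisson_lpmf(y[i] | mu[2][i]));
}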

From your intro post: the mean of the first cluster is greater than that of the second, i.e. mu1 = mu2 + c with c > 0? But your model says the opposite.

Since you only add a constant, you may expand the log_mix formula by hand (see below) and check whether you benefit from some simplifications.
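
For reference, the expansion is the standard identity

log_mix(pi, lp1, lp2) = log(pi * exp(lp1) + (1 - pi) * exp(lp2))
                      = log_sum_exp(log(pi) + lp1, log1m(pi) + lp2),

and for two Poisson components with means mu and mu + c the log densities differ only by

lp2 - lp1 = y * (log(mu + c) - log(mu)) - c,

so the log(y!) terms cancel and just the pieces involving c remain.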


#9

Thank you for your response. Let me explain my model:

I have only one vector y[N] as the dependent variable. This variable contains both true and false observations, and I don't know which observations are true signals and which are false. Therefore, I want to model these observations as a mixture. Here, the mean of the true observations (previously I called it mu[2]) is greater than that of the false observations (mu[1]). Both groups share the same covariates. (Note: sorry, I previously stated mu[2] and mu[1] in reverse; the correct ordering is mu[2] > mu[1].)

I have put these priors in my previous code, but I did not specify any initial values there (I don't know whether this affects my current output).


#10

If you want to find out which signal is true/false for each observation, shouldn't there be indicator variables for each observation?


#11

In my model I included a latent variable (with a Bernoulli distribution) that indicates the category of each observation, but the Stan manual treats this differently (as far as I understand), so I tried to write the code that way. Therefore: (1) How can I integrate the effect of the indicator variables? (2) How can I incorporate the latent variable that allocates the category of the dependent variable (classifying each observation as a false or true signal)?


#12

I'm not sure if a Poisson mixture is appropriate here, since I assume a mixture is something that comes from different sources. (I hope the math guys here forgive me.)
I'd first study the literature.
Out of the blue: Aki's regularized horseshoe and rebar are great tools for sparse signals.
So, if you have that constant c, you may define indicator variables as
vector<lower=0, upper=1>[N] signal;

e_mu2[i] = exp(mu1[i]) + signal[i] * c;

and the implementation of the signals depends on the expected signal/noise ratio.
If you know you have only a few "errors", then go with the regularized horseshoe/rebar.
Otherwise, maybe a Bernoulli distribution suits better (a sketch follows below). I'm no expert in this field.
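
If you go the Bernoulli route: Stan cannot sample discrete parameters, so the indicator must be marginalized out of the likelihood. A minimal sketch under the thread's assumptions (pi is the probability of a true signal, mu[2] > mu[1] as clarified in post #9):

model {
  for (i in 1:N)
    target += log_mix(pi,
                      poisson_lpmf(y[i] | mu[2]),   // true component, larger mean
                      poisson_lpmf(y[i] | mu[1]));  // false component
}
generated quantities {
  // posterior probability that observation i is a true signal
  vector[N] prob_true;
  for (i in 1:N) {
    real lp_t = log(pi) + poisson_lpmf(y[i] | mu[2]);
    real lp_f = log1m(pi) + poisson_lpmf(y[i] | mu[1]);
    prob_true[i] = exp(lp_t - log_sum_exp(lp_t, lp_f));
  }
}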


#13
Short description of the model