Mixture Bayesian Poisson Regression Model


#5

I think that should be mu[2] = mu[1] + ci;
An alternative to your code on the log scale would be:

 target += log_mix(pi,
                   poisson_log_lpmf(y | l_mu[1]),
                   poisson_log_lpmf(y | log_sum_exp(l_mu[1], l_ci)));

where l_mu[1], l_ci \in \mathbb{R}.
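To see why the two forms agree (a standalone Python sketch, not Stan code — the helper functions below just mimic Stan's log_mix, log_sum_exp and poisson_log_lpmf): log_mix(pi, a, b) is a log_sum_exp of the weighted component log densities, and log_sum_exp(l_mu[1], l_ci) equals log(mu[1] + ci) on the natural scale, i.e. exactly the additive offset.

```python
import math

def log_sum_exp(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(pi, lp1, lp2):
    # log(pi * exp(lp1) + (1 - pi) * exp(lp2)), as in Stan's log_mix
    return log_sum_exp(math.log(pi) + lp1, math.log(1.0 - pi) + lp2)

def poisson_log_lpmf(y, log_lambda):
    # Poisson log-pmf parameterized by the log rate, as in Stan
    return y * log_lambda - math.exp(log_lambda) - math.lgamma(y + 1)

# mu[2] = mu[1] + ci on the natural scale <=> log_sum_exp(l_mu[1], l_ci)
# on the log scale; here mu[1] = 2, ci = 3, so mu[2] = 5 (made-up values)
l_mu1, l_ci, pi, y = math.log(2.0), math.log(3.0), 0.4, 4
lhs = log_mix(pi,
              poisson_log_lpmf(y, l_mu1),
              poisson_log_lpmf(y, log_sum_exp(l_mu1, l_ci)))
# direct natural-scale computation of the same mixture density
rhs = math.log(pi * (2.0**y * math.exp(-2.0) / math.factorial(y))
               + (1 - pi) * (5.0**y * math.exp(-5.0) / math.factorial(y)))
```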


#6

Dear Andre.pfeuffer
Thank you for your help. I applied your suggestion and got the following error.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: poisson_log_lpmf: Log rate parameter is nan, but must not be nan!  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: model33485ac8637e_2622b2dbe3528d69901619675cb8bf95_namespace::write_array: mu is not a valid ordered vector. The element at 2 is inf, but should be greater than the previous element, inf  (in 'model33485ac8637e_2622b2dbe3528d69901619675cb8bf95' at line 40)

Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

The error messages show problems at line 73 (the target += statement) and line 40 (related to l_mu[1], which I declared as a real parameter).


#7

But when I change only the expression mu[2] = mu[1]*ci to mu[2] = mu[1] + ci, I get the following output.

data {
 int<lower=0> N;            // number of data points
 int<lower=1> y[N];         // observations, zero-truncated at 0 (y >= 1)
 int<lower=0> p;            // number of covariates
 real<lower=0> Am;          // hyperparameter of the beta prior on pi
   // covariates
 int<lower=0> x1[N];
 real<lower=0> x2[N];
 real<lower=0> x3[N];
}

parameters {
 real<lower=0> ci;                // positive offset added to mu[1] to obtain mu[2]
 real beta[p];
 real<lower=0, upper=1> pi;       // mixing proportion
}
transformed parameters{
real lp[N];
positive_ordered[2] mu;        // locations of mixture components
for(i in 1:N){  
 lp[i] = beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i];
 mu[1] = exp(lp[i]);
 }
 mu[2] = mu[1]+ci;
 }

model {
// Priors
 beta[1] ~ normal(0,1.0E1);
 beta[2] ~ normal(0,1.0E1);
 beta[3] ~ normal(0,1.0E1);
 beta[4] ~ normal(0,1.0E1);
 
 ci ~ normal(0,1.0E1);
 pi ~ beta(Am,Am);

   // Likelihood
 for (i in 1:N) {
     target += log_mix(pi,
                       poisson_lpmf(y[i] | mu[1]),
                       poisson_lpmf(y[i] | mu[2]));
 }
}

SAMPLING FOR MODEL '4730afcc6face2800fd0d2208581e24a' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3.083 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 30830 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

#8

Stan works on the unconstrained space anyway, so the two solutions may turn out to be equivalent.

Did you put any priors on the parameters?
The exp in the Poisson can overflow; apart from that, sampling should normally be fine.
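To illustrate that overflow point (plain Python below, but the same double-precision limit applies to exp() in Stan's C++): the linear predictor only has to reach about 710 before exp() leaves the representable range, which is one reason the log-scale parameterization via poisson_log_lpmf is safer.

```python
import math

# exp() in double precision overflows just above 709.78 (the log of DBL_MAX).
# A linear predictor lp around 710 therefore makes mu = exp(lp) infinite and
# the Poisson rate invalid -- poisson_log_lpmf avoids this by staying on the
# log scale throughout.
still_finite = math.exp(709.0)   # close to, but below, the double limit

try:
    math.exp(710.0)              # past the double-precision limit
    overflowed = False
except OverflowError:
    overflowed = True
```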

continuing with your first modelling approach:

for(i in 1:N){
lp[i] = beta[1] + beta[2]*di[i] + beta[3]*gcc[i] + beta[4]*ma[i] + beta[5]*en[i];
mu[1] = exp(lp[i]);
}
mu[2] = mu[1]+ci;
}

What does the loop do? There are two options here:
If you have only two mu, then the loop is not required.
In that case you can restrict mu with
vector<lower=0>[2] mu;
and omit the loop.

Or do you have two vectors of size N, one entry per observation?
Then you can define mu as

vector<lower=0>[N] mu[2];

with
mu[1] = exp(beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x3 + beta[5]*x4); and
mu[2] = mu[1] + c;

BTW, if you define the covariates as a matrix X, you can write mu[1] = exp(X * beta);

From your intro post:

Then mu1 = mu2 + c, with c > 0? But your model says the opposite.

Since you only add a constant, you may work out the log_mix formula by hand and see whether you benefit from some simplifications.
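One such simplification, sketched in Python under the assumption mu[2] = mu[1] + c (this is my own worked example, not code from the thread): the two component log-pmfs differ only by the term y*log1p(c/mu[1]) - c, so log_mix collapses to a single Poisson lpmf evaluation plus a one-term correction.

```python
import math

def poisson_lpmf(y, mu):
    # Poisson log-pmf: y*log(mu) - mu - log(y!)
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def log_mix_direct(pi, y, mu1, c):
    # Stan-style log_mix over the two Poisson components
    a = math.log(pi) + poisson_lpmf(y, mu1)
    b = math.log(1 - pi) + poisson_lpmf(y, mu1 + c)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix_simplified(pi, y, mu1, c):
    # poisson_lpmf(y | mu1 + c) - poisson_lpmf(y | mu1)
    #   = y * (log(mu1 + c) - log(mu1)) - c
    #   = y * log1p(c / mu1) - c,
    # so a single lpmf evaluation suffices.
    delta = y * math.log1p(c / mu1) - c
    return poisson_lpmf(y, mu1) + math.log(pi + (1 - pi) * math.exp(delta))

# a few made-up (pi, y, mu1, c) combinations to compare the two forms
cases = [(0.3, 0, 1.5, 4.0), (0.3, 7, 1.5, 4.0), (0.9, 12, 0.2, 10.0)]
```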


#9

Thank you for your response. Let me explain my model:

I have only one vector y[N] as the dependent variable. But this variable contains both true and false observations, and I don’t know which observations are true signals and which are false. Therefore, I want to model these observations as a mixture. Here, the mean of the true observations (previously I called it mu[2]) is greater than that of the false observations (mu[1]). Both groups share the same covariates. (Note: sorry, previously I swapped mu[2] and mu[1]; the correct ordering is mu[2] > mu[1].)

I had put these priors in my previous code, but I did not specify any initial values (I don’t know whether this may affect my current output).


#10

If you want to find out which signal is true/false for each observation, then shouldn’t there be indicator variables for each observation?


#11

In my model I included a latent variable (which follows a Bernoulli distribution) that indicates the category of each observation, but when I read the Stan manual it is treated in a different way (as far as I understand), so I tried to write the code that way. Therefore: (1) How can I integrate the effect of the indicator variables? (2) How can I incorporate the latent variable that allocates each observation of the dependent variable to a category (true or false observation)?


#12

I’m not sure a Poisson mixture is appropriate here, since I assume a mixture is something that comes from different sources. (I hope the math guys here forgive me.)
I’d first study the literature.
Out of the blue… Aki’s regularized horseshoe and rebar are great tools for sparse signals.
So, if you have that constant c, you may define indicator variables as
vector<lower=0, upper=1> signal[N];

e_mu2[i] = exp(mu1[i]) + signal[i] * c;

and the implementation of signals depends on the expected signal/noise ratio.
If you know you only have a few “errors” then go with reg. horseshoe/rebar.
Otherwise, maybe a Bernoulli distribution suits better. I’m no expert in this field.


#13

Yi is a random variable that follows a mixture of two Poisson distributions: the first mixture component corresponds to the false signal and the second to the true signal. Therefore, Yi can be written as a weighted sum of two probability mass functions: Y_i ~ P_0i * f(.|mu_0i) + P_1i * f(.|mu_1i), where P_0i and P_1i are the probabilities that observation i comes from the first or second component, respectively.


#14

Could any one comment on my model specification and R code?


#15

My model is a mixture of zero-truncated Poissons; can I therefore specify it similarly to the truncated normal above?

int <lower=1> y[N];      // Zero truncated 
 target += log_mix(pi,
                      poisson_lpmf(y | mu[1]),
                      poisson_lpmf(y | mu[2]));   // here don't we put T[1,] to indicate the truncation?

#16

This is different from the case I was speaking about, because there theta was a parameter with bounds in its declaration, which is a special case where Stan does the truncation for you. Here, the left-hand side (y) is not a parameter. Also, the T[1,] syntax is AFAIK only allowed for sampling statements, so it won’t work here. IMHO you need to truncate manually.

You need the pmf to sum to 1. So for the single Poisson case, when y > 0, you get P(y | \mu, y > 0) = \frac{\text{Poisson}(y \mid \mu)}{1 - \text{Poisson}(0 \mid \mu)}. Moving to the log scale and adding the mixture, you get:

target += log_mix(pi,
                      poisson_lpmf(y | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                      poisson_lpmf(y | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));

Hope that helps.
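A quick numeric check of that normalization (Python, outside Stan): subtracting log1m_exp(poisson_lpmf(0 | mu)) on the log scale is the same as dividing by 1 - Poisson(0 | mu), and it makes the zero-truncated pmf sum to one over y >= 1.

```python
import math

def poisson_lpmf(y, mu):
    # Poisson log-pmf
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def trunc_poisson_lpmf(y, mu):
    # log Pois(y|mu) - log(1 - Pois(0|mu));
    # log1m_exp(x) in Stan corresponds to log1p(-exp(x)) here
    return poisson_lpmf(y, mu) - math.log1p(-math.exp(poisson_lpmf(0, mu)))

mu = 3.7   # made-up rate
# the truncated pmf over y = 1, 2, ... should sum to (essentially) 1
total = sum(math.exp(trunc_poisson_lpmf(y, mu)) for y in range(1, 200))
```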


#17

thank you. I get it!!


#18

I have incorporated your comment and got the following output and errors.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model1c386ed860c_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 6.693 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 66930 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

It has started to warm up, but the estimated time is 66930 seconds. I think it is too slow to finish (FYI: I am using a big data set).
I have also checked with a small data set. The time was small, and the first chain ran correctly but failed in the second chain. Please look at the following output.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 0.004 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 40 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 1500 [  0%]  (Warmup)
Chain 1: Iteration:  150 / 1500 [ 10%]  (Warmup)
Chain 1: Iteration:  300 / 1500 [ 20%]  (Warmup)
Chain 1: Iteration:  450 / 1500 [ 30%]  (Warmup)
Chain 1: Iteration:  501 / 1500 [ 33%]  (Sampling)
Chain 1: Iteration:  650 / 1500 [ 43%]  (Sampling)
Chain 1: Iteration:  800 / 1500 [ 53%]  (Sampling)
Chain 1: Iteration:  950 / 1500 [ 63%]  (Sampling)
Chain 1: Iteration: 1100 / 1500 [ 73%]  (Sampling)
Chain 1: Iteration: 1250 / 1500 [ 83%]  (Sampling)
Chain 1: Iteration: 1400 / 1500 [ 93%]  (Sampling)
Chain 1: Iteration: 1500 / 1500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 1699 seconds (Warm-up)
Chain 1:                4111.72 seconds (Sampling)
Chain 1:                5810.71 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'f073e219c071f3ff312e50cabd855967' NOW (CHAIN 2).
Chain 2: Rejecting initial value:
Chain 2:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 2:   Stan can't start sampling from this initial value.
Chain 2: Rejecting initial value:
Chain 2:   Error evaluating the log probability at the initial value.
Chain 2: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)

#19

Dear Bob_Carpenter
I have declared mu as positive_ordered and got the following error (after 24 hours).

Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: validate transformed params: mu is not a valid ordered vector. 
The element at 2 is 0, but should be greater than the previous element, 0  (in 'model15b93039b2e6_b1b7046f166a91eba9965648600041d6' at line 42)

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

Is it possible to solve this problem by specifying initial values (I am still using the defaults)?


#20

Let me ask you one technical question. After fitting my model, I can extract samples using extract(resStan, pars=c("mu","pi")) and get this kind of output:

          mu.1         mu.2        pi
1    0.004095345     5.554968 0.4993063
2    0.010632282     5.548744 0.5002001
3    0.003430598     5.548196 0.4993079
4    0.002917014     5.551981 0.4995729
5    0.004357083     5.544148 0.4990187
6    0.032742385     5.548469 0.4995748
.
.
.

But I will use the value of pi to classify Y[N]: an observation goes to group[1] if pi < 0.5, else to group[2]. Therefore, is it possible to get the corresponding pi and mu values for my original observations Y, like:

Y     mu    pi
2     1.9   0.4
3     3.1   0.2
1     0.7   0.8
9     8.9   0.1
.
.
.

If not, how can I make the categorization based on pi value?
Thank you for your cooperation.


#21

I don’t understand what you want to correspond to what.

You have a two-dimensional parameter mu and a one-dimensional parameter pi. You’re looking at draws from the posterior. There’s not a different mu or pi for each Y in the model.
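That said, a per-observation membership probability can be computed after the fit via Bayes’ rule — this is a standard post-hoc responsibility calculation, not something the thread’s code does, and the point estimates below are made up in the spirit of the extract() output above.

```python
import math

def poisson_lpmf(y, mu):
    # Poisson log-pmf
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def responsibility(y, pi, mu1, mu2):
    # P(component 1 | y) =
    #   pi * Pois(y|mu1) / (pi * Pois(y|mu1) + (1 - pi) * Pois(y|mu2)),
    # computed stably on the log scale
    a = math.log(pi) + poisson_lpmf(y, mu1)
    b = math.log(1 - pi) + poisson_lpmf(y, mu2)
    m = max(a, b)
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

# made-up posterior point estimates (cf. the extracted mu.1, mu.2, pi above)
pi_hat, mu1_hat, mu2_hat = 0.5, 0.004, 5.55
probs = [responsibility(y, pi_hat, mu1_hat, mu2_hat) for y in [1, 2, 9]]
```

Large counts are then attributed almost entirely to the high-mean component, small counts much less so.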


#22

Did you realize that Stan parameterizes the normal by its standard deviation? Also, you can collapse all those statements into

beta ~ normal(0, 1e1);

#23

It is the probability of the first group, and the second will have 1 - pi.
I need the corresponding pi for each observation; how can we get such a value?


#24

I don’t think I can really follow this. Could you post your current Stan code and describe where the data come from, what inferences you are trying to make, and which questions you are trying to answer?