Mixture Bayesian Poisson Regression Model

#18

I have incorporated your comment and got the following output and error:

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model1c386ed860c_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 6.693 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 66930 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)

It has started to warm up, but the estimated time is 66930 seconds, which I think is too slow to finish (FYI: I am using big data).
I have also checked with a small dataset. The time was short and the first chain ran correctly, but it failed in the second chain. Please look at the following output.

Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Gradient evaluation took 0.004 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 40 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 1500 [  0%]  (Warmup)
Chain 1: Iteration:  150 / 1500 [ 10%]  (Warmup)
Chain 1: Iteration:  300 / 1500 [ 20%]  (Warmup)
Chain 1: Iteration:  450 / 1500 [ 30%]  (Warmup)
Chain 1: Iteration:  501 / 1500 [ 33%]  (Sampling)
Chain 1: Iteration:  650 / 1500 [ 43%]  (Sampling)
Chain 1: Iteration:  800 / 1500 [ 53%]  (Sampling)
Chain 1: Iteration:  950 / 1500 [ 63%]  (Sampling)
Chain 1: Iteration: 1100 / 1500 [ 73%]  (Sampling)
Chain 1: Iteration: 1250 / 1500 [ 83%]  (Sampling)
Chain 1: Iteration: 1400 / 1500 [ 93%]  (Sampling)
Chain 1: Iteration: 1500 / 1500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 1699 seconds (Warm-up)
Chain 1:                4111.72 seconds (Sampling)
Chain 1:                5810.71 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'f073e219c071f3ff312e50cabd855967' NOW (CHAIN 2).
Chain 2: Rejecting initial value:
Chain 2:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 2:   Stan can't start sampling from this initial value.
Chain 2: Rejecting initial value:
Chain 2:   Error evaluating the log probability at the initial value.
Chain 2: Exception: log_mix: lambda1 is nan, but must not be nan!  (in 'model28c8358113f2_f073e219c071f3ff312e50cabd855967' at line 73)
#19

Dear Bob_Carpenter,
I have declared mu as positive_ordered and I got the following error (after about 24 hrs):

Rejecting initial value:
  Error evaluating the log probability at the initial value.
Exception: validate transformed params: mu is not a valid ordered vector. 
The element at 2 is 0, but should be greater than the previous element, 0  (in 'model15b93039b2e6_b1b7046f166a91eba9965648600041d6' at line 42)

Initialization between (-2, 2) failed after 100 attempts. 
 Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

Is it possible to solve this problem by specifying initial values? (I am still using the defaults.)

#21

I don’t understand what you want to correspond to what.

You have a two-dimensional parameter mu and a one-dimensional parameter pi. You’re looking at draws from the posterior. There’s not a different mu or pi for each y in the model.

#22

Did you realize that Stan uses the standard deviation to parameterize normal? Also, you can collapse all those statements into

beta ~ normal(0, 1e-3);
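
For illustration, the vectorized statement above collapses element-by-element statements of this form (a hypothetical sketch of the pattern, not code from the thread):

// element-by-element form
for (k in 1:4)
  beta[k] ~ normal(0, 1e-3);

// equivalent single vectorized statement
beta ~ normal(0, 1e-3);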
#23

It is the probability of the first group, and the second group will have 1 - pi.
I need the corresponding pi for each observation, so how can we get such a value?

#24

I don’t think I can really follow this. Could you post your current Stan code and describe where the data come from, what inferences you are trying to make, and which questions you are trying to answer?

#25

My current code is the following:

data {

int<lower=0> N;           // number of data points
int <lower=1> y[N];         // observations
matrix[N,4] x;               // Covariates 
real<lower=0> mi[N];
}

transformed data {
real<lower=0> Ami;
Ami=mean(mi);
}

parameters {
real<lower=0> ci; // any Positive number added on the mu[1] to get mu[2] 
vector[4] beta;
real alpha;
vector<lower=0, upper=1>[N] pi;  // mixing proportions; need to have pi's at observation level

}

transformed parameters{
vector<lower=0>[N] mu[2];    // locations of mixture components; need to have mu's for each observation
    mu[1] = exp(alpha + x*beta);
    mu[2] = exp((alpha + x*beta)+ci);
  
}

model {

// Priors
  beta ~ normal(0, 1e1);
  alpha ~ normal(0,1e1);
  ci ~ normal(0, 1e1);
  pi ~ beta(mi,Ami);     // how can we change to Dirichlet distribution using mi and mci?
// Likelihood

  for (i in 1:N) 
      target += log_mix(pi,
                  poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                  poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
  
}

When I run the above code, I get this error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 
  log_mix(vector, real, real)
Available argument signatures for log_mix:
  log_mix(real, real, real)
  log_mix(real[], real[])
  log_mix(real[], vector)
  log_mix(real[], row vector)
  log_mix(real[], vector[])
  log_mix(real[], row vector[])
  log_mix(vector, real[])
  log_mix(vector, vector)
  log_mix(vector, row vector)
  log_mix(vector, vector[])
  log_mix(vector, row vector[])
  log_mix(row vector, real[])
  log_mix(row vector, vector)
  log_mix(row vector, row vector)
  log_mix(row vector, vector[])
  log_mix(row vector, row vector[])

  error in 'modeld8755c72_ff542624806624524b7b715fc8665283' at line 56, column 82
  -------------------------------------------------
    54:       target += log_mix(pi,
    55:                   poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
    56:                   poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
                                                                                         ^
    57:   
  -------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'ff542624806624524b7b715fc8665283' due to the above error.
In addition: Warning message:
In file.remove(tf) :
  cannot remove file 'C:\Users\521-04\AppData\Local\Temp\Rtmpc35uis\filed82a886c34', reason 'Permission denied'

I understand the problem here: log_mix takes a real mixing proportion and will not accept a vector. But I need the mixing proportion at the observation level, so how can I handle this?

FYI: When I change the mixing proportion to real<lower=0, upper=1> pi it works properly and I can get the N values of mu[1] and mu[2], but then I cannot get the mixing proportion at the observation level.

#26

You probably want

target += log_mix(pi[i], ...
#28

Please look at the following Stan code. I have checked this code with a small dataset and it is OK. But when I use my complete data (around 400,000 cases) it has not started the estimation yet (it’s now the 5th day). Therefore, would you please give any comment or idea on how I can make it faster?

data {
   int<lower=0> N;           // number of data points
  int <lower=1> y[N];         // observations
  matrix[N,4] x;               // Covariates 
 
}

parameters {
  real<lower=0> ci;                         // any Positive number added on the mu[1] to get mu[2] 
  real<lower=0> d;                        // will have uniform distribution that balances the shape parameters
  vector[4] beta;
  real alpha;
  vector<lower=0, upper=1>[N] pi;           // mixing proportions
}

transformed parameters{
  vector<lower=0>[N] mu[2];    // locations of mixture components
  mu[1] = exp(alpha + x*beta);
  mu[2] = exp((alpha + x*beta)+ci);
  
}

model {
  
  // Priors
  beta ~ normal(0, 1e3);
  alpha ~ normal(0,1e3);
  d ~ uniform(0,10000);    
  ci ~ normal(0, 1e3);
  
  
  // Likelihood
  for (i in 1:N) 
    target += log_mix(pi[i],
                      poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                      poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
  
}

It has shown only this message for the last five days:

SAMPLING FOR MODEL 'af252afcc7248409611d59094236d80a' NOW (CHAIN 1).
#29

In no particular order:

  • When you have d ~ uniform(0, 10000), you have to declare real<lower=0, upper=10000> d; the model statement is then no longer necessary (see the sketch after this list).
  • In general, the Stan team discourages uniform priors. For d you are probably better off with a gamma- or exponential-style prior.
  • Your priors are too wide, especially for alpha, beta, and ci: exp(400) overflows and exp(-400) underflows. See the topic Scaling of Log-likelihood value in Hamiltonian Monte Carlo.
  • Irrespective of the numerical issues, I would try to use more informative priors. What scale do you expect alpha, beta, and ci to be on? Can you standardize the x’s to put the betas on a scale closer to normal(0, 1)?
  • You can work on the log scale with poisson_log_lpmf, which should help with numerical stability because you avoid the mu = exp() step.
  • I don’t fully understand the Poisson distribution you are using, but I wonder whether there are ways to simplify the distributions in log_mix. mu[1] and mu[2] only differ by the term ci. The Poisson is also relatively easy to think about on the log scale.
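
For concreteness, a rough sketch of the first few points; the prior scales below are assumptions and depend on how the covariates are standardized:

// parameters block: with the bounds declared, the uniform(0, 10000) statement can be dropped
real<lower=0, upper=10000> d;

// model block: tighter priors keep exp(alpha + x*beta) in a numerically safe range
beta ~ normal(0, 1);
alpha ~ normal(0, 5);
ci ~ normal(0, 1);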
#30

400 000 cases is probably just too much for Stan. While there might be some improvements to speed available, I don’t think you are likely to get Stan to work with that. The problem is not primarily the number of data points (although that is big), but the number of parameters. In a small test I did elsewhere, sampling from 100 000 i.i.d. normals, which is probably the easiest model you can think of, took 40 minutes on my PC. Your model is both more than 4 times larger and way more complex. A simple hierarchical model with 25 000 parameters took me about 1.5 hours, and the same model with 100 000 parameters didn’t complete in 16 hours, so that’s just to provide some context.

In my experience it is however OK to split such large datasets randomly into smaller chunks that are manageable for Stan, you just need to check that the estimates of the shared parameters are similar across the chunks (which they will be unless there is some problem with your model).

Some stuff you can do to increase speed:

  • Use poisson_log_lpmf or poisson_log_glm_lpmf to avoid the exponentiation
  • Put tighter priors on everything (especially the uniform prior on d might cause some trouble), I would be surprised if anything wider than normal(0, 5) is actually needed for anything but the intercept, provided your predictors are normalized (which is a good idea anyway).
  • Check that you have enough memory: once you start swapping to disk, your performance is dead. Note that RStan will allocate memory to store all the posterior draws from your model (which I would guess to be around 18 GB per chain in your case). You can reduce that by 66% by moving mu from transformed parameters to the model block. You can use CmdStan to avoid storing draws in memory at all. You may also consider running fewer chains in parallel.
  • Store the result of alpha + x*beta and reuse it for mu[2] (or preferably use poisson_log_lpmf and avoid storing mu at all); see the sketch after this list.
  • Provide good inits
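
As a rough sketch of the memory and reuse points, local variables declared in the model block are recomputed each iteration but never written to the output, unlike transformed parameters (identifiers mirror the model above):

model {
  // declared here rather than in transformed parameters, so the draws are not stored
  vector[N] eta = alpha + x * beta;   // linear predictor computed once and reused
  vector[N] mu1 = exp(eta);
  vector[N] mu2 = exp(eta + ci);
  // ... priors and the likelihood loop go here as before
}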
#31

Actually, with the latest Stan additions, you just might be able to push your model into the feasible region with all the data points, using map_rect for the likelihood loop (with MPI and/or threading) or offloading the x * beta multiplication to the GPU (since 2.19, not yet in rstan). I fear, however, that this might be non-trivial to get working, as the documentation is not quite there yet.

#32

I don’t understand how I can do that.

#33

poisson_lpmf(y[i] | mu[1]) becomes poisson_log_lpmf(y[i] | alpha + x*beta)

#34

Wait, something is off I think. Should the original model not be

poisson_lpmf(y[i] | mu[1, i])

and the new one

poisson_log_lpmf(y[i] | alpha + x[i] * beta) ?

I am surprised that the original model worked given the mismatch in dimensions between y[i] and mu[1]. Are you recovering the parameters if you start from simulated data?

#35

the original likelihood

// Likelihood

  for (i in 1:N) 
      target += log_mix(pi[i],
                  poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                  poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));

The dimensions of y and mu[1] are the same, [N], and as you said,

poisson_lpmf(y[i] | mu[1]) becomes poisson_log_lpmf(y[i] | alpha + x*beta)

What about the log1m_exp(poisson_lpmf(0 | mu[1])) part?
Will it have the form log1m_exp(poisson_log_lpmf(0 | alpha + x*beta))?

#36

That is not correct I think. y is of length N but y[i] is of length 1.

#37

I understand your concern. I meant that y and mu have the same dimension. But as I understand it, you are right to ask how poisson_lpmf(y[i] | mu[1]) was working.
Now I will use:

for (i in 1:N) 
      target += log_mix(pi[i],
                  poisson_log_lpmf(y[i] | alpha + x*beta) - log1m_exp(poisson_log_lpmf(0 | alpha + x*beta)),
                  poisson_log_lpmf(y[i] | alpha + x*beta+ci) - log1m_exp(poisson_log_lpmf(0 | alpha + x*beta+ci)));

Therefore, will your concern have an effect on this analysis? I think the above expression is similar to what you stated before, poisson_lpmf(y[i] | mu[1, i]).

I also have one question: is there any option that can enforce mu[1] < mu[2]?
FYI: in my case I used mu[2] = mu[1] + ci.

#38

I think you are fine with this approach. For more complicated models, you could use the ordered type.

I think what you need is poisson_log_lpmf(y[i] | alpha + x[i] * beta); otherwise you are using the whole matrix x for every single outcome y[i]. Maybe that is the goal of your analysis, but it implies that you have the same expected value for each y[i].
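
Putting the row indexing and the log-scale suggestion together, a minimal sketch of the zero-truncated mixture likelihood could look like this (assuming the same alpha, beta, ci, and pi as in the model above):

for (i in 1:N) {
  real eta = alpha + x[i] * beta;   // linear predictor for observation i only
  target += log_mix(pi[i],
                    poisson_log_lpmf(y[i] | eta)
                      - log1m_exp(poisson_log_lpmf(0 | eta)),
                    poisson_log_lpmf(y[i] | eta + ci)
                      - log1m_exp(poisson_log_lpmf(0 | eta + ci)));
}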

#39

I don’t understand this statement.

Let me justify the model. It is a mixture of truncated Poissons (it was zero-truncated, but I have now changed it to one-truncated). Each observation will have probability pi[1] of being in the first group and pi[2] of being in the second group. Then, after I get pi from the posterior, I will categorize each observation according to its value (e.g., below 0.6 it will be categorized into group 1, and so on).
As I mentioned above, the model has changed to one-truncated. Therefore, can I write it like this?

 for (i in 1:N) 
  target += log_mix(pi[i],
                    poisson_lpmf(y[i]|mu[1,i])-log1m_exp(poisson_lpmf(0|mu[1,i]))-log1m_exp(poisson_lpmf(1|mu[1,i])),
                    poisson_lpmf(y[i]|mu[2,i])-log1m_exp(poisson_lpmf(0|mu[2,i]))-log1m_exp(poisson_lpmf(1|mu[2,i])));