Mixture Bayesian Poisson Regression Model

#24

I don’t think I can really follow this. Could you post your current Stan code and describe where the data come from and what inferences you are trying to make? Which questions are you trying to answer?

#25

My current code is the following:

data {

int<lower=0> N;           // number of data points
int <lower=1> y[N];         // observations
matrix[N,4] x;               // Covariates 
real<lower=0> mi[N];
}

transformed data {
real<lower=0> Ami;
Ami=mean(mi);
}

parameters {
real<lower=0> ci; // any Positive number added on the mu[1] to get mu[2] 
vector[4] beta;
real alpha;
vector<lower=0, upper=1>[N] pi;  // mixing proportions; need to have pi's at observation level

}

transformed parameters{
vector<lower=0>[N] mu[2];    // locations of mixture components; need to have mu's for each observation
    mu[1] = exp(alpha + x*beta);
    mu[2] = exp((alpha + x*beta)+ci);
  
}

model {

// Priors
  beta ~ normal(0, 1e1);
  alpha ~ normal(0,1e1);
  ci ~ normal(0, 1e1);
  pi ~ beta(mi,Ami);     // how can we change to Dirichlet distribution using mi and mci?
// Likelihood

  for (i in 1:N) 
      target += log_mix(pi,
                  poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                  poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
  
}

When I run the above code, I got this error

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 
  log_mix(vector, real, real)
Available argument signatures for log_mix:
  log_mix(real, real, real)
  log_mix(real[], real[])
  log_mix(real[], vector)
  log_mix(real[], row vector)
  log_mix(real[], vector[])
  log_mix(real[], row vector[])
  log_mix(vector, real[])
  log_mix(vector, vector)
  log_mix(vector, row vector)
  log_mix(vector, vector[])
  log_mix(vector, row vector[])
  log_mix(row vector, real[])
  log_mix(row vector, vector)
  log_mix(row vector, row vector)
  log_mix(row vector, vector[])
  log_mix(row vector, row vector[])

  error in 'modeld8755c72_ff542624806624524b7b715fc8665283' at line 56, column 82
  -------------------------------------------------
    54:       target += log_mix(pi,
    55:                   poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
    56:                   poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
                                                                                         ^
    57:   
  -------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'ff542624806624524b7b715fc8665283' due to the above error.
In addition: Warning message:
In file.remove(tf) :
  cannot remove file 'C:\Users\521-04\AppData\Local\Temp\Rtmpc35uis\filed82a886c34', reason 'Permission denied'

I understand the problem here: log_mix expects a real mixing proportion and will not accept a vector. But I need the mixing proportion at the observation level, so how can I handle this?

FYI: When I change the mixing proportion to real<lower=0, upper=1> pi it works properly and I can get the N values of mu[1] and mu[2], but I cannot get the mixing proportion at the observation level.

#26

You probably want

target += log_mix(pi[i], ...
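For intuition, log_mix(theta, a, b) is just log(theta * exp(a) + (1 - theta) * exp(b)) evaluated stably on the log scale. A quick Python sketch (illustrative only, not Stan code):

```python
import math

def log_mix(theta, lp1, lp2):
    # Stan's log_mix(theta, a, b) = log(theta * exp(a) + (1 - theta) * exp(b)),
    # evaluated via a numerically stable log-sum-exp
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# compare against the naive direct formula for a case that does not underflow
val = log_mix(0.3, -1.0, -2.0)
direct = math.log(0.3 * math.exp(-1.0) + 0.7 * math.exp(-2.0))
```

The first argument is a scalar probability, which is why pi[i] works where the vector pi does not.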
2 Likes
#28

Please look at the following Stan code. I have checked this code with a small data set and it is OK. But when I use my complete data (around 400,000 cases), it has not started the estimation yet (it’s the 5th day). Therefore, would you please give any comment or idea on how I can make it faster?

data {
   int<lower=0> N;           // number of data points
  int <lower=1> y[N];         // observations
  matrix[N,4] x;               // Covariates 
 
}

parameters {
  real<lower=0> ci;                         // any Positive number added on the mu[1] to get mu[2] 
  real<lower=0> d;                        // will have uniform distribution that balances the shape parameters
  vector[4] beta;
  real alpha;
  vector<lower=0, upper=1>[N] pi;           // mixing proportions
}

transformed parameters{
  vector<lower=0>[N] mu[2];    // locations of mixture components
  mu[1] = exp(alpha + x*beta);
  mu[2] = exp((alpha + x*beta)+ci);
  
}

model {
  
  // Priors
  beta ~ normal(0, 1e3);
  alpha ~ normal(0,1e3);
  d ~ uniform(0,10000);    
  ci ~ normal(0, 1e3);
  
  
  // Likelihood
  for (i in 1:N) 
    target += log_mix(pi[i],
                      poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                      poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));
  
}

It has been showing this message for the last five days:

SAMPLING FOR MODEL 'af252afcc7248409611d59094236d80a' NOW (CHAIN 1).
#29

In no particular order:

  • When you have d ~ uniform(0, 10000), you have to declare real<lower=0, upper=10000> d; The model statement is then no longer necessary.
  • In general, the Stan team discourages uniform priors. For d you are probably better off with a gamma- or exponential-style prior.
  • Your priors are too wide, especially for alpha, beta, and ci: exp() overflows once the linear predictor reaches several hundred (and underflows for large negative values). See: Scaling of Log-likelihood value in Hamiltonian Monte Carlo
  • Irrespective of the numerical issues, I would try to use more informative priors. What scale do you expect alpha, beta, and ci to be on? Can you standardize the x’s to put the betas on a scale closer to normal(0, 1)?
  • You can work on the log scale with poisson_log_lpmf, which should help with numerical stability because you avoid the mu = exp() step.
  • I don’t understand the Poisson distribution you are using, but I wonder whether there are ways to simplify the distributions in log_mix. mu[1] and mu[2] only differ by the term ci, and the Poisson is relatively easy to think about on the log scale.
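To illustrate the log-scale point numerically, here is a small Python sketch (magnitudes chosen only to demonstrate double-precision limits): once the linear predictor eta is a large negative number, mu = exp(eta) rounds to zero and the natural-scale lpmf breaks on log(0), while the log-scale formula stays finite.

```python
import math

# with priors as wide as normal(0, 1e3), eta = alpha + x*beta can reach
# magnitudes where mu = exp(eta) is no longer representable in double precision
eta = -800.0
mu = math.exp(eta)   # rounds to exactly 0.0, so y*log(mu) - mu - log(y!) hits log(0)

# the log-scale form used by poisson_log_lpmf, y*eta - exp(eta) - log(y!), stays finite
y = 3
lpmf = y * eta - math.exp(eta) - math.lgamma(y + 1)
```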
#30

400 000 cases is probably just too much for Stan. While there might be some speed improvements available, I don’t think you are likely to get Stan to work with that. The problem is not primarily the number of data points (although that is big), but the number of parameters. In a small test I did elsewhere, sampling from 100 000 i.i.d. normals, which is probably the easiest model you can think of, took 40 minutes on my PC. Your model is both more than 4 times larger and way more complex. A simple hierarchical model with 25 000 parameters took me about 1.5 hours, and the same model with 100 000 parameters didn’t complete in 16 hours, so that’s just to provide some context.

In my experience it is however OK to split such large datasets randomly into smaller chunks that are manageable for Stan, you just need to check that the estimates of the shared parameters are similar across the chunks (which they will be unless there is some problem with your model).
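A random split into chunks can be as simple as the following Python sketch (split_into_chunks is a hypothetical helper, not a Stan or RStan feature):

```python
import random

def split_into_chunks(n, n_chunks, seed=0):
    # randomly partition observation indices 0..n-1 into n_chunks
    # roughly equal-sized chunks
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_chunks] for k in range(n_chunks)]

# e.g. fit 8 separate Stan runs, one per chunk, and compare the shared parameters
chunks = split_into_chunks(400_000, 8)
```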

Some stuff you can do to increase speed:

  • Use poisson_log_lpmf or poisson_log_glm_lpmf to avoid the exponentiation
  • Put tighter priors on everything (especially the uniform prior on d might cause some trouble), I would be surprised if anything wider than normal(0, 5) is actually needed for anything but the intercept, provided your predictors are normalized (which is a good idea anyway).
  • Check that you have enough memory - once you start swapping to disk, your performance is dead. Note that RStan will allocate memory to store all the posterior draws from your model (which I would guess to be like 18GB/chain in your case). You can reduce that by 66% by moving mu from transformed parameters to model. You can use CmdStan to avoid storing draws in memory at all. You may also consider running fewer chains in parallel.
  • Store the result of alpha + x*beta and reuse it for mu[2] (or preferably use poisson_log_lpmf and avoid storing mu at all).
  • Provide good inits
1 Like
#31

Actually, with the latest Stan additions, you just might be able to push your model into the feasible region with all the data points, using map_rect for the likelihood loop (with MPI and/or threading) or by offloading the x * beta multiplication to the GPU (since 2.19, not yet in rstan). I fear, however, that this might be non-trivial to get working, as the documentation is not quite there yet.

1 Like
#32

I don’t understand how I can implement that.

#33

poisson_lpmf(y[i] | mu[1]) becomes poisson_log_lpmf(y[i] | alpha + x*beta)
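The two parameterizations give the same log-density; a quick Python check, with illustrative helpers mirroring the Stan functions:

```python
import math

def poisson_lpmf(y, mu):
    # natural-scale Poisson log-pmf: y*log(mu) - mu - log(y!)
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def poisson_log_lpmf(y, eta):
    # same density parameterized by the log-rate eta = log(mu)
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

eta = 2.5
same = abs(poisson_lpmf(4, math.exp(eta)) - poisson_log_lpmf(4, eta))
```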

1 Like
#34

Wait, something is off I think. Should the original model not be

poisson_lpmf(y[i] | mu[1, i])

and the new one

poisson_log_lpmf(y[i] | alpha + x[i] * beta) ?

I am surprised that the original model worked given the mismatch in dimensions between y[i] and mu[1]. Are you recovering the parameters if you start from simulated data?
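For such a fake-data check, you could for example simulate zero-truncated Poisson draws by rejection. A Python sketch (sample_zero_truncated_poisson is a hypothetical helper using Knuth's product-of-uniforms Poisson sampler, fine for modest rates):

```python
import math
import random

def sample_zero_truncated_poisson(mu, rng):
    # rejection sampling: draw Poisson(mu) via Knuth's product-of-uniforms
    # method and keep only draws with y >= 1
    while True:
        limit, k, prod = math.exp(-mu), 0, 1.0
        while prod > limit:
            k += 1
            prod *= rng.random()
        y = k - 1
        if y >= 1:
            return y

rng = random.Random(42)
draws = [sample_zero_truncated_poisson(5.0, rng) for _ in range(2000)]
# the truncated mean is mu / (1 - exp(-mu)), about 5.034 for mu = 5
```

Mix two such components with per-observation pi, fit the model, and check that alpha, beta, and ci are recovered.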

1 Like
#35

the original likelihood

// Likelihood

  for (i in 1:N) 
      target += log_mix(pi[i],
                  poisson_lpmf(y[i] | mu[1]) - log1m_exp(poisson_lpmf(0 | mu[1])),
                  poisson_lpmf(y[i] | mu[2]) - log1m_exp(poisson_lpmf(0 | mu[2])));

The dimensions of y and mu[1] are the same, [N], and as you said,

poisson_lpmf(y[i] | mu[1]) becomes poisson_log_lpmf(y[i] | alpha + x*beta)

What about the log1m_exp(poisson_lpmf(0 | mu[1])) part?
Will it have the form log1m_exp(poisson_log_lpmf(0 | alpha + x*beta))?

#36

That is not correct I think. y is of length N but y[i] is of length 1.

#37

I understand your concern. I wanted to say that y and mu have the same dimension, but as you point out, it is unclear how poisson_lpmf(y[i] | mu[1]) was working.
Now I will use

for (i in 1:N) 
      target += log_mix(pi[i],
                  poisson_log_lpmf(y[i] | alpha + x*beta) - log1m_exp(poisson_log_lpmf(0 | alpha + x*beta)),
                  poisson_log_lpmf(y[i] | alpha + x*beta+ci) - log1m_exp(poisson_log_lpmf(0 | alpha + x*beta+ci)));

Therefore, will your concern have an effect on this analysis? I think the above expression is similar to what you stated before, poisson_lpmf(y[i] | mu[1, i]).

I also have one question. Is there any option to enforce mu[1] < mu[2]?
FYI: in my case I used mu[2] = mu[1] + ci.

#38

I think you are fine with this approach. For more complicated models, you could use the ordered type.

I think what you need is poisson_log_lpmf(y[i] | alpha + x[i] * beta), otherwise you are using all the data x for every single outcome y[i]. Maybe that is the goal in your analysis, but it implies that you have the same expected value for each y[i].

#39

I don’t understand this statement.

Let me justify the model. It is a mixture of truncated Poissons (it was zero-truncated, but now I have changed it to one-truncated). Each observation has probability pi[i] of being in the first group and 1 - pi[i] of being in the second group. Then, after I get pi from the posterior, I will categorize each observation according to pi’s value (e.g., below 0.6 it will be categorized in group 1, and so on).
As I mentioned above, the model has changed to 1-truncated. Therefore, can I write it like this?

 for (i in 1:N) 
  target += log_mix(pi[i],
                    poisson_lpmf(y[i]|mu[1,i])-log1m_exp(log_sum_exp(poisson_lpmf(0|mu[1,i]), poisson_lpmf(1|mu[1,i]))),
                    poisson_lpmf(y[i]|mu[2,i])-log1m_exp(log_sum_exp(poisson_lpmf(0|mu[2,i]), poisson_lpmf(1|mu[2,i]))));
#40

I think this should work for well behaved data.

I would also use poisson_log_lpmf instead of poisson_lpmf, tighter priors, and define mu in the model block. You could still run into the memory issues that Martin explained above.
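One caveat on the truncation: for y > 1 the normalizing constant is 1 - p(0) - p(1), so in Stan the correction should be -log1m_exp(log_sum_exp(poisson_lpmf(0 | mu), poisson_lpmf(1 | mu))) rather than subtracting two separate log1m_exp terms, which corresponds to dividing by (1 - p(0)) * (1 - p(1)). A quick Python check of the normalizer:

```python
import math

def poisson_pmf(y, mu):
    # Poisson pmf via its log form: exp(y*log(mu) - mu - log(y!))
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

mu = 3.7
p0, p1 = poisson_pmf(0, mu), poisson_pmf(1, mu)
# dividing the y > 1 tail by 1 - p(0) - p(1) recovers a proper distribution
total = sum(poisson_pmf(y, mu) for y in range(2, 200)) / (1.0 - p0 - p1)
```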

#41

Would you please show me how I can write that in my case?

#42
data {
  int<lower=0> N;           // number of data points
  int <lower=1> y[N];         // observations
  matrix[N,4] x;               // Covariates 
  vector[N] z;                //shape parameter
  real<lower=0> Az;           //shape parameter 
}

parameters {
  real<lower=0> ci;                         // any Positive number added on the mu[1] to get mu[2] 
  real<lower=0> d;                        // will have uniform distribution that balances the shape parameters
  vector[4] beta;
  real alpha;
  vector<lower=0, upper=1>[N] pi;           // mixing proportions
}

transformed parameters{
}

model {
  vector[N] mu[2];    // log-rates of the mixture components (local variables cannot have constraints)
  mu[1] = alpha + x*beta;
  mu[2] = mu[1] + ci;
  // Priors
  beta ~ normal(0, 2.5); // adapt to your specific knowledge
  alpha ~ normal(0,2.5);
  d ~ normal(0,5);    
  ci ~ normal(0, 2.5);
  pi ~ beta(z*d,Az*d); // a beta(100, 100) places practically all probability mass between .4 and .6
  
  // Likelihood
  for (i in 1:N) 
    target += log_mix(pi[i],
                      poisson_log_lpmf(y[i] | mu[1, i]) - log1m_exp(poisson_log_lpmf(0 | mu[1, i])),
                      poisson_log_lpmf(y[i] | mu[2, i]) - log1m_exp(poisson_log_lpmf(0 | mu[2, i])));
  
}

I haven’t tested it but this is a sensible implementation if I understand you correctly.

2 Likes
#44

Thank you @stijn for your constructive comments. The command is working now. Let me show you some values of y and mu[1]. Do you think the estimated Poisson rate parameter is healthy (it seems underestimated)?

     y      mu[1]
1  104      9.17
2   83      10.77
3   78      9.07
4   60      7.39
5   45      4.82
6   45      8.69
7   41      4.15
8    2      0.70
9    2      0.73
10   2      0.41
11   2      0.84
12   2      0.62
13   2      0.96
14   2      0.73
15   2      0.74
16   2      1.22

Therefore, would you please give me some suggestions on it?

#45

Is it OK if I consider truncation at 1 (y > 1)?

for (i in 1:N) 
        target += log_mix(pi[i],
                    poisson_lpmf(y[i]|mu[1,i])-log1m_exp(log_sum_exp(poisson_lpmf(0|mu[1,i]),
                                                                     poisson_lpmf(1|mu[1,i]))),
                    poisson_lpmf(y[i]|mu[2,i])-log1m_exp(log_sum_exp(poisson_lpmf(0|mu[2,i]),
                                                                     poisson_lpmf(1|mu[2,i]))));

and please look at my thread on the estimated parameters here.