Prior predictive distribution for a zero-inflated Poisson model with random effects

I am trying to derive prior predictive distributions for a zero-inflated Poisson model with random effects.

data {
  int<lower=0> N;
  int<lower=0> y[N];
  int<lower=1> N_cov;     // number of covariates
  matrix[N, N_cov] x;     // covariates
  int<lower=1> J;         // number of groups
  int g[N];               // cluster indicator
  int run_estimation;     // 0 for prior predictive, 1 for estimation
}
parameters {
  real<lower=0, upper=1> theta;
  real alpha;
  vector[N_cov] beta;
  vector[J] a_raw; 
  real<lower=0> sigma;  
}
transformed parameters {
  vector[J] a;
  a = a_raw * sigma;
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);

  sigma ~ cauchy(0, 2);
  a_raw ~ normal(0, 1);

  if (run_estimation == 1) {
    for (n in 1:N) {
      if (y[n] == 0)
        target += log_sum_exp(bernoulli_lpmf(1 | theta),
                              bernoulli_lpmf(0 | theta)
                              + poisson_log_lpmf(y[n] | a[g[n]] + alpha + x[n]*beta));
      else
        target += bernoulli_lpmf(0 | theta)
                  + poisson_log_lpmf(y[n] | a[g[n]] + alpha + x[n]*beta);
    }
  }
}
generated quantities {
  int ypred[N];
  int zero;

  for (n in 1:N) {
    zero = bernoulli_rng(theta);
    ypred[n] = (1 - zero) * poisson_log_rng(a[g[n]] + alpha + x[n]*beta);
  }
}

When I run the model with run_estimation=0 to get prior predictive distributions, I get the following exception:

Exception: Log rate parameter is 24.2495, but must be less than 20.79

together with some warnings.

The same happens if I reduce the prior variances.

When I fit the model in estimation mode (run_estimation=1), the data seem to have enough information to prevent huge coefficients and I don’t get the exception. I still get the warnings though.

This problem gets even worse with a more complicated model with mixtures of zero-inflated Poissons: there the probability of each component turns out to be NA, given the high values of the coefficients in the Poisson model.

Any suggestion?

Did you try replacing the Cauchy prior with something tighter? Like a normal or something? Cauchys have heavy tails, so it's super conceivable that one would blow up a log scale.

For instance, sampling 20, cauchy(0, 2)s in R:

> rcauchy(20, 0, 2)
 [1]  2.88322466  1.23568309  6.54969674  1.72653796 -0.63361340 -0.21271572 -0.08195728 -0.18053311  2.48106495
[10] -0.81135430 15.45362119 -0.05190334 -0.31957886  0.42332266  1.15329784 -1.60977675 16.99981285 -0.14050915
[19]  3.56674963 -2.23761791
And for this line:

poisson_log_lpmf(y[n] | a[g[n]] + alpha + x[n]*beta)

Does it make sense to have your parameters on an unconstrained scale and then constrain them with a link function? Like unconstrain a, alpha, and beta, and then do:

poisson_log_lpmf(y[n] | exp(a[g[n]]+ alpha + x[n]*beta))

Or something? Like: https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function

Might be easier to keep things under control that way.
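For what it's worth, I believe the 20.79 in that exception is log(2^30), the largest rate poisson_log_rng will accept, so any prior draw that pushes the linear predictor past that will throw. Here's a rough R simulation of the prior on the log rate (assuming five standard-normal covariates; swap in dimensions that match your x):

# Prior draws of the log rate for the ZIP model above.
# Assumes 5 covariates with standard-normal values; adjust to taste.
set.seed(1)
log_rate <- replicate(10000, {
  alpha <- rnorm(1, 0, 2)
  beta  <- rnorm(5, 0, 2)
  sigma <- abs(rcauchy(1, 0, 2))   # half-Cauchy implied by <lower=0>
  a     <- rnorm(1, 0, sigma)      # group effect, a = a_raw * sigma
  x     <- rnorm(5)
  a + alpha + sum(x * beta)
})
mean(log_rate > 20.79)   # share of draws that would trip the RNG

Even ignoring the Cauchy, a sum of several normal(0, 2) terms has a pretty wide prior on the log scale.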

Thank you for your comment.
I did try a normal prior and things blow up anyway.

I see your point about constraining the parameters, but I actually don't understand your use of both the log with poisson_log_lpmf and the exponential on the linear predictor…

I thought the _log option already takes care of exponentiating the linear predictor?

but I actually don’t understand your use of both the log

I made a mistake :D. Good catch. Should be poisson_lpmf.

Ah ok.

Yes, I tried that too: replacing poisson_log_lpmf with poisson_lpmf and exponentiating the linear predictor myself. In this case I don't get the exception but the log rate can still be really large, and in the mixture of 3 zero-inflated Poissons the mixture components blow up and I get NAs.

In this case I don’t get the exception but the log rate can still be really large

Is this with the data in place or with the prior predictive stuff?

mixture of 3 zero inflated Poissons the mixture components blow up

I’m assuming this is one zero term + three Poissons mixed together?

And are you fitting generated data? If not, maybe start there. It’ll help diagnose sampling problems with the model without worrying about how good a fit the model will actually provide to the data.
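Something like this in R would do for your simple model (the sizes and true values here are made up; pick whatever looks plausible for your data):

# Fake data from the simple ZIP random-effects model (illustrative values).
set.seed(42)
N <- 500; N_cov <- 3; J <- 10
x <- matrix(rnorm(N * N_cov), N, N_cov)
g <- sample(1:J, N, replace = TRUE)
theta <- 0.3                        # zero-inflation probability
alpha <- 1
beta  <- c(0.5, -0.3, 0.2)
sigma <- 0.5
a     <- rnorm(J, 0, sigma)         # group effects
zero  <- rbinom(N, 1, theta)        # structural zeros
y     <- (1 - zero) * rpois(N, exp(a[g] + alpha + drop(x %*% beta)))

Fit that with run_estimation=1 and check that you recover the true values before moving to the mixture.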

Also it might be good to post the full mixture model. Maybe there’s something weird with that.

I am still trying to figure out what is happening when I sample from the prior predictive. That is where I get the huge values.

I haven’t fitted generated data yet. I should do it. Thanks.

I’m assuming this is one zero term + three Poissons mixed together?

It is actually a mixture of 4 (sorry, four not three) zero-inflated Poissons, not a zero inflation of a Poisson mixture.
I have 4 latent classes, each with a zero-inflated Poisson outcome.

Here is the code. I hope it’s not too messy.

data {
  int<lower=1> N;              // sample size
  int<lower=1> N_cov;          // number of covariates
  int<lower=1> N_type;         // number of types
  int<lower=1> J;         // number of groups

  
  int<lower=1,upper=3> z[N];   // treatment assigned
  int<lower=0,upper=1> d[N];   // treatment received  
  int<lower=0> y[N];   // outcomes
  matrix[N,N_cov] x;          // covariates  
  int g[N];    // group indicator
  int run_estimation; 

}
parameters {

  // OUTCOME MODEL PARAMETERS
  real alpha_rc1;  //intercept
  real alpha_rc2;  //intercept
  real alpha_rc3;  //intercept
  real alpha_pc1;  //intercept
  real alpha_pc2;  //intercept
  real alpha_pc3;  //intercept
  real alpha_at1;  //intercept
//  real<lower=alpha_at1> alpha_at2;  //intercept
//  real<lower=alpha_at2> alpha_at3;  //intercept
  real alpha_nt1;  //intercept
  real alpha_nt2;  //intercept
  real alpha_nt3;  //intercept


  real<lower=0, upper=1> phi_rc1; // 
  real<lower=0, upper=1> phi_rc2; // 
  real<lower=0, upper=1> phi_rc3; // 
  real<lower=0, upper=1> phi_pc1; // 
  real<lower=0, upper=1> phi_pc2; // 
  real<lower=0, upper=1> phi_pc3; // 
  real<lower=0, upper=1> phi_at1; // 
//  real<lower=0, upper=1> phi_at2; // 
//  real<lower=0, upper=1> phi_at3; // 
  real<lower=0, upper=1> phi_nt1; // 
  real<lower=0, upper=1> phi_nt2; // 
  real<lower=0, upper=1> phi_nt3; // 

  vector[N_cov] beta_pc;  //slopes (same for all assignment levels)
  vector[N_cov] beta_at;  //slopes (same for all assignment levels)
  vector[N_cov] beta_nt;  //slopes (same for all assignment levels)
  vector[N_cov] beta_rc;  //slopes (same for all assignment levels)
  
  real a[J]; 
  real<lower=0,upper=10> sigma;  

  // COMPLIANCE MODEL PARAMETERS
  real gamma[N_type-1];                   // intercepts for compliance models
  vector[N_cov] delta[N_type-1];      // coefficients for compliance models
} 
transformed parameters {


  vector[N] mu_rc1;    //the linear predictor
  vector[N] mu_rc2;    //the linear predictor
  vector[N] mu_rc3;    //the linear predictor
  vector[N] mu_pc1;    //the linear predictor
  vector[N] mu_pc2;    //the linear predictor
  vector[N] mu_pc3;    //the linear predictor
  vector[N] mu_at1;    //the linear predictor
  vector[N] mu_at2;    //the linear predictor
  vector[N] mu_at3;    //the linear predictor
  vector[N] mu_nt1;    //the linear predictor
  vector[N] mu_nt2;    //the linear predictor
  vector[N] mu_nt3;    //the linear predictor


for(n in 1:N){
  mu_rc1[n] = exp(a[g[n]]+alpha_rc1+x[n]*beta_rc); //using the log link 
  mu_rc2[n] = exp(a[g[n]]+alpha_rc2+x[n]*beta_rc); //using the log link 
  mu_rc3[n] = exp(a[g[n]]+alpha_rc3+x[n]*beta_rc); //using the log link 
  mu_pc1[n] = exp(a[g[n]]+alpha_pc1+x[n]*beta_pc); //using the log link 
  mu_pc2[n] = exp(a[g[n]]+alpha_pc2+x[n]*beta_pc); //using the log link 
  mu_pc3[n] = exp(a[g[n]]+alpha_pc3+x[n]*beta_pc); //using the log link 
  mu_at1[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link 
  mu_at2[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link  
  mu_at3[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link 
  mu_nt1[n] = exp(a[g[n]]+alpha_nt1+x[n]*beta_nt); //using the log link 
  mu_nt2[n] = exp(a[g[n]]+alpha_nt2+x[n]*beta_nt); //using the log link 
  mu_nt3[n] = exp(a[g[n]]+alpha_nt3+x[n]*beta_nt); //using the log link 
   }
   
}


model {

  // TEMPORARY VARIABLES
  vector[N_type] log_pi;
  real mix3[3];
  real mix2[2];

  // PRIORS

  alpha_rc1 ~ normal(0, 2);
  alpha_rc2 ~ normal(0, 2);
  alpha_rc3 ~ normal(0, 2);
  alpha_pc1 ~ normal(0, 2);
  alpha_pc2 ~ normal(0, 2);
  alpha_pc3 ~ normal(0, 2);
  alpha_nt1 ~ normal(0, 2);
  alpha_nt2 ~ normal(0, 2);
  alpha_nt3 ~ normal(0, 2);
  alpha_at1 ~ normal(0, 2);


  beta_rc ~ normal(0, 2);
  beta_pc ~ normal(0, 2);
  beta_at ~ normal(0, 2);
  beta_nt ~ normal(0, 2);

  sigma ~ normal(0, sigma);

  a ~ normal(0, sigma);

    
  // compliance model
  gamma ~ normal(0, 2.5);
  for(k in 1:(N_type-1))
    delta[k] ~ normal(0, 2.5);
  
  if(run_estimation==1) {
  // MODELS FOR OUTCOME
  for(n in 1:N){
    
    // Multinomial model (k: 1=nt, 2=rc, 3=pc, 4=at)
    for(k in 1:(N_type-1))
      log_pi[k] = gamma[k] + x[n]*delta[k];
    log_pi[N_type] = 0;

      
    // Always Takers
    if(z[n] == 1 && d[n] == 1){
    	if(y[n] == 0)
      target+=log_pi[4] - log_sum_exp(log_pi)+log_sum_exp(bernoulli_lpmf(1| phi_at1),
         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at1[n]));
      else
      target+=log_pi[4] - log_sum_exp(log_pi)+
         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at1[n]);
    }
    
    // Never Takers
    else if(z[n] == 3 && d[n] == 0){
    	if(y[n] == 0)
      target+=log_pi[1] - log_sum_exp(log_pi)+log_sum_exp(bernoulli_lpmf(1| phi_nt3),
         bernoulli_lpmf(0| phi_nt3) + poisson_lpmf(y[n]|mu_nt3[n]));
      else
      target+=log_pi[1] - log_sum_exp(log_pi)+
         bernoulli_lpmf(0| phi_nt3) + poisson_lpmf(y[n]|mu_nt3[n]);
    }
    
    
    // R-Complier, P-Complier or Never Taker
    else if(z[n] == 1 && d[n] == 0){
    	if(y[n] == 0){
	    	mix3[1]= log_pi[1] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_nt1),
	         bernoulli_lpmf(0| phi_nt1) + poisson_lpmf(y[n]|mu_nt1[n])); // Never Taker
	
	      mix3[2]=log_pi[2] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_rc1),
	         bernoulli_lpmf(0| phi_rc1) + poisson_lpmf(y[n]|mu_rc1[n])); // R-Complier
	    
	      mix3[3]=log_pi[3] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_pc1),
	         bernoulli_lpmf(0| phi_pc1) + poisson_lpmf(y[n]|mu_pc1[n])); // P-Complier
	    	}
	   else{
	    	mix3[1]= log_pi[1] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_nt1) + poisson_lpmf(y[n]|mu_nt1[n]); // Never Taker
	
	      mix3[2]=log_pi[2] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_rc1) + poisson_lpmf(y[n]|mu_rc1[n]); // R-Complier
	    
	      mix3[3]=log_pi[3] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_pc1) + poisson_lpmf(y[n]|mu_pc1[n]); // P-Complier
     }
        
      target+=  log_sum_exp(mix3);
    }
    
    // R-Complier or Never Taker
    else if(z[n] == 2 && d[n] == 0){
    	if(y[n] == 0){
	    	mix2[1]= log_pi[1] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_nt2),
	         bernoulli_lpmf(0| phi_nt2) + poisson_lpmf(y[n]|mu_nt2[n])); // Never Taker
	
	      mix2[2]=log_pi[2] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_rc2),
	         bernoulli_lpmf(0| phi_rc2) + poisson_lpmf(y[n]|mu_rc2[n])); // R-Complier
    }
	   else{
	    	mix2[1]= log_pi[1] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_nt2) + poisson_lpmf(y[n]|mu_nt2[n]); // Never Taker
	
	      mix2[2]=log_pi[2] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_rc2) + poisson_lpmf(y[n]|mu_rc2[n]); // R-Complier
	    
     }
        
      target+=  log_sum_exp(mix2);
    }

    // P-Complier or Always Taker
    else if(z[n] == 2 && d[n] == 1){
    	if(y[n] == 0){
	    	mix2[1]= log_pi[4] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_at1),
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at2[n]))); // Always Taker
	
	      mix2[2]=log_pi[3] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_pc2),
	         bernoulli_lpmf(0| phi_pc2) + poisson_lpmf(y[n]|mu_pc2[n]))); // P-Complier
    }
	   else{
	    	mix2[1]= log_pi[4] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at2[n]); // Always Taker
	
	      mix2[2]=log_pi[3] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_pc2) + poisson_lpmf(y[n]|mu_pc2[n]); // P-Complier
	    
     }
        
      target+=  log_sum_exp(mix2);
    }

    // R-Complier, P-Complier or Always Taker
    else if(z[n] == 3 && d[n] == 1){
    	if(y[n] == 0){
	    	mix3[1]= log_pi[4] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_at1),
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at3[n])); // Always Taker
	
	      mix3[2]=log_pi[2] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_rc3),
	         bernoulli_lpmf(0| phi_rc3) + poisson_lpmf(y[n]|mu_rc3[n])); // R-Complier
	    
	      mix3[3]=log_pi[3] - log_sum_exp(log_pi)
	        +log_sum_exp(bernoulli_lpmf(1| phi_pc3),
	         bernoulli_lpmf(0| phi_pc3) + poisson_lpmf(y[n]|mu_pc3[n])); // P-Complier
	    	}
	   else{
	    	mix3[1]= log_pi[4] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at3[n]); // Always Taker
	
	      mix3[2]=log_pi[2] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_rc3) + poisson_lpmf(y[n]|mu_rc3[n]); // R-Complier
	    
	      mix3[3]=log_pi[3] - log_sum_exp(log_pi)
	        +bernoulli_lpmf(0| phi_pc3) + poisson_lpmf(y[n]|mu_pc3[n]); // P-Complier
     }
        
      target+=  log_sum_exp(mix3);        
    }
     
  }
 }
}


// FOR FINITE SAMPLE INFERENCE
generated quantities{

  real at_y1_bar;
  real at_y2_bar;
  real at_y3_bar;
  real nt_y1_bar;
  real nt_y2_bar;
  real nt_y3_bar;
  real rc_y1_bar;
  real rc_y2_bar;
  real rc_y3_bar;
  real pc_y1_bar;
  real pc_y2_bar;
  real pc_y3_bar;
  real y1_bar;
  real y2_bar;
  real y3_bar;


  real at_y1_pos;
  real at_y2_pos;
  real at_y3_pos;
  real nt_y1_pos;
  real nt_y2_pos;
  real nt_y3_pos;
  real rc_y1_pos;
  real rc_y2_pos;
  real rc_y3_pos;
  real pc_y1_pos;
  real pc_y2_pos;
  real pc_y3_pos;
  real y1_pos;
  real y2_pos;
  real y3_pos;


    // Types
    vector[N] type_at;
    vector[N] type_nt;
    vector[N] type_pc;
    vector[N] type_rc;

    vector[N] ypred;  //for posterior checks

  { // to create temporary variables


    // Generate science table
    vector[N] y1;
    vector[N] y2;
    vector[N] y3;

    vector[N] ybin1;
    vector[N] ybin2;
    vector[N] ybin3;


		int zero1;
		int zero2;
		int zero3;

    // Temporary variables
    vector[N_type] log_pi_tmp;
    vector[2] mix2_pi_tmp;
    vector[3] mix3_pi_tmp;
    int type_tmp;


    // Initialize types to zero
    for(n in 1:N){
      type_at[n] = 0;
      type_nt[n] = 0;
      type_rc[n] = 0;
      type_pc[n] = 0;

    }


    // Loop through observation
    for(n in 1:N){

      // Multinomial model (k: 1=nt, 2=rc, 3=pc, 4=at)
      for(k in 1:(N_type-1))
        log_pi_tmp[k] = gamma[k] + x[n]*delta[k];
      log_pi_tmp[N_type] = 0;


      // Always Takers
      if(z[n] == 1 && d[n] == 1){
        type_at[n] = 1;

        zero1=bernoulli_rng(phi_at1);
        zero2=bernoulli_rng(phi_at1);
        zero3=bernoulli_rng(phi_at1);

        y1[n] = y[n];
        y2[n] = (1-zero2)*poisson_rng(mu_at2[n]);
        y3[n] = (1-zero3)*poisson_rng(mu_at3[n]);

        ypred[n]=(1-zero1)*poisson_rng(mu_at1[n]);

      }

      // Never Takers
      else if(z[n] == 3 && d[n] == 0){
        type_nt[n] = 1;

        zero1=bernoulli_rng(phi_nt1);
        zero2=bernoulli_rng(phi_nt2);
        zero3=bernoulli_rng(phi_nt3);

        y1[n] = (1-zero1)*poisson_rng(mu_nt1[n]);
        y2[n] = (1-zero2)*poisson_rng(mu_nt2[n]);
        y3[n] = y[n];

        ypred[n]=(1-zero3)*poisson_rng(mu_nt3[n]);

      }


      // R-Complier, P-Complier or Never Taker
      else if(z[n] == 1 && d[n] == 0){
    	if(y[n] == 0){
	      // generate type
        mix3_pi_tmp[1] = exp(log_pi_tmp[1] +log_sum_exp(bernoulli_lpmf(1| phi_nt1),
	         bernoulli_lpmf(0| phi_nt1) + poisson_lpmf(y[n]|mu_nt1[n])));   // Never Taker
        mix3_pi_tmp[2] = exp(log_pi_tmp[2] +log_sum_exp(bernoulli_lpmf(1| phi_rc1),
	         bernoulli_lpmf(0| phi_rc1) + poisson_lpmf(y[n]|mu_rc1[n])));   // R-Complier
        mix3_pi_tmp[3] = exp(log_pi_tmp[3] +log_sum_exp(bernoulli_lpmf(1| phi_pc1),
	         bernoulli_lpmf(0| phi_pc1) + poisson_lpmf(y[n]|mu_pc1[n])));   // P-Complier
    	}
    	else{
 	      // generate type
        mix3_pi_tmp[1] = exp(log_pi_tmp[1] +
	         bernoulli_lpmf(0| phi_nt1) + poisson_lpmf(y[n]|mu_nt1[n]));   // Never Taker
        mix3_pi_tmp[2] = exp(log_pi_tmp[2] +
	         bernoulli_lpmf(0| phi_rc1) + poisson_lpmf(y[n]|mu_rc1[n]));   // R-Complier
        mix3_pi_tmp[3] = exp(log_pi_tmp[3] +
	         bernoulli_lpmf(0| phi_pc1) + poisson_lpmf(y[n]|mu_pc1[n]));   // P-Complier

    	}

        mix3_pi_tmp = mix3_pi_tmp/sum(mix3_pi_tmp);
        type_tmp = categorical_rng(mix3_pi_tmp);

        // generate outcomes
        if(type_tmp == 1){ // Never Taker
          type_nt[n] = 1;

          zero1=bernoulli_rng(phi_nt1);
          zero2=bernoulli_rng(phi_nt2);
          zero3=bernoulli_rng(phi_nt3);

          y1[n] = y[n];
          y2[n] = (1-zero2)*poisson_rng(mu_nt2[n]);
          y3[n] = (1-zero3)*poisson_rng(mu_nt3[n]);

          ypred[n]=(1-zero1)*poisson_rng(mu_nt1[n]);

        }

        if(type_tmp == 2){ // RComplier
          type_rc[n] = 1;

          zero1=bernoulli_rng(phi_rc1);
          zero2=bernoulli_rng(phi_rc2);
          zero3=bernoulli_rng(phi_rc3);

          y1[n] = y[n];
          y2[n] = (1-zero2)*poisson_rng(mu_rc2[n]);
          y3[n] = (1-zero3)*poisson_rng(mu_rc3[n]);

          ypred[n]=(1-zero1)*poisson_rng(mu_rc1[n]);

        }
        if(type_tmp == 3){ // PComplier
          type_pc[n] = 1;

          zero1=bernoulli_rng(phi_pc1);
          zero2=bernoulli_rng(phi_pc2);
          zero3=bernoulli_rng(phi_pc3);

          y1[n] = y[n];
          y2[n] = (1-zero2)*poisson_rng(mu_pc2[n]);
          y3[n] = (1-zero3)*poisson_rng(mu_pc3[n]);

          ypred[n]=(1-zero1)*poisson_rng(mu_pc1[n]);

        }

      }

      // R-Complier or Never Taker
      else if(z[n] == 2 && d[n] == 0){
      	if(y[n] == 0){
        // generate type
        mix2_pi_tmp[1] = exp(log_pi_tmp[1] +log_sum_exp(bernoulli_lpmf(1| phi_nt2),
	         bernoulli_lpmf(0| phi_nt2) + poisson_lpmf(y[n]|mu_nt2[n])));   // Never Taker
        mix2_pi_tmp[2] = exp(log_pi_tmp[2] +log_sum_exp(bernoulli_lpmf(1| phi_rc2),
	         bernoulli_lpmf(0| phi_rc2) + poisson_lpmf(y[n]|mu_rc2[n])));   // R-Complier
      	}
      	else{
        // generate type
        mix2_pi_tmp[1] = exp(log_pi_tmp[1] +
	         bernoulli_lpmf(0| phi_nt2) + poisson_lpmf(y[n]|mu_nt2[n]));   // Never Taker
        mix2_pi_tmp[2] = exp(log_pi_tmp[2] +
	         bernoulli_lpmf(0| phi_rc2) + poisson_lpmf(y[n]|mu_rc2[n]));   // R-Complier

      	}
        mix2_pi_tmp = mix2_pi_tmp/sum(mix2_pi_tmp);
        type_tmp = categorical_rng(mix2_pi_tmp);

        // generate outcomes
        if(type_tmp == 1){ // Never Taker
          type_nt[n] = 1;

          zero1=bernoulli_rng(phi_nt1);
          zero2=bernoulli_rng(phi_nt2);
          zero3=bernoulli_rng(phi_nt3);


          y1[n] = (1-zero1)*poisson_rng(mu_nt1[n]);
          y2[n] = y[n];
          y3[n] = (1-zero3)*poisson_rng(mu_nt3[n]);

          ypred[n]=(1-zero2)*poisson_rng(mu_nt2[n]);

        }

        if(type_tmp == 2){ // RComplier
          type_rc[n] = 1;

          zero1=bernoulli_rng(phi_rc1);
          zero2=bernoulli_rng(phi_rc2);
          zero3=bernoulli_rng(phi_rc3);

          y1[n] = (1-zero1)*poisson_rng(mu_rc1[n]);
          y2[n] = y[n];
          y3[n] = (1-zero3)*poisson_rng(mu_rc3[n]);

          ypred[n]=(1-zero2)*poisson_rng(mu_rc2[n]);

        }

      }

      // P-Complier or Always Taker
      else if(z[n] == 2 && d[n] == 1){
       	if(y[n] == 0){
        // generate type
        mix2_pi_tmp[1] = exp(log_pi_tmp[4] +log_sum_exp(bernoulli_lpmf(1| phi_at1),
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at2[n])));   // Always Taker
        mix2_pi_tmp[2] = exp(log_pi_tmp[3] +log_sum_exp(bernoulli_lpmf(1| phi_pc2),
	         bernoulli_lpmf(0| phi_pc2) + poisson_lpmf(y[n]|mu_pc2[n])));   // P-Complier
       	}
       	else{
        // generate type
        mix2_pi_tmp[1] = exp(log_pi_tmp[4] +
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at2[n]));   // Always Taker
        mix2_pi_tmp[2] = exp(log_pi_tmp[3] +
	         bernoulli_lpmf(0| phi_pc2) + poisson_lpmf(y[n]|mu_pc2[n]));   // P-Complier
       	}
        mix2_pi_tmp = mix2_pi_tmp/sum(mix2_pi_tmp);
        type_tmp = categorical_rng(mix2_pi_tmp);

        // generate outcomes
        if(type_tmp == 1){ // Always Taker
          type_at[n] = 1;

          zero1=bernoulli_rng(phi_at1);
          zero2=bernoulli_rng(phi_at1);
          zero3=bernoulli_rng(phi_at1);

          y1[n] = (1-zero1)*poisson_rng(mu_at1[n]);
          y2[n] = y[n];
          y3[n] = (1-zero3)*poisson_rng(mu_at3[n]);

          ypred[n]=(1-zero2)*poisson_rng(mu_at2[n]);

        }

        if(type_tmp == 2){ // P-Complier
          type_pc[n] = 1;

          zero1=bernoulli_rng(phi_pc1);
          zero2=bernoulli_rng(phi_pc2);
          zero3=bernoulli_rng(phi_pc3);

          y1[n] = (1-zero1)*poisson_rng(mu_pc1[n]);
          y2[n] = y[n];
          y3[n] = (1-zero3)*poisson_rng(mu_pc3[n]);

          ypred[n]=(1-zero2)*poisson_rng(mu_pc2[n]);

        }

      }


      // R-Complier, P-Complier or Always Taker
      else if(z[n] == 3 && d[n] == 1){
      	if(y[n] == 0){
        // generate type
        mix3_pi_tmp[1] = exp(log_pi_tmp[4] +log_sum_exp(bernoulli_lpmf(1| phi_at1),
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at3[n])));    // Always Taker
        mix3_pi_tmp[2] = exp(log_pi_tmp[2] +log_sum_exp(bernoulli_lpmf(1| phi_rc3),
	         bernoulli_lpmf(0| phi_rc3) + poisson_lpmf(y[n]|mu_rc3[n])));    // R-Complier
        mix3_pi_tmp[3] = exp(log_pi_tmp[3] +log_sum_exp(bernoulli_lpmf(1| phi_pc3),
	         bernoulli_lpmf(0| phi_pc3) + poisson_lpmf(y[n]|mu_pc3[n])));    // P-Complier
      	}
      	else{
        // generate type
        mix3_pi_tmp[1] = exp(log_pi_tmp[4] +
	         bernoulli_lpmf(0| phi_at1) + poisson_lpmf(y[n]|mu_at3[n]));    // Always Taker
        mix3_pi_tmp[2] = exp(log_pi_tmp[2] +
	         bernoulli_lpmf(0| phi_rc3) + poisson_lpmf(y[n]|mu_rc3[n]));    // R-Complier
        mix3_pi_tmp[3] = exp(log_pi_tmp[3] +
	         bernoulli_lpmf(0| phi_pc3) + poisson_lpmf(y[n]|mu_pc3[n]));    // P-Complier

      	}
        mix3_pi_tmp = mix3_pi_tmp/sum(mix3_pi_tmp);
        type_tmp = categorical_rng(mix3_pi_tmp);

        // generate outcomes
        if(type_tmp == 1){ // Always Taker
          type_at[n] = 1;

          zero1=bernoulli_rng(phi_at1);
          zero2=bernoulli_rng(phi_at1);
          zero3=bernoulli_rng(phi_at1);

          y1[n] = (1-zero1)*poisson_rng(mu_at1[n]);
          y2[n] = (1-zero2)*poisson_rng(mu_at2[n]);
          y3[n] = y[n];

          ypred[n]=(1-zero3)*poisson_rng(mu_at3[n]);

        }

        if(type_tmp == 2){ // RComplier
          type_rc[n] = 1;

          zero1=bernoulli_rng(phi_rc1);
          zero2=bernoulli_rng(phi_rc2);
          zero3=bernoulli_rng(phi_rc3);

          y1[n] = (1-zero1)*poisson_rng(mu_rc1[n]);
          y2[n] = (1-zero2)*poisson_rng(mu_rc2[n]);
          y3[n] = y[n];

          ypred[n]=(1-zero3)*poisson_rng(mu_rc3[n]);

        }
        if(type_tmp == 3){ // PComplier
          type_pc[n] = 1;

          zero1=bernoulli_rng(phi_pc1);
          zero2=bernoulli_rng(phi_pc2);
          zero3=bernoulli_rng(phi_pc3);

          y1[n] = (1-zero1)*poisson_rng(mu_pc1[n]);
          y2[n] = (1-zero2)*poisson_rng(mu_pc2[n]);
          y3[n] = y[n];

          ypred[n]=(1-zero3)*poisson_rng(mu_pc3[n]);

        }

      }


    } // end for loop


    for(n in 1:N){
    	ybin1[n]=y1[n]>0;
    	ybin2[n]=y2[n]>0;
    	ybin3[n]=y3[n]>0;
    }

     // store mean values
    y1_bar = mean(y1);
    y2_bar = mean(y2);
    y3_bar = mean(y3);

    y1_pos = mean(ybin1);
    y2_pos = mean(ybin2);
    y3_pos = mean(ybin3);

    at_y1_bar = sum(type_at .* y1)/sum(type_at);
    at_y2_bar = sum(type_at .* y2)/sum(type_at);
    at_y3_bar = sum(type_at .* y3)/sum(type_at);

    at_y1_pos = sum(type_at .* ybin1)/sum(type_at);
    at_y2_pos = sum(type_at .* ybin2)/sum(type_at);
    at_y3_pos = sum(type_at .* ybin3)/sum(type_at);


    nt_y1_bar = sum(type_nt .* y1)/sum(type_nt);
    nt_y2_bar = sum(type_nt .* y2)/sum(type_nt);
    nt_y3_bar = sum(type_nt .* y3)/sum(type_nt);

    nt_y1_pos = sum(type_nt .* ybin1)/sum(type_nt);
    nt_y2_pos = sum(type_nt .* ybin2)/sum(type_nt);
    nt_y3_pos = sum(type_nt .* ybin3)/sum(type_nt);

   if(sum(type_rc)!=0){
    rc_y1_bar = sum(type_rc .* y1)/sum(type_rc);
    rc_y2_bar = sum(type_rc .* y2)/sum(type_rc);
    rc_y3_bar = sum(type_rc .* y3)/sum(type_rc);

    rc_y1_pos = sum(type_rc .* ybin1)/sum(type_rc);
    rc_y2_pos = sum(type_rc .* ybin2)/sum(type_rc);
    rc_y3_pos = sum(type_rc .* ybin3)/sum(type_rc);

   }
    else {
      rc_y1_bar = 99;
      rc_y2_bar = 99;
      rc_y3_bar = 99;

      rc_y1_pos = 99;
      rc_y2_pos = 99;
      rc_y3_pos = 99;

    }

    if(sum(type_pc)!=0){
	    pc_y1_bar = sum(type_pc .* y1)/sum(type_pc);
	    pc_y2_bar = sum(type_pc .* y2)/sum(type_pc);
	    pc_y3_bar = sum(type_pc .* y3)/sum(type_pc);

	    pc_y1_pos = sum(type_pc .* ybin1)/sum(type_pc);
	    pc_y2_pos = sum(type_pc .* ybin2)/sum(type_pc);
	    pc_y3_pos = sum(type_pc .* ybin3)/sum(type_pc);

    }
    else {
      pc_y1_bar = 99;
      pc_y2_bar = 99;
      pc_y3_bar = 99;

      pc_y1_pos = 99;
      pc_y2_pos = 99;
      pc_y3_pos = 99;
    }

   } // end temporary variables
 }

Here N_type is 4: the 4 latent classes (compliance groups).
We have intercepts alpha that differ across compliance types (rc, pc, at, nt)
and treatment assignments (1, 2, 3),
whereas the beta coefficients for the variables in matrix x are constant
across treatment assignments.

Wow this is a big model! Lemme ask some questions then we’ll probably need to simplify this. There’s gonna be a way to write this out with less manual variable declaration too. Coding in these things manually is gonna be super error prone.

  1. Where are the mixtures coming in? It looks like you have a bunch of different grouping indicators. g, z, and d. Do you consider these to be unreliable and you want to try to figure out what groups people actually might be in? Or is there something else?

It feels almost like there are two models happening at once in here, like mu_rc1[n] = exp(a[g[n]]+alpha_rc1+x[n]*beta_rc); //using the log link ... is predicting the rates of a zero-inflated Poisson distribution as a function of known groups, and

log_pi[k] = gamma[k] + x[n]*delta[k]; is trying to predict group membership as a function of covariates

It seems like either of those could be a model on its own, or is there some other thing linking them together?

  2. For the ifs on z and d, is the situation that you know certain people can't be in certain groups?

  3. Is it one result per person?

  4. Doing softmaxes like log_pi[4] - log_sum_exp(log_pi) is gonna lead to overflows/underflows. Use the softmax function, and if you need things back on a log scale, take the log after.

Would a mixture of Negative-Binomials be okay? I guess a mixture of mixtures (ZIPs) will be hard to fit.

You want to use loops and arrays because they’re easier to get right than long cut-and-paste lists.

For this:

mu_at1[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link
mu_at2[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link
mu_at3[n] = exp(a[g[n]]+alpha_at1+x[n]*beta_at); //using the log link

I think you meant _at2 and _at3 in the second and third lines on the RHS. That's an example of the kind of cut-and-paste error this kind of coding will lead to.

I’d also suggest starting with something much simpler to begin. That’s a huge Stan program, even after you get rid of all the redundancy.

Thank you Bob for checking that out.
That wasn't a mistake: alpha_at should be the same for all three levels by assumption.
This is more a conceptual issue than a code issue.

I did start with simpler code, actually: I built this up from a
zero-inflated Poisson without the mixture and, separately, a mixture model with a normal outcome,
and then combined the two.

Max is right: a mixture of mixtures is hard to fit.
I will try with negative binomials, but ZIP would fit the data better.

Sorry for my really late reply.
But thank you for going through my code.

Some answers to your comments.

  1. Not really. g is a group indicator that I use for random effects.
    We then have 4 strata (at, nt, pc, rc) that are partially defined by
    z and d. I don't know the membership of these strata.
    Depending on the stratum membership, the model for Y has different parameters.

  2. For some combinations of z and d I know the stratum membership. For others,
    a unit can be a member of two strata and we don't know which one.

  3. What do you mean? Each person belongs to a group g and a stratum (at, nt, pc, or rc). Based on these
    memberships and each person's covariates in x, I impute the 3 values of Y corresponding
    to the three values of the treatment z.

  4. I am interested in this point. Could you please be more specific?

re 1. If you can do what Max said and avoid the mixtures, excellent

re 3. It’s been awhile and I’ve forgotten what I meant I guess :(

re 4. If you’re dealing with ratios of exponentials, you can often multiply the top and the bottom by a constant e^c to avoid computing the exponential of large numbers.

For instance maybe you want to compute this:

e^(50) / (e^(50) + e^(51))

e^51 is a large number, we can agree, but if you multiply through by e^(-50) on the top and the bottom, you only need to compute:

1.0 / (1.0 + e)

These terms: log_pi[4] - log_sum_exp(log_pi) are basically logs of softmaxes, I think. The usual way to scale softmaxes (of a vector v) is to multiply the numerator and denominator by something like e^(-max(v)). This Q: https://stackoverflow.com/q/34968722 and A: https://stackoverflow.com/a/34969389 are relevant.
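In R the stabilized version looks like this (just an illustration; I think Stan's built-in softmax and log_softmax do the same thing internally):

# Log-softmax with the max subtracted first, so exp() can't overflow.
log_softmax <- function(v) {
  m <- max(v)
  v - m - log(sum(exp(v - m)))
}

v <- c(50, 51, 1000)
log(exp(v) / sum(exp(v)))   # naive: exp(1000) overflows, gives -Inf / NaN
log_softmax(v)              # stable: roughly c(-950, -949, 0)

Then exp() the result only if you actually need probabilities, and prefer log_sum_exp-style terms in target += wherever you can stay on the log scale.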