Non linear distribution with some zeros in data

@avehtari I just need confirmation that the below function in the brms is not for beta-binomial but beta distribution, right? and I make cutsome_family using Stan?
``
zero_inflated_beta(link = “logit”, link_phi = “log”, link_zi = “logit”)

Yeah, that’s a zero-inflated beta distribution. The zero-inflated beta-binomial can be implemented in brms as a custom family, as you suggest.

To get started, you can have a look at the brms custom family vignette here, which walks through implementing a beta-binomial: Define Custom Response Distributions with brms

Implementing the zero-inflated binomial is in one sense simple, but in another sense pretty complicated, depending on how you’ve encountered zero-inflated models in the past. In particular, Stan will not allow you to implement the zero-inflation indicator as a discrete parameter; instead you need to marginalize over the latent discrete zero-inflation state. Additionally, you will need to make a decision about whether and how to model the zero-inflation as a function of covariates, and whether to allow the directionality of covariate effects on the beta-binomial location parameter to inform the directionality of effects on the zero-inflation parameter, and vice versa (and what form, if any, that sharing of information should take). These decisions will matter more if you have a small number of zeros (or if you have almost all zeros) than if you have a lot of zeros and a lot of non-zeros (with a lot of zeros and non-zeros both, it should be no problem to constrain both the zero-inflation sub-model and the beta-binomial submodel without sharing information).

2 Likes

@jsocolar thanks for clarifying the function.
I think I have relatively large zeros and non-zeros in the data as the distribution above shows. My predictors are monotonic ordered variables.
I have read the vignette thanks for the suggestion.
It seems Paul is too busy to implement it in brms. I try to code Stan and share it here for feedback. It is a bit complicated given that I have to also address monotonic predictors in Stan.
Thanks

2 Likes

@ssalimi For what it’s worth, if you need monotonic predictors, my suggestion is to implemented the zero-inflated beta-binomial as a custom family in brms. This will give you all of the flexibility of brms in terms of how you specify your linear predictors, but will use your custom family as the likelihood conditional on predictors. Happy to help if you run into trouble.

Of course if it’s more comfortable to implement in pure Stan there’s nothing wrong with that either!

1 Like

@jsocolar @mike-lawrence @jd_c @cmcd @avehtari

A very naive attempt for making custom_family ( without monotonic predictor for now). I appreciate feedback on the code:

beta_binomial <- custom_family(
 "zero_inflated_beta_binomial", dpars = c("mu", "phi","zi"),
 links = c("logit", "log","logit"), lb = c(NA, 0,NA),
 type = "int", vars = "vint1[n]"
)

stan_funs <- "

  /* zero-inflated beta-binomial log-PDF of a single response 
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of beta-binomial distribution
   *   phi: shape parameter of beta-binomial distribution
   *   zi: zero-inflation probability
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 

        real zero_inflated_beta_binomial_lpmf(int y, real mu, real phi,
                                   real  zi) { 
        if (y == 0) { 
        return        log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         beta_binomial_lpmf(0 | mu, phi)); 
    }   else { 
        return     bernoulli_lpmf(0 | zi) +  
                   beta_binomial_lpmf(y | mu, phi); 
    } 
	
}	
		
  /* zero-inflated beta-binomial log-PDF of a single response 
   * logit parameterization of the zero-inflation part
   * Args: 
   *   y: the response value 
   *   mu: mean parameter of beta-binomial distribution
   *   phi: shape parameter of beta-binomial distribution
   *   zi: linear predictor for zero-inflation part
   * Returns:  
   *   a scalar to be added to the log posterior 
   */
	
       real   zero_inflated_beta_binomial_logit_lpmf(int y, real mu, 
                                                           real phi, real zi, int T) { 
        if (y == 0) { 
        return       log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                      bernoulli_logit_lpmf(0 | zi) + 
                      beta_binomial_lpmf(0 | mu, phi)); 
    }   else { 
	            beta_binomial_rng(real mu, real phi, int T) {
        return   bernoulli_logit_lpmf(0 | zi) +  
	             beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
        }
       
     }
	
	// zero_inflated beta binomial log-CCDF and log-CDF functions
	 
         real   zero_inflated_beta_binomial_lcdf(int y, real mu, real phi, real hu) { 
		 
                  return loglm_exp(zero_inflated_beta_binomial_lccdf(y | mu, phi, hu));
   }
  
}
"
stanvars <- stanvar(scode = stan_funs, block = "functions")

fit1 <- brm(
  event| vint(trials) ~ predictor + (1|id), data = data, 
  family = beta_binomial, stanvars = stanvars
)

I appreciate any feedback on the code as it is my first time developing custom_family. Thanks

2 Likes

This looks great! What’s especially neat about these custom families is that nothing further is required to implement the monotonic effects. brms will construct the linear predictor for the monotonic effects for you just like it does for other classes of model, and it will pass them to your custom family.

Note that some of your comments say zero-inflated negative binomial when you code makes clear that you mean “beta binomial” instead.

1 Like

@jsocolar Thank you, the negative binomial was in the codes’ explanation, I edited them to beta-binomial, but the code is for beta-binomial. I still need to run it and check if I get any bugs. I will share updates.

2 Likes

@cmcd @jsocolar @mike-lawrence
The code above had issues; I tried to modify it as below and a sample data is also attached but I got an error again!



long=read.csv("sample.csv",header=TRUE,sep=",")

 
zero_inflated_beta_binomial<- custom_family(
 "zero_inflated_beta_binomial", dpars = c("mu", "phi","zi"),
 links = c("logit", "log","logit"), lb = c(NA, 0,NA),
 type = "int", vars = "vint1[n]"
)



stan_funs<- "
real zero_beta_binomial_lpmf(int y, real mu, real phi, 
                                       real zi) { 
    if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         beta_binomial_lpmf(0 | mu, phi)); 
    } else { 
      return bernoulli_lpmf(0 | zi) +  
             beta_binomial_lpmf(int y | int T, real mu, real phi); 
    } 
      
  int beta_binomial_rng(int T, real mu, real phi, real zi) {
  
   if (y == 0) { 
      return log_sum_exp(bernoulli_rng(1 | zi), 
                         bernoulli_rng(0 | zi) + 
                         beta_binomial_rng(0 | mu, phi)); 
    } else { 
      return bernoulli_rng(0 | zi) +  
      return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
  }
  
   } 
  "
  
stanvars <- stanvar(scode = stan_funs, block = "functions")
[sample.csv|attachment](upload://hHD7TG1IzO3HQ1jlCH0ilKA8A4q.csv) (663 Bytes)

fit1 <- brm(
  events| vint(trial) ~ age + (1|id), data = long, 
  family = zero_inflated_beta_binomial, stanvars = stanvars
)

I still get the error

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 
  beta_binomial_lpmf(int, real, real)

In this section you are calling beta_binomial_lpmf twice. The first call cannot work because you are missing the variable N. The second call is correct.

The error message you got points in my experience to these problems:

  • The function does not exist
  • The function exists, but one has called it with the wrong arguments

@Guido_Biele Thank you for your feedback. But to be honest, I am a bit lost. With the first call

if (y == 0) { 
      return log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         beta_binomial_lpmf(0 | mu, phi)); 

why I need N ( I called it T in the second call). This part is to account for zeros. I am not sure how to include N/T here.
For the second call

 } else { 
      return bernoulli_lpmf(0 | zi) +  
             beta_binomial_lpmf(int y | int T, real mu, real phi); 
    } 

Should I use zero_beta_binomial instead of beta_binomial? I appreciate your advice.

EDIT: @maxbiostat edited this post for syntax highlighting.

Sorry for the slow response, haven’t been on discourse lately.
I think it should be the same T as in the second call.

@Guido_Biele @mike-lawrence @cmcd @jd_c @jsocolar @avehtari

The below syntax finally worked, I would appreciate it if there is any feedback!

zero_inflated_beta_binomial<- custom_family(
 "zero_inflated_beta_binomial", dpars = c("mu", "phi","zi"),
 links = c("logit", "log","logit"), lb = c(NA, 0,NA),
 type = "int", vars = "vint1[n]"
)


stan_funs<- "
real zero_inflated_beta_binomial_lpmf(int y, real mu, real phi, 
                                       real zi,int T) { 
      if (y == 0) { 
      return             log_sum_exp(bernoulli_lpmf(1 | zi), 
                         bernoulli_lpmf(0 | zi) + 
                         beta_binomial_lpmf(0 | T, mu * phi, (1 - mu) * phi)); 
    } else { 
             return bernoulli_lpmf(0 | zi) +  
             beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi); 
   
      } 
	    }
    "
  
   stanvars <- stanvar(scode = stan_funs, block = "functions")
 

  fit1 <- brm(
  events| vint(trial) ~ age + (1|id), data = long, 
  family = zero_inflated_beta_binomial, stanvars = stanvars
)
2 Likes

Great that there are at least no obvious bugs anymore.

I think to be surer that everything works the next steps would be to do some simple (visual) posterior predictive checks. For this you’ll have to make some additional functions as described in the brms vignette for custom likelihoods. You’ll anyhow need those functions for posterior predictive checks for the data you ultimately want to analyse.

The best way to validate your code would be to use simulation based calibration. Here is a vignette about how @hyunji.moon’s SBC package works with brms.

2 Likes

@Guido_Biele Thank you. Yes, I am running the code on my actual data and will use predictive check, model check using LOO. Also, thanks for sharing the SBC package. I will use it for simulation. I appreciate all hints.

2 Likes

@Guido_Biele
To apply “loo” function I used this syntax

    log_lik_zero_inflated_beta_binomial<-function(i, prep) {
      mu <- brms::get_dpar(prep, "mu", i = i)
      phi <- brms::get_dpar(prep, "phi", i = i)
      zi<- brms::get_dpar(prep, "zi", i = i)
      trials <- prep$data$vint1[i]
      y <- prep$data$Y[i]
      T <- prep$data$T[i]
    
   if  (y == 0) { 
      return         
     zero_inflated_beta_binomial_lpmf(0 | T,mu, phi,zi, trials); 
    }   else { 
   
    zero_inflated_beta_binomial_lpmf(y|T, mu,  phi, trials)
  }
   }
  
    ```
But I got the error as :

 Error in mapply(FUN = function (y, mu, phi, zi, T, pstream__ = <pointer: 0x2aaad5e77760>)  : 
  zero-length inputs cannot be mixed with those of non-zero length 

I am not sure how to separate zero from non-zero in this part! I appreciate any guidance!

Sorry for the slow reaction!
It’s hard to debug this without having the code, and I don|t have time to run this on my PC.

It looks as if you are in R, but zero_inflated_beta_binomial_lpmf(y|T, mu, phi, trials) is Stan syntax. Can you try zero_inflated_beta_binomial_lpmf(y, T, mu, phi, trials) (and of course also zero_inflated_beta_binomial_lpmf(0, T, mu, phi, trials)).

Also, you might be able to directly use loo.brmsfit() (documention is here).

1 Like

@Guido_Biele @avehtari
Thanks for the above suggestions. These worked and I did not get errors. However, I have a serious issue with running the model on my data.
When I run the model I get the error
“bernoulli_lpmf: Probability parameter is -1.79029, but must be in the interval [0, 1]”
I chose the priors using get_prior in brms. If this is not due to prior choice, what else could be the reason for this error?
I appreciate any hints.

This sounds as if the logistic transform was not applied yet.
On a practical level, using bernoulli_logit_lpmf should work.
However, given that there still such bugs in the model, I would strongly advise to do these things before analysing data:

  • a simple parameters recovery experiment
  • even better simulation based calibration
  • do a prior predictive check to see if model predictions are reasonable given your data

Jumping over these steps is very risky, because without successfully doing these things you will not be sure that your model accurately estimates what you want it to estimate.

@Guido_Biele @avehtari @paul.buerkner
I could do the steps above and also finally got the LOO, pp_check, and conditional_effects work for zero_inflated_beta_binomila family

The final syntax that worked for me is:

zero_inflated_beta_binomial<- custom_family(
  "zero_inflated_beta_binomial", dpars = c("mu", "phi","zi"),
  links = c("logit", "logit","logit"), lb = c(NA, 0,NA),ub=c(NA,1,NA),
  type = "int", vars = "vint1[n]"
)

stan_funs<- "
           real zero_inflated_beta_binomial_lpmf(int y, real mu, real phi, 
           real zi,int T) { 
           if (y == 0) { 
           return         log_sum_exp(bernoulli_lpmf(1 | zi), 
                          bernoulli_lpmf(0 | zi) + 
                          beta_binomial_lpmf(0 | T, mu, phi)); 
}          else { 
                          return bernoulli_lpmf(0 | zi) +  
                          beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi); 
 } 
}
               real  zero_inflated_beta_binomial_logit_lpmf(int y, real mu, 
               real phi, real zi, int T) { 
              if (y == 0) { 
            return     log_sum_exp(bernoulli_logit_lpmf(1 | zi), 
                        bernoulli_logit_lpmf(0 | zi) + 
                        beta_binomial_lpmf(0 | T,mu, phi)); 
}           else { 

            return        bernoulli_logit_lpmf(0 | zi) +  
                          beta_binomial_lpmf(y|T, mu * phi, (1 - mu) * phi);
}

}
"

  stanvars <- stanvar(scode = stan_funs, block = "functions")

  expose_functions(fit1, vectorize = TRUE)

  log_lik_zero_inflated_beta_binomial<-function(i, prep) {
  mu <- brms::get_dpar(prep, "mu", i = i)
  phi <- brms::get_dpar(prep, "phi", i = i)
  zi<- brms::get_dpar(prep, "zi", i = i)
  T <- prep$data$vint1[i]
  y <- prep$data$Y[i]
  zero_inflated_beta_binomial_lpmf(0, mu, phi,zi,T)
  zero_inflated_beta_binomial_lpmf(y,mu, phi,zi,T)
  
}

 loo(fit1)

 posterior_predict_zero_inflated_beta_binomial <- function(i, prep, ...) {
  mu <- brms::get_dpar(prep, "mu", i = i)
  phi <- brms::get_dpar(prep, "phi", i = i)
  zi<- brms::get_dpar(prep, "zi", i = i)
  T <- prep$data$vint1[i]
  ndraws <- prep$ndraws
  theta <- runif(ndraws, 0, 1)
  ifelse(
  theta < zi, 0, 
  rbinom(ndraws, size = T, prob = rbeta(ndraws, shape1 = mu * phi, shape2  = (1 - mu) * phi)))
}
  
  
 posterior_epred_zero_inflated_beta_binomial <- function(prep) {
  mu <- brms::get_dpar(prep, "mu")
  T <- prep$data$vint1
  T <- matrix(T, nrow = nrow(mu), ncol = ncol(mu), byrow = TRUE)
  mu * T
}

conditional_effects(fit1, conditions = data.frame(size = 1))

@Guido_Biele @avehtari @jsocolar Thanks for all your help. I like to make this a Vignette for Zero Inflated Beta Binomial Custom Family, what do you think?

ZIBB_CustomFamily.R (4.2 KB)

5 Likes

Excellent!

That would be very nice. I can give feedback when you have a draft

2 Likes