Marginilisation of variables in Zero-Inflated Beta Regression

ciar · September 15, 2023, 1:36pm

Hello,

I am trying to fit the zero-inflated beta regression of Tang et al., 2023 (Zero-Inflated Beta Distribution Regression Modeling | SpringerLink). This involves extending the Beta distribution to the user defined scale (-a,1). Implementation in STAN is proving tricky due to a couple of variables that need to be marginalised out (namely zi and W in the below).

An indicator variable zi is defined, denoting whether an observation y is truly 0:

Where y > 0, zi = 0.
Where y = 0, zi is modelled as a latent variable, under the rational that y may truly be 0, and thus zi = 1, or that y may be 0 due to chance, and thus zi = 0.

Zeros due to chance are considered to have arisen from the Beta component. So, instead of modelling y , V is modelled. Where y > 0, V is known. Where y = 0 and zi = 0, a latent Beta distributed V is sampled.

The model should look like (I don’t know LaTex, so excuse writing this in line):

a = 1 // Setting a to 1 gives even distribution either side of 0
c = (a / a + 1)

μ = α1 + β1X
φ = α2
π = α3 + β2X

A = μ * φ
B = (1 - μ) * φ

if(y > 0):
  zi = 0 
  V = (y + a) / (a + 1)
  V ~ Beta(A, B)
  zi ~ Bernoulli(pi)

if(y = 0):
  zi ~ Bernoulli( π / (π + (1 - π)P(V ≤ c)) )
  if(zi = 0):
    y = max(0,W)
    W = (a + 1)V - a ∴ P(W ≤ 0) = P(V ≤ c)
    V ~ Beta(A, B) T(0, c)

I believe I have figured out how to marginilise out zi, I am less confident about how to work around y = max(0,W). This is the code I have come up with at the moment.

The model compiles but I get errors about the shape parameters being 0 due to the initial values, which leads to log probabilities of -∞. I’ve tried setting initial values (which doesn’t work) but I doubt this is the root of the problem anyway. Interestingly though, these errors refer to the function for non-zero observations, but I’m far more unsure about the function for the zero observations.

functions{

    /* Log-probability density of a beta distributed response with an upper bound */
    real upper_bound_beta_lpdf(real y, real alpha, real beta_, real ub) {
        real x = y / ub;
        return beta_lpdf(x | alpha, beta_) - log(ub);
    } 

    /* Log-probability density for observations where y > 0 */
    real tang_zi_nonzero_y_beta_lpdf(real V, real mu, real phi, real pi_){

        row_vector[2] shape = [mu * phi, (1 - mu) * phi];
        return bernoulli_logit_lpmf(0 | pi_) + beta_lpdf(V | shape[1], shape[2]);

    }

    /* Log-probability density for observations where y = 0 */
    real tang_zi_zero_y_beta_lpdf(real V, real mu, real phi, real pi_, real c) {

        row_vector[2] shape = [mu * phi, (1 - mu) * phi];
        real PVc = 1 - exp(beta_lccdf(c | shape[1], shape[2]));
        real pi_p = inv_logit(pi_);
        real PT0 = pi_p / (pi_p + (1 - pi_p) * PVc);
        
        return log_sum_exp(bernoulli_lpmf(1 | PT0), 
                            bernoulli_lpmf(0 | PT0) + upper_bound_beta_lpdf(V | shape[1], shape[2], c));

    }

}

data {

  int<lower=1> N;  // Total number of observations
  vector[N] Y;  // Response variable
  int<lower=1> nNonZero; // Number of non-zero observations
  int obsNonZero[nNonZero]; // Indices of non-zero observations
  int obsZero[N - nNonZero]; // Indices of zero observations

  int<lower=1> K;  // Number of population-level covariates for main model mean parameter
  matrix[N, K] X;  // Population-level design matrix for main model mean parameter

  int<lower=1> K_pi;  // Number of population-level covariates for zero-inflation component
  matrix[N, K_pi] X_pi;  // Population-level design matrix for zero-inflation component
  
  real<lower=0> a; // Lower bound of extended Beta distribution
  
}

transformed data {

  matrix[N, K] Xc;  // Centered version of X 
  vector[K] means_X;  // Column means of X before centering
  matrix[N, K_pi] Xc_pi;  // Centered version of X_pi 
  vector[K_pi] means_X_pi;  // Column means of X_pi before centering

  // Scale and centre predictor variables
  for (i in 1:K) { 
    means_X[i] = mean(X[, i]);
    Xc[, i] = (X[, i] - means_X[i]) / sd(X[, i]);
  }
  for (i in 1:K_pi) { 
    means_X_pi[i] = mean(X_pi[, i]);
    Xc_pi[, i] = (X_pi[, i] - means_X_pi[i]) / sd(X_pi[, i]);
  }

  // Tang ZIBR data
  int nZero = (N - nNonZero);
  real<lower=0> c = (a / (a + 1)); // Threshold for whether y = 0 is truly 0; P(Y = 0) = P(V <= c)
  real<lower=0,upper=1>  Vy[nNonZero]; // Where Y > 0, V is known
  for(i in 1:nNonZero){
    Vy[i] = (Y[obsNonZero[i]] + a) / (a + 1);
  }

}

parameters {

  vector[K] b;  // Coefficients
  real Intercept;  // Temporary intercept for centered predictors

  real Intercept_phi;  // Temporary intercept for centered predictors
  
  vector[K_pi] b_pi;  // pi regression coefficients
  real Intercept_pi;  // Temporary intercept for centered predictors

  real<lower=0,upper=1> V0[nZero]; // Latent beta variable V for zero observations

}

model {

  // Linear predictor terms
  vector[N] mu = rep_vector(0.0, N);
  vector[N] phi = rep_vector(0.0, N);
  vector[N] pi_ = rep_vector(0.0, N);

  mu += Intercept + X * b;
  phi += Intercept_phi;
  pi_ += Intercept_pi + X_pi * b_pi;

  mu = inv_logit(mu);
  phi = exp(phi);

  for(n in 1:nNonZero){
    target += tang_zi_nonzero_y_beta_lpdf(Vy[n] | mu[obsNonZero[n]], phi[obsNonZero[n]], pi_[obsNonZero[n]]);
  }

  for(n in 1:nZero){
    target += tang_zi_zero_y_beta_lpdf(V0[n] | mu[obsZero[n]], phi[obsZero[n]], pi_[obsZero[n]], c);
  }
  
  // Priors
  real lprior = 0;
  lprior += normal_lupdf(b | 0,2);
  lprior += normal_lupdf(Intercept | 0,2); 
  lprior += normal_lupdf(Intercept_phi | 0,2);
  lprior += normal_lupdf(b_pi | 0,2);
  // Informative prior on zero-inflation intercept to aid identifiability of zero sources
  lprior += normal_lupdf(Intercept_pi | -5,0.5); // P(Y = 0) expected to be very low at covariate means
  target += lprior;

}

Thanks for any advice,

Ciar

jsocolar · October 12, 2023, 9:20pm

One conceptual issue that you’re running into is that when the observed data are zero, this model is completely certain that the data arise from zero-inflation rather than from the beta component. The reason for this is because, even after extending the beta to the user-defined scale, the beta puts a probability density on zero, which corresponds to infinitesimal probably mass, but the zero-inflation component puts a finite probabily mass on zero. If literal zeros arise from the beta component (or any particular literal value truncated to a finite number of decimal places), that can only happen due to the limits of the precision with which the value is recorded. (A random sample from a beta distribution is irrational with probability 1.)

When you combine this with a zero-inflation component, you need to worry about it. In particular, you need to model the probability mass associated with the region of the beta that gets rounded/discretized to 0.

So to make progress, you need to choose whether you are comfortable modeling all literal 0’s as zero-inflated outcomes, or whether instead you need to discretize the beta to deal with 0’s that arise as limited-precision (rounded) beta variates.

Topic		Replies	Views
Sampling from zero-inflated Beta distribution General	1	478	April 26, 2023
Zero one inflated beta regression in STAN Modeling	18	6276	January 14, 2020
Zero Inflated models with missing values Modeling missing-data	4	728	March 27, 2019
Model specification of a zero-inflated negative binomial random effect model using brms brms specification	4	766	July 2, 2021
Continuous binomial, negative binomial and Poisson Modeling techniques	2	884	May 11, 2020

Marginilisation of variables in Zero-Inflated Beta Regression

Related Topics