Zero one inflated beta regression in STAN

stanito · August 8, 2017, 3:19pm

Can anyone provide any references to modeling a Zero one inflated beta regression in STAN?

Any packages that can run that model through R?

Thank you

bgoodri · August 8, 2017, 4:14pm

There is the cdfquantreg R package that has likelihoods that put positive mass on 0 and / or 1, but it does not use Stan. I started to incorporate it into RStanArm but didn’t finish it. In the meantime you could transform your y with

(y * (n - 1) + .5) / n

to force it interior, where n is the number of observations.

Bob_Carpenter · August 14, 2017, 1:35pm

You’d code it in Stan just like the zero-inflated Poisson, only with two inflated points. Beta regressions are awfully hard to fit. And ones that allow true zero and true one outputs sound dangerous. Where’d the values in (0, 1) come from? If they’re counts, you can (and should) just model those directly.

paul.buerkner · August 17, 2017, 2:40pm

You may want to try out the zero_one_inflated_beta() family in brms.

markhw · November 25, 2017, 11:23pm

Did you ever write code to do this in Stan proper? I’m trying to put together a zero-one inflated beta regression model currently, and I am having trouble taking the various parameterizations across papers and writing it into Stan.

Bob_Carpenter · November 26, 2017, 1:21am

If you just want additional probability for 0 and 1, the generative model will be probability theta[1] of generating 0, probability theta[2] of generating 1, and probability theta[3] of drawing from poisson(lambda) (or whatever the parameter of the Poisson is). In Stan, that’s a mixture that we can simplify because we know some of the mixture probability components are zero in some cases.

data {
  int<lower = 0> N;
  int<lower = 0>[N] y;

parameters {
  simplex[3] theta;  // mixture of 0, 1, poisson (in that order)
  ...
transformed parameters {
  vector<upper = 0>[3] log_theta = log(theta);  
  ...
model {
  for (n in 1:N) {
    real poisson_lp = log_theta[3] + poisson_lpmf(y[n] | lambda);
    if (y[n] == 0)
       target += log_sum_exp(log_theta[1], poisson_lp);
    else if (y[n] == 1)
      target += log_sum_exp(log_theta[2], poisson_lp);
    else
      target += poisson_lp;
  }
  ...

This could be generalized to 0 to K inflation, and written as:

simplex[K + 2] theta;
...
real poisson_lp = log_theta[K + 2] + poisson_lpdf(y[n] | lambda);
...
if (y[n] <= K)
  target += log_sum_exp(log_theta[y[n] + 1], poisson_lp);
else
  target += poisson_lp

markhw · November 27, 2017, 6:01am

Sounds good, thanks! I need to read more of the manual on the simplex and what (I think) appears to be incrementing, as I’ve been adapting a lot of things from JAGS and BUGS. For example, I’ve adapted this JAGS code to a zero-one inflated beta regression model in Stan by:

data {
  int n;
  int x[n];
  vector<lower=0, upper=1>[n] y;
}
transformed data {
  int<lower=0, upper=1> is_discrete[n]; 
  int<lower=-1, upper=1> y_discrete[n];

  for (i in 1:n) { 
    if (y[i] == 0) {
      is_discrete[i] = 1;
      y_discrete[i] = 0;
    } else if (y[i] == 1) {
      is_discrete[i] = 1;
      y_discrete[i] = 1;
    } else {
      is_discrete[i] = 0;
      // throws error if passed to bernoulli
      y_discrete[i] = -1;
    }
  }
}
parameters {
  vector[2] coef_a;
  vector[2] coef_g;
  vector[2] coef_m;
  vector[2] coef_p;
}
transformed parameters {
  vector<lower=0, upper=1>[n] alpha;
  vector<lower=0, upper=1>[n] gamma;
  vector[n] mu;
  vector<lower=0>[n] phi;
  vector<lower=0>[n] p;
  vector<lower=0>[n] q;

  for (i in 1:n) {
    alpha[i] = inv_logit(coef_a[1] + coef_a[2] * x[i]);
    gamma[i] = inv_logit(coef_g[1] + coef_g[2] * x[i]);
    mu[i] = inv_logit(coef_m[1] + coef_m[2] * x[i]);
    phi[i] = exp(coef_p[1] + coef_p[2] * x[i]);
    p[i] = mu[i] * phi[i];
    q[i] = phi[i] - mu[i] * phi[i];
  }
}
model {
  coef_a ~ normal(0, 1);
  coef_g ~ normal(0, 1);
  coef_m ~ normal(0, 1);
  coef_p ~ normal(0, 1);
  
  is_discrete ~ bernoulli(alpha);
  for (i in 1:n) {
    if (is_discrete[i] == 1) {
      y_discrete[i] ~ bernoulli(gamma[i]);
    } else {
      y[i] ~ beta(p[i], q[i]);
    }
  }
}

Unfortunately, I am having a hard time getting accurate estimates using that code from data I generated using the same model specification, so I was wondering if anyone had code for the zero-and-one case.

mbjoseph · November 27, 2017, 7:43am

Hey @markhw - any more details on what seems to be going wrong with parameter recovery, e.g., which parameters are not recoverable, is there consistent bias, how do things scale with N?

Another strategy would be to simplify the model (e.g., remove all covariate effects and add them back to the model one-by-one), so that you can get a better sense for what specifically is problematic.

I tweaked the R script that you had sent earlier, and it seems like the model recovers the true parameters for large N: https://gist.github.com/mbjoseph/a0961d20263ad50dd953c11defa59736

*Note that I’ve changed the data declaration for x from int x[n] to vector[n] x

markhw · November 27, 2017, 3:10pm

I’ve attached the R code that generates the data (similar to what you’ve seen). zoib_generation.R (1.7 KB)

It calls a file called zoib.stan, which is what I posted in the above comment. It looks like it is capturing the beta distribution well, but not the Bernoulli, as the coefficients predicting alpha and gamma are off, but mu and phi coefficients are not:

  coefficient estimated  real difference
        <chr>     <dbl> <dbl>      <dbl>
1   coef_a[1]      0.12 -0.13       0.25
2   coef_a[2]     -0.34 -0.05      -0.29
3   coef_g[1]     -0.02  0.00      -0.02
4   coef_g[2]      0.45  0.30       0.15
5   coef_m[1]      0.01  0.00       0.01
6   coef_m[2]      0.32  0.30       0.02
7   coef_p[1]      1.86  1.80       0.06
8   coef_p[2]      0.29  0.30      -0.01

It does look like you’re right about large N, though. When I upped the N to 10,000 instead of 1,000, I got better recovery of the coefficients:

  coefficient estimated  real difference
        <chr>     <dbl> <dbl>      <dbl>
1   coef_a[1]     -0.13 -0.13       0.00
2   coef_a[2]     -0.13 -0.05      -0.08
3   coef_g[1]     -0.09  0.00      -0.09
4   coef_g[2]      0.44  0.30       0.14
5   coef_m[1]     -0.02  0.00      -0.02
6   coef_m[2]      0.32  0.30       0.02
7   coef_p[1]      1.77  1.80      -0.03
8   coef_p[2]      0.32  0.30       0.02

I’m wondering if the method is really that biased for certain data, or if there’s something in how Ospina and Ferrari define the likelihood that we are missing?

mbjoseph · November 27, 2017, 5:33pm

I think the likelihood is correct based on my reading of the paper, but if somebody can find a mistake that we’ve missed, that would be great!

In looking at the marginal posterior distributions of each parameter relative to the true values, things don’t look terrible to me:

zoib-recovery

One thing to consider also is that only the discrete observations provide information on gamma with this current parameterization, so that in this case there are < 1000 independent data points that can contribute to the likelihood for coef_g.

Bob_Carpenter · November 27, 2017, 5:45pm

You want to replace this pattern:

gamma = inv_logit(...);
y ~ bernoulli(gamma);

with

y ~ bernoulli_logit(...);

I don’t understand the is-discrete thing here, but can’t you just vectorize these to:

y_discrete ~ bernoulli(gamma);
y ~ beta(p, q);

Otherwise, it seems like you wind up with unmodeled gaps in either y_discrete or y.

markhw · November 27, 2017, 6:33pm

Would replacing inv_logit and bernoulli to bernoulli_logit actually decrease the bias in estimation, increase the efficiency of the computing, or is it to make the code more readable?

The is_discrete variable is modeling whether or not the y is {0, 1}. If it is, then y_discrete is the value (0 or 1) that the discrete variable takes. y is then for all other observations (0, 1).

I was worried about unmodeled gaps, too. We are trying to model alpha, gamma, mu, and phi, however, where the values of 0 and 1 have the probabilities:

(taken from 6 here: https://arxiv.org/pdf/0705.0700.pdf)

That might help explain the is_discrete variable. So if we condition on alpha and then to the if-else, we incorporate the alpha * gamma probability of being one.

I felt that the unmodeled gaps would be an issue, so I tried to model it:

is_discrete ~ bernoulli(alpha);
  for (i in 1:n) {
    if (is_discrete[i] == 1) {
      y[i] ~ bernoulli(gamma[i]);
    } else {
      y[i] ~ beta(p[i], q[i]);
    }
  }

But it throws an error for using two different likelihoods, depending on the case?

mbjoseph · November 27, 2017, 6:51pm

Trying this:

  for (i in 1:n) {
    if (is_discrete[i] == 1) {
      y[i] ~ bernoulli(gamma[i]);
    } else {
      y[i] ~ beta(p[i], q[i]);
    }
  }

should raise a type error, because bernoulli_lpmf needs an integer y. Declaring y_discrete as an integer array in the transformed data block, and y as a vector in the data block is a workaround. Every element in y is modeled (there are no gaps), but the conditional logic ensures that bernoulli_lpmf and beta_lpdf receive data of the right type.

mbjoseph · November 27, 2017, 7:21pm

I suppose if you wanted to have the code better match the definition in the paper, you might consider something like this: zoib.stan (981 Bytes)

Where instead of declaring y_discrete you use the same conditional logic outlined in eq. 6 to increment the log probability:

  for (i in 1:n) {
    if (y[i] == 0) {
      target += log(alpha[i]) + log1m(gamma[i]);
    } else if (y[i] == 1) {
      target += log(alpha[i]) + log(gamma[i]);
    } else {
      target += log1m(alpha[i]) + beta_lpdf(y[i] | p[i], q[i]);
    }
  }

Bob_Carpenter · November 27, 2017, 8:18pm

It’ll make the arithmetic more stable and make the program more efficient. It shouldn’t affect the answer if you’re not getting any warnings as is.

Thanks for pasting in the definition. That’s more like the hurdle model. Rather than using two (0,1) parameters alpha and gamma, I just used one simplex

simplex[3] theta = [alpha * (1 - gamma), alpha * gamma, 1 - alpa]';  // 0, 1, (0,1)

It’d make sense to decompose the simplex into two (0, 1) constrained parameters if you had informative priors for alpha or gamma.

Beta regressions are really hard to fit, zero inflated or not. You don’t want to check point estimates, but posterior coverage.

To implement what you pasted in, you need this:

model {
  real log_zero = log(alpha * (1 - gamma));
  real log_one = log(alpha * gamma);
  real log_other = log(1 - alpha);
  ...
  for (i in 1:N) {
    if (y[i] == 0) target += log_zero;
    else if (y[i] == 1) target += log_one;
    else target += log_other + beta_lpdf(y[i] | mu, phi);
  }
...
}

You’ll find this all much much more efficient if in transformed data you create this:

transformed data {
  int<lower = 0> N_zero;  // number of y[i] == 0
  int<lower = 1> N_one;  / number of y[i] == 1
  vector[num_non_zero(y)] y_non_zero;
  ...
}

where y_non_zero holds the non-zero elements of y and num_non_zero(y) computes how many entries there are so that you can declare it. This only gets computed once.

That lets the likelihood be simplified to this much more efficient form:

target += N_zero * log_zero;
target += N_one * log_one;
target += beta_lpdf(y_non_zero | mu, phi);

I think that’s probably because y[i] is continuous, but bernoulli() requires a discrete variate. This won’t be a problem with the definition I coded above.

Bob_Carpenter · November 27, 2017, 8:19pm

Exactly! I missed @mbjoseph’s reply before replying.

linas · October 30, 2019, 11:13pm

Hi,

I am working on a zero-one inflated beta regression. The model I am using defines mu for every observation. I would like to calculate likelihood of y given mu. Is the following correct (sorry for reviving old thread but I already spent weeks and weeks trying to use 0-1 inflated beta regression for the problem I have)? What distr_lpdf should I use?

Many thanks for any comments.

model {
  ...
  for (i in 1:N) {
    if (y[i] == 0) target += distr_lpdf(0 | mu[i], alpha * (1 - gamma));
    else if (y[i] == 1) target += distr_lpdf(1 | mu[i], alpha);
    else target += beta_proportion_lpdf(y[i] | mu[i], sigma);
  }
...
}

mbjoseph · October 31, 2019, 3:12pm

Hi @linas! The model that I posted (zoib.stan) also has \mu varying for each observation.

If that doesn’t solve your problem, could you please clarify what your intended model is (e.g., using math notation), so we can get a better sense for whether your Stan model matches your intended model?

saudiwin · January 14, 2020, 10:51am

Just thinking out loud… couldn’t an easier way to fit this would be as a joint distribution logit model/beta regression? Simply have two independent models with shared covariates that contribute to the log posterior. As long as you have the same link function should still be interpretable.

Topic		Replies	Views
Sampling from zero-inflated Beta distribution General	1	404	April 26, 2023
Double inflated models in Stan Modeling	16	806	October 13, 2018
Zero-inflated negative and positive data - Zero-Inflated Gaussian? Modeling specification	3	625	October 10, 2020
Regression to the mean with mixed effects for zero inflation data Modeling	1	446	September 5, 2018
Predict in generated quantities for mixture models Modeling rstan	1	497	December 18, 2021

Zero one inflated beta regression in STAN

Related Topics