I am trying to estimate a copula model in which some of the variables follow a zero-one-inflated beta distribution. My goal is to use the Chib and Greenberg approach (as in blavaan), where discrete values are modeled as continuous values within the bounds implied by the other model parameters. In this case:
- Each observation where X=0 has an associated continuous parameter that is bounded between 0 and the probability that X=0
- Observations where 0<X<1 are directly transformed by the scaled beta CDF
- Each observation where X=1 has an associated continuous parameter that is bounded between 1 - the probability that X=1 and 1
The code snippet below demonstrates this idea.
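
Written out as a formula, the three cases above might look like the following. The symbols $U_n$ (the cumulative-probability value attached to observation $n$) and $F_{\mathrm{Beta}}$ (the beta CDF) are introduced here only for illustration; $\theta = (P(X=0),\ P(0<X<1),\ P(X=1))$, $a$, and $b$ correspond to `theta`, `a`, and `b` in the code below.

$$
\begin{cases}
U_n \in (0,\ \theta_1) & \text{if } X_n = 0,\\
U_n = \theta_1 + \theta_2\, F_{\mathrm{Beta}}(X_n \mid a, b) & \text{if } 0 < X_n < 1,\\
U_n \in (1 - \theta_3,\ 1) & \text{if } X_n = 1.
\end{cases}
$$

In the first and third cases $U_n$ is a free parameter within its interval; in the second case it is a deterministic function of the data and the other parameters.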
parameters {
  // Zero-one inflated beta parameters
  simplex[3] theta;
  real<lower = 0, upper = 1> mu;
  real<lower = 0> sigma;
  // Transformation of raw values to cumulative probabilities
  vector<lower = 0, upper = theta[1]>[N0] P0;
  vector<lower = 1.0 - theta[3], upper = 1>[N1] P1;
}
transformed parameters {
  real a = mu * sigma;
  real b = (1.0 - mu) * sigma;
  vector[N_not01] P_not01_implied; // Direct estimates of CDF value based on theta, a, and b
  for (n in 1:N_not01) {
    P_not01_implied[n] = theta[1] + beta_cdf(X_not01[n] | a, b) * theta[2];
  }
}
However, this produces biased estimates (see the attached model2.stan, "Modeled P(X=0) P(X=1)" in the plot below). My understanding is that I need a Jacobian adjustment because the parameters are bounded by another parameter and I am only modeling 2 of the 3 segments (X=0 and X=1, but not 0<X<1). If I include a set of parameters for 0<X<1, then I get unbiased estimates (model3.stan, “Modelled P(X=0) P(0<X<1) P(X=1)”). So there seems to be some adjustment I need that is a function of those probabilities, and adding it directly should be more efficient than sampling continuous values for the 0<X<1 observations, which are ignored anyway.
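
One possible form of the missing term, offered as an unverified sketch rather than a confirmed fix: a bounded parameter with no sampling statement is flat over its interval, so each `P0` element integrates to `theta[1]` and each `P1` element to `theta[3]`. The 0<X<1 observations have no such parameter, so nothing contributes the matching `theta[2]` factor unless it is added by hand. The sketch below adds `log(theta[2])` once per interior observation, together with the beta log density. The `data` block declarations and the prior on `sigma` are assumptions made so the program is self-contained; they are not taken from the attached models, and the copula density itself is left out.

```stan
data {
  int<lower=0> N0;                            // number of observations with X = 0 (assumed declaration)
  int<lower=0> N1;                            // number of observations with X = 1 (assumed declaration)
  int<lower=0> N_not01;                       // number of observations with 0 < X < 1 (assumed declaration)
  vector<lower=0, upper=1>[N_not01] X_not01;  // the interior observations
}
parameters {
  simplex[3] theta;
  real<lower=0, upper=1> mu;
  real<lower=0> sigma;
  vector<lower=0, upper=theta[1]>[N0] P0;
  vector<lower=1.0 - theta[3], upper=1>[N1] P1;
}
transformed parameters {
  real a = mu * sigma;
  real b = (1.0 - mu) * sigma;
}
model {
  // Hypothetical weak prior, for illustration only.
  sigma ~ exponential(0.1);

  // P0 and P1 have no sampling statement: each is flat on its interval,
  // so marginally they contribute theta[1]^N0 and theta[3]^N1.

  // Candidate adjustment for the interior observations:
  // probability of falling in (0, 1), times the beta density there.
  target += N_not01 * log(theta[2]);
  target += beta_lpdf(X_not01 | a, b);

  // The copula density evaluated at P0, P1, and the implied interior
  // values would be added here; it is omitted from this sketch.
}
```

If the attached model already includes the `beta_lpdf` term, only the `N_not01 * log(theta[2])` line would be new. A quick check would be to compare the posterior for `theta` from this version against the one from model3.stan.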
I suspect that the information I need is described here in the user guide or here in the reference manual, but I’m not understanding practically what I need to be doing.
Thanks for your help and time!
example.R (2.1 KB)
model1.stan (842 Bytes)
model2.stan (1.0 KB)
model3.stan (1.1 KB)