I have some data y that is either 0 or takes positive continuous values. (x1, x2) are covariates that can explain y. I had the idea to model this as a mixture:
target += log_mix(lambda[i],
                  normal_lpdf(y[i] | a[2] + b[2] * x2[i], sigma),
                  zero_lpdf(y[i]));
where lambda = inv_logit(… x …), and I define a new, boring zero distribution:
real zero_lpdf(real y) {
  if (y == 0) {
    return log(1.);  // = 0, i.e. full mass on the spike at zero
  } else {
    return log(1. / pow(10, 10));  // essentially impossible away from zero
  }
}
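To build intuition for what this likelihood evaluates per observation, here is a plain-Python sketch (not Stan) of the same pieces; the values lam = 0.7, mu = 1.0, sigma = 1.0 are made up for illustration:

```python
import math

def zero_lpdf(y):
    # mirrors the Stan function: log-density 0 at the spike, tiny elsewhere
    return 0.0 if y == 0 else math.log(1e-10)

def normal_lpdf(y, mu, sigma):
    # standard normal log-density
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def log_mix(lam, lp1, lp2):
    # log(lam * exp(lp1) + (1 - lam) * exp(lp2)), computed stably
    a = math.log(lam) + lp1
    b = math.log1p(-lam) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# at y == 0 the zero component dominates; at y > 0 the normal does
print(log_mix(0.7, normal_lpdf(0.0, 1.0, 1.0), zero_lpdf(0.0)))
print(log_mix(0.7, normal_lpdf(2.0, 1.0, 1.0), zero_lpdf(2.0)))
```

The key point this makes visible: at y == 0 the second mixture component contributes density 1, not density "somewhere under a continuous curve" — it is a point mass, not a normal.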
This thing actually converges. I predict as pred = lambda * (a[2] + b[2] * x2).
Now a few questions:
- does the zero distribution make sense? am I reinventing something here?
- is there a more sophisticated way of making predictions?
Hey!
How would you generate data from this, i.e. what’s the generative process? Put differently, how would you predict new y values?
Cheers,
Max
The DGP is roughly
pos = 1 * (inv_logit(…x) > .5)
y = pos * (a + b * x + e)
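Taken literally (a hard threshold at .5 rather than a Bernoulli draw), this DGP can be simulated with a few lines of Python; a, b, and the logit coefficient c are hypothetical values I picked just to make it run:

```python
import math
import random

random.seed(1)
a, b, c = 1.0, 2.0, 1.5  # made-up "true" parameters
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]

def inv_logit(t):
    return 1 / (1 + math.exp(-t))

# pos is deterministic given x: 1 exactly when inv_logit(c * x) > .5
pos = [1 if inv_logit(c * xi) > 0.5 else 0 for xi in x]
e = [random.gauss(0, 1) for _ in range(n)]
y = [p * (a + b * xi + ei) for p, xi, ei in zip(pos, x, e)]

# exact zeros appear wherever the threshold is not crossed
print(sum(1 for yi in y if yi == 0))
```

Note that with c > 0 the threshold inv_logit(c * x) > .5 is just x > 0, so roughly half the simulated outcomes are exact zeros here.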
For new values I would use my prediction equation
pred = inv_logit(x) * (a + b * x2), where all x are new (out-of-sample) values.
I think this only predicts the expectation of y, not its full distribution. You would actually never predict pred = 0, although 0 outcomes are part of the DGP, right?
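One way to get the full predictive distribution rather than just its mean: for each draw, flip a Bernoulli(lambda) coin and only then draw the normal part. A Python sketch (point values stand in for posterior draws; a2, b2, sigma, and the logit coefficient are all hypothetical):

```python
import math
import random

random.seed(2)

def inv_logit(t):
    return 1 / (1 + math.exp(-t))

def predict_y(x2_new, a2, b2, sigma, lam_coef, n_draws=4000):
    """Draw from the mixture's predictive distribution at a new x2."""
    lam = inv_logit(lam_coef * x2_new)
    draws = []
    for _ in range(n_draws):
        if random.random() < lam:                       # mixture indicator
            draws.append(random.gauss(a2 + b2 * x2_new, sigma))
        else:
            draws.append(0.0)                           # the spike at zero
    return draws

draws = predict_y(x2_new=0.5, a2=1.0, b2=2.0, sigma=1.0, lam_coef=1.5)
# exact zeros do show up in the predictive draws
print(sum(1 for d in draws if d == 0.0) / len(draws))
```

In a real Stan workflow this would sit in generated quantities (one draw per posterior sample), but the structure is the same: the point prediction lambda * (a + b * x2) is the mean of these draws, while the draws themselves do include exact zeros.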
Also, I think the concept of “inflation” is a bit weird here, because p(y = 0) under the normal model is 0, i.e. you can’t really inflate something in a continuous distribution. Thus “inflated” models only exist for discrete distributions with PMFs. (Maybe a hurdle model with a log-normal distribution could be applied in your case?)
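For concreteness, the per-observation log-density of such a hurdle log-normal could be sketched like this in Python (theta, mu, sigma are placeholder parameters; in a full model you would link them to covariates):

```python
import math

def hurdle_lognormal_lpdf(y, theta, mu, sigma):
    """Hurdle log-normal: P(y = 0) = 1 - theta, and
    y | y > 0 ~ LogNormal(mu, sigma)."""
    if y == 0:
        return math.log1p(-theta)      # probability mass at zero
    # log-normal log-density for the positive part, plus log P(y > 0)
    lp = (-math.log(y) - math.log(sigma) - 0.5 * math.log(2 * math.pi)
          - 0.5 * ((math.log(y) - mu) / sigma) ** 2)
    return math.log(theta) + lp

print(hurdle_lognormal_lpdf(0.0, 0.6, 0.0, 1.0))  # mass at zero
print(hurdle_lognormal_lpdf(1.0, 0.6, 0.0, 1.0))  # density on y > 0
```

Because the log-normal puts zero density at y = 0, the two parts never compete for the same observation — which is exactly why "hurdle" is the cleaner concept here than "inflation".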