I have some data y that is either 0 or takes positive continuous values. (x1, x2) are covariates that can explain y. My idea was to model this as a mixture:

```
target += log_mix(lambda[i],
                  normal_lpdf(y[i] | a[2] + b[2] * x2[i], sigma),
                  zero_lpdf(y[i]));
```

where lambda = inv_logit(… x …) and I define a new **boring** zero distribution:

```
real zero_lpdf(real y) {
  if (y == 0) {
    return log(1.);
  } else {
    return log(1. / pow(10, 10));
  }
}
```
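To see what this mixture term computes, here is a minimal Python sketch of `log_mix` together with the two component log-densities (lambda, mean, and sigma values are made up):

```python
import math

def normal_lpdf(y, mu, sigma):
    # log density of Normal(mu, sigma)
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def zero_lpdf(y):
    # the "boring" zero distribution: log(1) at zero, log(1e-10) elsewhere
    return math.log(1.0) if y == 0 else math.log(1e-10)

def log_mix(w, lp1, lp2):
    # Stan's log_mix: log(w * exp(lp1) + (1 - w) * exp(lp2))
    return math.log(w * math.exp(lp1) + (1 - w) * math.exp(lp2))

# one observation's contribution, once at y = 0 and once at y > 0
lam = 0.7
ll_zero = log_mix(lam, normal_lpdf(0.0, 1.0, 1.0), zero_lpdf(0.0))
ll_pos = log_mix(lam, normal_lpdf(2.0, 1.0, 1.0), zero_lpdf(2.0))
```

At y = 0 the zero component contributes its full weight (1 - lambda); at y > 0 it contributes essentially nothing, so the hack approximates a point mass at zero.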

This thing actually converges. I predict as pred = lambda * (a2 + b2 * x2).

Now a few questions:

- Does the zero distribution make sense? Am I reinventing something here?
- Is there a more sophisticated way of making predictions?

Hey!

How would you generate data from this, i.e. what’s the generative process? Put differently, how would you predict *new* y values?

Cheers,

Max

The DGP is roughly

pos = 1 * (inv_logit(…x) > .5)

y = pos * (a + bx + e)
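That DGP can be simulated directly; the coefficient values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up parameters, just for illustration
a, b = 1.0, 0.5        # positive part: a + b*x + e
g0, g1 = 0.0, 1.0      # linear predictor inside inv_logit

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=1000)
pos = (inv_logit(g0 + g1 * x) > 0.5).astype(float)  # pos = 1 * (inv_logit(...) > .5)
e = rng.normal(scale=0.3, size=x.size)
y = pos * (a + b * x + e)
```

Note that with the `> .5` threshold, pos is a deterministic function of x; drawing pos from Bernoulli(inv_logit(...)) instead would make the zeros genuinely random, which is closer to what the mixture likelihood assumes.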

For new values I would use my prediction equation

pred = inv_logit(x) * (a + b * x2) where all x are new values (out of sample).

I think this only predicts the expectation of y, not its full distribution. You would actually never predict pred = 0, although 0 outcomes are part of the DGP, right?
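To predict the full distribution rather than just the expectation, one could draw from the fitted mixture: a Bernoulli draw with probability lambda picks the normal component, otherwise the draw is exactly zero. A sketch with placeholder parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# placeholder point estimates (e.g. posterior means)
a2, b2, sigma = 1.0, 0.5, 0.3
g = 1.0  # coefficient inside lambda = inv_logit(g * x)

def predictive_draws(x_new, n_draws=2000):
    lam = inv_logit(g * x_new)               # P(y comes from the normal component)
    from_normal = rng.random(n_draws) < lam  # component indicator
    normal_draws = rng.normal(a2 + b2 * x_new, sigma, size=n_draws)
    return np.where(from_normal, normal_draws, 0.0)

draws = predictive_draws(0.5)
# exact zeros do appear among the draws, unlike pred = lam * (a2 + b2 * x2)
```

(In a full Bayesian treatment you would do this per posterior draw, e.g. in Stan's generated quantities block, rather than with point estimates.)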

Also, I think the concept of “inflation” is a bit weird here, because p(y = 0) under the normal model is 0, i.e. you can’t really inflate something in a continuous distribution. Thus “inflated” models only exist for discrete distributions with PMFs. (Maybe a hurdle model with a log-normal distribution could be applied in your case?)
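For reference, a hurdle model factorizes the likelihood into P(y = 0) = theta and, for y > 0, (1 - theta) times a positive-only density. A minimal sketch of the log-density with a log-normal positive part (the standard log-normal parameterization):

```python
import math

def hurdle_lognormal_lpdf(y, theta, mu, sigma):
    # theta = P(y == 0); for y > 0, use (1 - theta) * LogNormal(mu, sigma)
    if y == 0:
        return math.log(theta)
    return (math.log1p(-theta)
            - math.log(y) - math.log(sigma)
            - 0.5 * math.log(2 * math.pi)
            - 0.5 * ((math.log(y) - mu) / sigma) ** 2)
```

If I remember correctly, brms ships this as the hurdle_lognormal family, so you may not need to hand-roll it.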