Truncated Normal Distribution Regression

In Stan, you can implement a regression model using the truncated normal distribution. When modeling a continuous, non-negative response variable, is it appropriate to use the truncated normal distribution? Asking for a friend ;)

Most of the popular advice on modeling non-negative response variables recommends gamma regression or lognormal regression. However, those distributions can have awkward properties that might not line up with how the data were generated. Often, in industry, it is sufficient for me to assume that the data-generating process involves a normal distribution restricted to non-negative values. Just wanna know if there’s something horrendously wrong with this approach.

I don’t have especially deep experience with truncated distributions, but I recently wrote a model with a truncated Fréchet, so I can share some insights.

IMO it’s appropriate to use a truncated normal when your data is truncated normal, i.e. when you believe the truncation is somehow inherent to the data.

If you don’t have that prior belief, the gamma also has positive support and is much more flexible. It can fit more candidate density shapes than the normal, so I’d guess it’s a better default as a result. IMHO.

I have the same problem. I tried to fit a HGLM with a Gamma distribution and it doesn’t fit for some groups, showing a longer tail than expected.
I tried to write down a truncated normal distribution but it’s telling me I can only use it for univariate responses :/.

Not the question but related: https://statmodeling.stat.columbia.edu/2020/01/10/linear-or-logistic-regression-with-binary-outcomes/

@Juan_Ignacio_de_Oyarbide if you loop over the observations and apply the truncation to each one individually, you should be able to get the model to run. See the code in the docs for more info: https://mc-stan.org/docs/2_18/stan-users-guide/truncated-data-section.html
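For intuition, here is a minimal Python sketch (not Stan code) of what looping over the observations amounts to: each observation contributes a normal log density minus the log of the probability mass above the truncation point, and that mass depends on the observation's own mean. The linear predictor `alpha + beta * x` and all function names are assumptions for illustration; the thread doesn't show the actual model.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of an untruncated normal distribution."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def truncated_normal_lpdf(y, mu, sigma, lower=0.0):
    """Log density of a normal(mu, sigma) truncated below at `lower`."""
    if y < lower:
        return -math.inf  # no density below the truncation point
    # Mass of the untruncated normal at or above `lower`: 1 - Phi((lower - mu) / sigma).
    # erfc is used because it is more accurate than 1 - cdf in the tail.
    # A sketch only: this can underflow to 0 if mu is far below `lower`.
    mass_above = 0.5 * math.erfc((lower - mu) / (sigma * math.sqrt(2.0)))
    return normal_lpdf(y, mu, sigma) - math.log(mass_above)


def log_likelihood(ys, xs, alpha, beta, sigma):
    """Sum per-observation terms; the mean (hypothetical: alpha + beta * x)
    and hence the normalising constant differ for every observation."""
    return sum(
        truncated_normal_lpdf(y, alpha + beta * x, sigma)
        for x, y in zip(xs, ys)
    )
```

The point of the loop is that the normalising constant is not shared across observations once the mean depends on a predictor, so the truncation has to be applied observation by observation.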

I believe the way to go is to do a posterior predictive check (the dens_overlay check from bayesplot might be a good start). If one of the models is better aligned with the data-generating process, you should be able to find a PP check that fails for one but works for the other. If you can’t, the choice of distribution probably doesn’t matter much. If both fail, something more is going on.

@martinmodrak this question actually arose from a posterior predictive check. I fit a normal and gamma regression and wasn’t satisfied with the posterior predictive distribution. Then I experimented with a truncated normal and it seemed to fit better and be more aligned with the DGP.

Thanks all for the feedback/validation on my approach!

In remote sensing applications, which often work with pixel data (values ranging from 0 upwards), it’s not uncommon to use a mixture of gammas to fit weird univariate distributions with positive support. If you’re just fit hacking then it may be worth a shot.
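A rough Python sketch of what a mixture-of-gammas density looks like, in case it helps. The shape/rate parameterisation and the function names are my own assumptions for illustration, not taken from any particular remote sensing package.

```python
import math


def gamma_lpdf(y, shape, rate):
    """Log density of a gamma distribution (shape/rate form); requires y > 0."""
    return (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1.0) * math.log(y)
        - rate * y
    )


def gamma_mixture_lpdf(y, weights, shapes, rates):
    """Log density of a finite gamma mixture; weights are assumed to sum to 1."""
    terms = [
        math.log(w) + gamma_lpdf(y, a, b)
        for w, a, b in zip(weights, shapes, rates)
    ]
    # log-sum-exp keeps the sum stable when individual terms are very negative
    biggest = max(terms)
    return biggest + math.log(sum(math.exp(t - biggest) for t in terms))
```

For example, `gamma_mixture_lpdf(0.5, [0.3, 0.7], [2.0, 9.0], [4.0, 3.0])` evaluates a two-component mixture with one component near zero and one further out, which is the kind of shape a single gamma struggles with.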

Just a minor suggestion on top: the places where I would expect the truncated normal to be potentially problematic are the behavior around zero and the behavior in the upper tail, so a few PP checks on something like P(y < small_number), min(y) and max(y) might be a prudent approach.
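A minimal sketch of those three checks in plain Python, assuming you already have the observed `y` and a list of replicated datasets `y_rep` (one per posterior draw). The function names are made up for illustration, and the demo at the bottom uses fixed parameters as a stand-in for real posterior draws.

```python
import random


def frac_below(values, threshold):
    """Share of values strictly below the threshold, i.e. P(y < small_number)."""
    return sum(v < threshold for v in values) / len(values)


def ppc_summary(y, y_rep, small_number):
    """For each statistic, return (observed value, share of replicates >= observed).

    Shares close to 0 or 1 suggest the model misses that feature of the data.
    """
    stats = {
        "P(y < small_number)": lambda v: frac_below(v, small_number),
        "min(y)": min,
        "max(y)": max,
    }
    summary = {}
    for name, stat in stats.items():
        observed = stat(y)
        replicated = [stat(rep) for rep in y_rep]
        share = sum(r >= observed for r in replicated) / len(replicated)
        summary[name] = (observed, share)
    return summary


def rtruncnorm(rng, mu, sigma, lower=0.0):
    """Draw from a normal truncated below at `lower` by simple rejection."""
    while True:
        draw = rng.gauss(mu, sigma)
        if draw >= lower:
            return draw


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical "observed" data: gamma draws, so the model below is misspecified.
    y = [rng.gammavariate(2.0, 1.0) for _ in range(200)]
    # Stand-in for posterior predictive draws: fixed mu and sigma, NOT a real posterior.
    y_rep = [[rtruncnorm(rng, 2.0, 1.4) for _ in range(200)] for _ in range(500)]
    for name, (observed, share) in ppc_summary(y, y_rep, 0.25).items():
        print(f"{name}: observed={observed:.3f}, share of replicates >= observed={share:.3f}")
```

In a real workflow `y_rep` would come from the fitted model (for instance a `generated quantities` block), and bayesplot's PPC functions would do the same comparison graphically.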