LOO-pit for folded predictions: gamma regression vs gaussian regression

I am following the workflow described in this notebook by Aki Vehtari. I’m working through the section on LOO-PIT plots for folded predictions, and I am comparing the fit of a gamma regression to a Gaussian regression for a very right-skewed outcome variable.

The Gaussian folded LOO-PIT

looks pretty good, but the gamma one looks terrible:

ppc_pit_ecdf(y = abs(outVec-median(outVec)),
             yrep = abs(posterior_predict(modFit_gamma)-median(outVec)))

where outVec is a vector of the outcome variables and modFit_gamma is the model.
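For reference, the quantity plotted by this call can be sketched in base R. This is a minimal sketch on simulated data, assuming the PIT value for each observation is the share of replicate draws at or below the observed value; `y` and `yrep` are stand-ins for `outVec` and `posterior_predict(modFit_gamma)`, not the real objects. Note that it uses plain posterior predictive draws, so it is not a leave-one-out PIT.

```r
set.seed(1)

# Simulated stand-ins for the objects in the post (assumptions, not the real data):
# y plays the role of outVec, yrep the role of posterior_predict(modFit_gamma),
# stored as a draws-by-observations matrix.
n_obs <- 200
n_draws <- 1000
y <- rgamma(n_obs, shape = 2, rate = 1)
yrep <- matrix(rgamma(n_draws * n_obs, shape = 2, rate = 1),
               nrow = n_draws, ncol = n_obs)

# Fold the data and the replicates around the same centre.
centre <- median(y)
y_fold <- abs(y - centre)
yrep_fold <- abs(yrep - centre)

# Empirical PIT: for each observation (column), the share of replicate
# draws that do not exceed the observed folded value.
pit <- colMeans(sweep(yrep_fold, 2, y_fold, FUN = "<="))

# With a well-calibrated predictive distribution, pit is roughly uniform on [0, 1].
summary(pit)
```

Because the simulated replicates come from the same distribution as the data, the values here should look close to uniform; a strong departure on real data points to miscalibration in the folded direction.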

In the workbook, which mainly concerns a beta-binomial regression, a helper function is needed to get ppc_pit_ecdf() to work properly (without it the line looks crazy). Is something like this necessary for the gamma regression, or is it just that the gamma is not as good as the Gaussian?

I should note that the posterior predictive replicates for the gamma look pretty good.

compared to the gaussian

And neither is notably better than the other

                elpd_diff se_diff
fit_cpqAv_gauss   0.0       0.0  
fit_cpqAv_gamma -59.8      49.0

I should also note that I added 0.000001 to any 0 scores on the outcome so that I could run the gamma model and compare it to the Gaussian. I don’t know if this is kosher, but it seems to work.

Some guidance would be much appreciated.

Note: it seems like the workbook has changed a little over the last few years, and the ppc_pit_ecdf() call has been replaced by a new type = "loo_pit_ecdf" argument within the pp_check() function.

It was also changed this spring to use the more appropriate method from the paper LOO-PIT predictive model checking. I recommend switching to the new plots. As the new approach is correctly calibrated, I also dropped the folded variant, since the basic variant already spotted the discrepancy.

The results can be sensitive to the value added, and thus this is not generally recommended. A better approach would be to use a zero-inflation model, and building one as a hurdle-type model is easy (one Bernoulli model for zero vs non-zero and one gamma model for the non-zeros).
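The two-part structure can be sketched with base R `glm()`. This is only an illustration on simulated data with a hypothetical predictor `x`, and it uses maximum likelihood rather than a Bayesian fit; in brms, the `hurdle_gamma()` family would be the more natural way to fit both parts jointly.

```r
set.seed(2)

# Simulated outcome with exact zeros (hypothetical data, not the poster's).
n <- 500
x <- rnorm(n)
is_zero <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * x))
mu_pos <- exp(0.5 + 0.3 * x)
shape_true <- 2
y <- ifelse(is_zero == 1, 0,
            rgamma(n, shape = shape_true, rate = shape_true / mu_pos))
dat <- data.frame(y = y, x = x, zero = as.integer(y == 0))

# Part 1: Bernoulli model for zero vs non-zero.
fit_zero <- glm(zero ~ x, family = binomial(), data = dat)

# Part 2: gamma model fitted to the strictly positive values only,
# so no constant has to be added to the zeros.
fit_pos <- glm(y ~ x, family = Gamma(link = "log"), data = subset(dat, y > 0))

# Expected outcome combines both parts: P(y > 0) * E[y | y > 0].
p_zero <- predict(fit_zero, newdata = dat, type = "response")
mean_pos <- predict(fit_pos, newdata = dat, type = "response")
expected_y <- (1 - p_zero) * mean_pos
```

The zeros never enter the gamma likelihood, so the fit no longer depends on an arbitrary small constant.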

I disagree as the replicates have more mass near 0, which is clearly visible also in the LOO-PIT-ECDF plot.

Instead of a Gaussian, you could try a truncated Gaussian, which is also possible with brms.
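To make the idea concrete, here is a minimal base R sketch of a Gaussian density truncated below at 0. The helper `dtnorm_lower()` is hypothetical (it is not a base R or brms function), and the parameter values are arbitrary. In brms, the same kind of truncation is written in the formula as `y | trunc(lb = 0) ~ ...`.

```r
# Density of a normal distribution truncated below at `lb`
# (dtnorm_lower is a hypothetical helper written for this illustration).
dtnorm_lower <- function(y, mu, sigma, lb = 0, log = FALSE) {
  # Log of the probability mass above the truncation point,
  # used to renormalise the untruncated density.
  log_norm <- pnorm(lb, mu, sigma, lower.tail = FALSE, log.p = TRUE)
  log_dens <- ifelse(y >= lb,
                     dnorm(y, mu, sigma, log = TRUE) - log_norm,
                     -Inf)
  if (log) log_dens else exp(log_dens)
}

# Sanity check: the truncated density integrates to 1 over [0, Inf).
total_mass <- integrate(dtnorm_lower, lower = 0, upper = Inf,
                        mu = 1, sigma = 2)$value

# Values below the bound have zero density, so the model cannot
# predict negative outcomes.
below_bound <- dtnorm_lower(-0.5, mu = 1, sigma = 2)
```

Compared with an untruncated Gaussian, all predictive mass stays on the non-negative side, which matters for an outcome that cannot go below 0.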

Using the new dependency-aware LOO-PIT-ECDF plots and tests would give you a better assessment of the discrepancy.