Summarizing skewed, unimodal PPD

I believe I have worked out the right approach to this, but would appreciate the insight of others.

To summarize, I am working with the skew normal distribution, which is both a good choice for my subject matter and reasonably easy to fit. The model fits well with good mixing, and the posterior-predictive checks look good. But my usual approach to summarizing predictions, taking the mean or median across all draws of the PPD, is producing “wrong” or at least highly misleading values. I suspect the answer is that I need to calculate the mode of the density of the PPD draws rather than the more traditional (and much easier) mean or median.

Consider a simple model with no covariates:



library(brms) # rskew_normal() and brm() are from brms

### generate fake data ###

skew_df <- data.frame(y = rskew_normal(1e4, 89, 14, -8)) # mean, sd, shape

### fit simple y-intercept model and call results ###

skew_mod <- brm(y ~ 1, 
                family = skew_normal(), 
                cores = 4, chains = 4, 
                seed = 13579,
                iter = 3000,
                warmup = 1000,
                data = skew_df)


If one cares to look at the model call, the parameters are substantially recovered, which is good. So now do a posterior predictive check:

pp_check(skew_mod, ndraws = 100) + ggtitle("PP Check, 100 draws")

Again, it looks good.

But, now compare the densities of (1) the original y variable, (2) a single random draw from each row of the PPD, and (3) the density of colMeans(posterior_predict(skew_mod)):
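For reference, that comparison can be reproduced along these lines (a sketch assuming the fitted `skew_mod` from above; variable names are mine):

```r
pp <- posterior_predict(skew_mod)  # draws x observations matrix

# one random value from each row (posterior draw) of the PPD
one_per_row <- apply(pp, 1, function(r) sample(r, 1))

plot(density(skew_df$y), lwd = 2, main = "Original y vs. PPD summaries")
lines(density(one_per_row), col = "red")    # single random draw per row
lines(density(colMeans(pp)), col = "blue")  # per-observation means
legend("topleft", legend = c("y", "one draw per row", "colMeans"),
       col = c("black", "red", "blue"), lwd = 2)
```

The per-observation means collapse toward the center of each observation's PPD, so their density is far narrower than the density of y itself.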

Obviously, taking the mean of the PPD draws is not going to work.

Reasoning this out, I posit the following:

  1. The goal in summarizing at least a unimodal distribution is to find the most probable value for each y.
  2. With unimodal distributions, this typically is the mean of the draws, or the median, which gives a similar answer to the mean while accommodating a bit of messiness in the distribution.
  3. But when the distribution is unimodal and skewed, by definition the mean is no longer a sufficient descriptor of location, because if it were, you would just be using the normal distribution in the first place.
  4. The “most likely” location in a unimodal density is the mode, and we use the mean and median only as convenient substitutes for the vast majority of situations because they are usually very similar to the mode in value and far easier to calculate.
  5. But when a unimodal distribution is skewed, the mean and median are no longer sufficient to describe the most likely value, and we revert to the mode.

Am I correct? Incorrect? Sort of correct?

That leaves the question of how to efficiently calculate the mode, particularly across many thousands of rows. Perhaps this is a good start, but if there is an existing, fast function someone else has already written in R, I’d appreciate a reference.
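For what it's worth, a minimal KDE-based mode estimator needs only base R (the function name is mine, not from any package):

```r
# Estimate the mode of a sample as the peak of a kernel density estimate;
# density() is in base R's stats package.
density_mode <- function(x, ...) {
  d <- density(x, ...)
  d$x[which.max(d$y)]
}

# Applied column-wise to the draws x observations matrix from posterior_predict():
# modes <- apply(posterior_predict(skew_mod), 2, density_mode)
```

Extra arguments (e.g. `bw` or `n`) pass through to `density()`, so the bandwidth can be tuned if the default looks too rough or too smooth.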

Thanks much.


This is one possible goal among many, and it's not a common goal in Bayesian statistics.

As you note below, this isn't actually true; it additionally depends on the unimodal distribution being symmetric.

Sufficient is not defined here, but I think your thoughts are along the right lines.

No! We rarely care much about the mode, except when, as a matter of happenstance, it turns out to be a good summary of where the important probability mass is.

Again no. In general we come up with a good way to describe the actual shape of the distribution, rather than summarizing it down to a point.

Thanks for these insights.

But if we put nomenclature to one side, a model that can’t predict y isn’t very useful. Taking the means or medians of draws from this model does not predict y at all, despite the model fitting the data well. So, how would you be inclined to extract predicted values of y in this situation with reasonable accuracy, matching the approximate distribution of y, the way a single random draw from each row of the PPD is able to do?

I recognize that any value of y has uncertainty and I am used to capturing that also. But here, we aren’t even able to do that. How do we accurately predict y from this model across multiple draws? Having multiple draws should be making the prediction for each value of y more accurate, not less. Which means there must be a better way to summarize those rows.

I’m not sure what you mean by “predict” here. To predict y from this model, we would take row-wise draws from the posterior predictive distribution. The rest of your question looks to me like it stems from a misplaced belief that there ought to be a good general purpose formula for summarizing arbitrary univariate probability distributions using a single scalar.

OK. And it sounds like, in your view, there is no measure of central or predominant tendency that would most likely recover the original y, with some level of uncertainty around that number, from those row-wise draws of a skewed normal distribution.

I remain puzzled that there is no meaningful alternative for this distribution to colMeans(posterior_predict) as we would use for your typical exponential distribution, which also seems to be the foundation of the predict function included with some of the Stan front ends. But that disconnect seems to be on my end. Again, I appreciate the responses.

If the goal is to choose a number that is within \epsilon of the true y with maximal probability, for some fixed \epsilon, then this is a well-defined problem that has a clear solution, though the solution will not in general correspond to the mean, median, or mode, and will depend on the details of the distribution and the choice of \epsilon.
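That objective can be approximated directly from the draws for a single observation. A sketch (the function name is illustrative): an optimal interval of width 2\epsilon can always be taken to start at one of the draws, so only those candidates need scanning.

```r
# Find the point c maximizing the number of draws within [c - eps, c + eps].
epsilon_ball_center <- function(draws, eps) {
  s <- sort(draws)
  # for each left edge s[i], count sorted draws falling in [s[i], s[i] + 2 * eps]
  counts <- findInterval(s + 2 * eps, s) - seq_along(s) + 1
  i <- which.max(counts)
  s[i] + eps  # center of the best interval
}
```

As \epsilon shrinks this tends toward a mode-like summary, and as it grows it behaves more like a median-like one, which is one way to see why no single fixed summary answers the question in general.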
