How can I find the posterior SD with a non-normal distribution?

I have a model with an outcome that is a truncated normal, and I estimate sigma for each level of a categorical variable. How do I compare the posterior SD for each group?

I cannot simply compare sigma, as the SD also depends on mu due to the truncation.

For comparing group means, posterior_epred() solves the problem that the mean of a truncated normal is not equal to the mu parameter. Is there anything similar for the SD?

Below is a simple example that should illustrate the problem.

library(brms)

a <- rnorm(1000, 1, 2)
b <- rnorm(1000, 6, 2)

df <- data.frame(gr = rep(c("a","b"), each = 1000), 
                 value = c(a,b))

# Remove values below 0
df <- df[df$value > 0,] 

sd(df$value[df$gr == "a"])
#> [1] 1.393734
sd(df$value[df$gr == "b"])
#> [1] 1.938442

m1 <- brm(bf(value | trunc(lb = 0) ~ 0+gr, 
             sigma ~ 0+gr),
          data = df, silent = TRUE)
#> Compiling Stan program...

plot(m1)

The sigma values above are on the log scale, i.e. the plot shows log(sigma), because brms applies a log link to sigma when it is predicted.
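
As a quick sketch, the sigma draws can be put back on the natural scale (the b_sigma_gra / b_sigma_grb names should follow from the 0 + gr coding above):

# Sigma draws per group, back-transformed from the log scale
draws <- as_draws_df(m1)
quantile(exp(draws$b_sigma_gra), c(0.025, 0.5, 0.975))
quantile(exp(draws$b_sigma_grb), c(0.025, 0.5, 0.975))

Both should sit near the underlying sigma = 2, not near the sample SDs above – which is exactly the problem.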

To compare means, I can simply use conditional_effects(), which uses posterior_epred() draws by default, so it gives the correct posterior mean for each group.

# Mean of posterior distributions
conditional_effects(m1)
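
Equivalently, the expected values can be pulled out directly with posterior_epred(); a minimal sketch:

# Truncation-corrected expected value per group
newdat <- data.frame(gr = c("a", "b"))
epred <- posterior_epred(m1, newdata = newdat)  # draws x 2 matrix
colMeans(epred)  # posterior means of E[value] for groups a and b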

To compare group SDs, I could simulate several posterior predictions for each draw and calculate the SD for each group within each draw, but that seems excessive. Also, I’m unsure how many simulations I should make for each draw to make the uncertainty from the simulations negligible.

library(ggplot2)
library(tidybayes)
library(dplyr)

predicted_draws(m1, newdata = data.frame(gr = rep(c("a", "b"), each = 100))) |> 
  group_by(.draw, gr) |> 
  summarise(sd = sd(.prediction)) |> 
  ggplot(aes(gr, sd)) +
  stat_eye()
#> `summarise()` has grouped output by '.draw'. You can override using the
#> `.groups` argument.

Created on 2022-04-28 by the reprex package (v2.0.0)

… or should I just stick to the Gaussian if I want to estimate standard deviations?

I am not very familiar with the brms syntax, but I am assuming that by truncated normal output you mean that the likelihood has that form. In any case, and regardless of the model and its likelihood function, the “uncertainty in the posterior” can be computed directly from it, whether it’s for a parameter in the model directly or a function of one or more parameters. Uncertainty can be a somewhat vague term, so it’s important to be precise about what you need to quantify – the main point here is that the uncertainty from the posterior pertains to the parameters, not the observations.

You are probably fitting the output of the model as the mean of a normal distribution; that is why you can easily compute the expected value there, while sigma is added on top of that to estimate how far the data in each group falls from the mean. You can definitely compare the sigmas (their expected values or their entire distributions), but it is true that, depending on how the distribution is truncated and on other details of the data, they may not accurately reflect the difference in variation between the groups – at least not with the meaning they carry for a Gaussian distribution.

What you are probably alluding to is the posterior predictive distribution. MCMC inference readily lets you quantify uncertainty in the parameters, but not in the data; for that you need something else, and you are right that it is done through simulation. What exactly you need depends on what you are trying to do. If you want to build a distribution that closely resembles the “true” distribution from which the data would be drawn, you probably need a lot of samples; that may be excessive, and I am not sure how often it is done, but computation-wise it is probably feasible and a lot less costly than the inference itself. Alternatively, you can simulate that distribution for the same number of data points you have in the data (or a few-fold larger sample); that way you can compare summaries of the data with the pseudodata and see whether they resemble each other (resemble here is subjective, but if you need to validate all comparisons objectively you may end up in an infinite recursion).
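
For instance, a sketch of that summary comparison using a grouped posterior predictive check (assuming the m1 model from the question and brms's bayesplot-style pp_check):

# Observed SD per group vs. SDs of simulated replicate datasets
pp_check(m1, type = "stat_grouped", stat = "sd", group = "gr")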

I hope this answers your question; let us know if you had something else in mind.


Thank you for the answer. As you may have guessed, I’m quite inexperienced with Bayesian models, and have difficulty expressing a question precisely.

What you are probably alluding to is the posterior predictive distribution.

Yes, but I think my problem is simpler than describing the shape of the entire posterior predictive distribution.
If my response distribution were normal, I would simply compare the posterior distributions of sigma for each group, but, as you seem to agree, the truncation makes this a bit more complicated.

Here are the posterior predictive checks for my actual model with a truncated vs. a non-truncated response distribution.

Truncated normal
[posterior predictive check plot]

Normal
[posterior predictive check plot]

Maybe the normal is “good enough” even though values cannot be negative?

I hope that didn’t come across as obnoxious; when I talk about precisely stating your goal, it’s because Bayesian inference is complex and sophisticated enough that at least a few concepts can get mixed up between contexts – uncertainty is certainly one of them, and also one of the most important concepts in Bayesian inference.

I am not sure if you are suggesting using a different model because it’s easier to interpret. Personally, I would not do that; I would choose the most appropriate model instead. I think you could still get a measure of variation through sigma, especially since for the Gaussian distribution the mean will tend to sit symmetrically within the data.

Something else you could do is transform the distribution, to a lognormal for instance; I don’t know enough about the model to say whether that is an option for you. Maybe there are other clever ideas that are not coming to mind, but the most straightforward, standard option is probably just to simulate the posterior predictive distribution – it shouldn’t be too intensive even for a lot of samples.
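
For concreteness, the lognormal variant would be just a family swap in the original model; a hypothetical, untested sketch:

# Lognormal has positive support, so no truncation term is needed
m2 <- brm(bf(value ~ 0 + gr, sigma ~ 0 + gr),
          family = lognormal(), data = df)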

Do I have it right that the question boils down to: “how can I compute the standard deviation of a truncated normal distribution given \mu, \sigma, L, and U, i.e. the mean and SD of the underlying normal and the lower and upper truncation bounds?”

You can find the necessary formulas for the variance here: Truncated normal distribution - Wikipedia
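
For reference, with \alpha = (L - \mu)/\sigma, \beta = (U - \mu)/\sigma, and Z = \Phi(\beta) - \Phi(\alpha), the variance given there is

\mathrm{Var} = \sigma^2 \left[ 1 + \frac{\alpha \varphi(\alpha) - \beta \varphi(\beta)}{Z} - \left( \frac{\varphi(\alpha) - \varphi(\beta)}{Z} \right)^2 \right]

where \varphi and \Phi are the standard normal density and CDF.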


Not at all :)

The lognormal did not fit. I may be overlooking something with the posterior simulation: I simulate n observations (e.g. n = 100, but it should probably be more) for each posterior draw (e.g. 2000) for each group, 200,000 simulated observations in total. Then I calculate the SD within each draw to get the posterior distribution of the SD.

Indeed. I hoped that a general solution existed, e.g. as with epred for the expected value of a distribution.

Also, my math knowledge is limited, and I could not find a formula that requires only mu, sigma and the bounds.

It seems like the Wikipedia page does provide enough information to piece a formula together, but I would not be comfortable doing that without understanding the individual parts.

I don’t think brms has a built-in for this, so here’s the Wikipedia formula wrapped as an R function. little_phi and big_phi are just there to show the connection to Wikipedia’s notation; they are thin wrappers around dnorm and pnorm respectively.

little_phi <- function(x){
  dnorm(x)  # standard normal density (Wikipedia's phi)
}

big_phi <- function(x){
  pnorm(x)  # standard normal CDF (Wikipedia's Phi)
}

trunc_sd <- function(mu = 0, sigma = 1, a = -Inf, b = Inf) {
  if(b <= a){stop("b must be strictly greater than a.")}
  alpha <- (a - mu)/sigma  # standardised lower bound
  beta <- (b - mu)/sigma   # standardised upper bound
  Z <- big_phi(beta) - big_phi(alpha)  # probability mass between the bounds
  term_2 <- (alpha * little_phi(alpha) - beta * little_phi(beta))/Z
  term_3 <- ((little_phi(alpha) - little_phi(beta))/Z)^2
  variance <- sigma^2 * (1 + term_2 - term_3)
  sqrt(variance)
}

Here’s a simulation function that you can use to verify the correctness of the above:

trunc_sd_sim <- function(mu = 0, sigma = 1, a = -Inf, b = Inf) {
  if(b <= a){stop("b must be strictly greater than a.")}
  x <- rnorm(1e7, mu, sigma)
  sd(x[x > a & x < b])
}
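
For example (hypothetical values), the two should agree to a couple of decimal places:

trunc_sd(mu = 1, sigma = 2, a = -1, b = 5)      # exact formula
trunc_sd_sim(mu = 1, sigma = 2, a = -1, b = 5)  # Monte Carlo check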

Wow. Thank you for this! I will try it out Monday and report back.

The function worked with both bounds set, but with e.g. b = Inf, the term beta * little_phi(beta) evaluates to Inf * 0 = NaN.
I only have a lower bound, so I simply rewrote it as:

trunc_lb_sd <- function(mu = 0, sigma = 1, a) {
  alpha <- (a - mu)/sigma
  Z <- 1 - big_phi(alpha)  # upper-tail mass; no beta terms, so no Inf * 0
  term_2 <- (alpha * little_phi(alpha))/Z
  term_3 <- ((little_phi(alpha))/Z)^2
  variance <- sigma^2 * (1 + term_2 - term_3)
  sqrt(variance)
}
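
As a sanity check against the sample SDs computed at the top of the thread (true parameters mu = 1 and 6, sigma = 2, lower bound 0):

trunc_lb_sd(mu = 1, sigma = 2, a = 0)  # ~1.39, matching the sample's 1.394
trunc_lb_sd(mu = 6, sigma = 2, a = 0)  # ~1.99 (the sample's 1.938 is within sampling noise)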

Also, I found that tidybayes::add_epred_draws can return the distribution parameters by setting dpar = c("mu", "sigma"), which is quite convenient for my use case.

groups |> 
  add_epred_draws(mod, re_formula = NA, incl_autocor = FALSE, value = ".value",
                  dpar = c("mu", "sigma")) |> 
  mutate(corrected_sd = trunc_lb_sd(mu, sigma, a = 0))

Nice! Sorry for not catching the 0 * Inf error! Your implementation is better, but both implementations become numerically unstable when the lower bound is large relative to the mean (so that truncation crops out all but the extreme upper tail). It looks like trunc_lb_sd returns NaN near 7.5 sigma and then incorrect results above that. If this is an issue for you, it should be possible to rewrite the function to be more numerically stable on the log scale, or to special-case these extreme tail truncations by sampling directly from an exponential distribution.
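
A hypothetical sketch of that log-scale rewrite (untested beyond a quick check): computing phi(alpha)/Z through logs avoids the catastrophic cancellation in 1 - big_phi(alpha).

trunc_lb_sd_stable <- function(mu = 0, sigma = 1, a) {
  alpha <- (a - mu)/sigma
  # inverse Mills ratio phi(alpha) / (1 - Phi(alpha)), computed on the log
  # scale so the upper-tail probability never underflows to 0
  lambda <- exp(dnorm(alpha, log = TRUE) -
                  pnorm(alpha, lower.tail = FALSE, log.p = TRUE))
  sqrt(sigma^2 * (1 + alpha * lambda - lambda^2))
}

For instance, trunc_lb_sd_stable(0, 1, 10) should return a small finite value where the direct version yields NaN.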


Thanks again. It all makes a lot more sense to me formulated as code and functions I’m familiar with, rather than the corresponding mathematical notation.

Thanks for the warning about numerical instability. I think I see how floating point precision becomes a problem when we have to represent values very close to 1 with high precision. Nevertheless, in my use case, mu is always higher than a.
