Drop draws below a threshold rather than setting a lower bound on a prior for nu

Say I have some fat-tailed Student-t data, I fit a model using the Student-t likelihood, and the bulk of the \nu posterior sits well within the low end of its range. Further say I want to use the posterior draws of \nu and \sigma to compute a posterior for the standard deviation using the formula \sigma \sqrt{\nu / (\nu - 2)}, which is defined for \nu > 2 (see Student's t-distribution - Wikipedia). For the prior on \nu, I use the brms default \operatorname{gamma}(2, 0.1), but in two different ways:

  • set the lower bound at 2 (lb = 2), or
  • set the lower bound at 0, but just drop all posterior draws for which \nu \le 2.
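
For concreteness, the downstream computation I have in mind is something like this (just a sketch; `draws` stands in for a data frame of posterior draws with `nu` and `sigma` columns, e.g. from as_draws_df(), and the tidyverse is assumed to be loaded):

# Sketch: posterior for the standard deviation via sigma * sqrt(nu / (nu - 2)),
# keeping only draws with nu > 2 so the formula is defined
draws |> 
  filter(nu > 2) |> 
  mutate(sd_y = sigma * sqrt(nu / (nu - 2))) |> 
  summarize(sd_mean  = mean(sd_y),
            sd_lower = quantile(sd_y, 0.025),
            sd_upper = quantile(sd_y, 0.975))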

Are both methods equally valid?

Here’s a quick example. I’m using brms here, but this is a more general Stan question.

# Load packages
library(tidyverse)
library(brms)
library(ggdist)

# Simulate fat-tail t data
set.seed(1)

student_data <- data.frame(y = rstudent_t(n = 100, df = 3))

# Fit the two versions of the model
fit_nu_lb_0 <- brm(
  data = student_data,
  family = student,
  y ~ 1,
  prior = prior(gamma(2, 0.1), class = nu, lb = 0),
  cores = 1, seed = 1, warmup = 500, iter = 5500)

fit_nu_lb_2 <- brm(
  data = student_data,
  family = student,
  y ~ 1,
  # Notice the different `lb` setting
  prior = prior(gamma(2, 0.1), class = nu, lb = 2),
  cores = 1, seed = 1, warmup = 500, iter = 5500)

# Extract and save the posterior draws
draws_lb_0 <- as_draws_df(fit_nu_lb_0)
draws_lb_2 <- as_draws_df(fit_nu_lb_2)

For the model that used lb = 0, there were indeed several draws for \nu below the 2 threshold.

draws_lb_0 |> 
  summarize(n_below_2_lb = sum(nu <= 2),
            percent_below_2_lb = 100 * mean(nu <= 2))
# A tibble: 1 × 2
  n_below_2_lb percent_below_2_lb
         <int>              <dbl>
1          617               3.08

draws_lb_0 |> 
  ggplot(aes(x = nu, fill = nu > 2, group = nu > 2)) +
  geom_histogram(breaks = seq(from = 0, to = 60, by = 0.1)) +
  scale_x_continuous(expression(nu), breaks = 0:5 * 2) +
  scale_y_continuous(NULL, breaks = NULL) +
  coord_cartesian(xlim = c(0, 10))

However, if one drops all draws for which \nu \le 2 from the fit_nu_lb_0 version of the model, the posteriors for \nu look very similar across the two versions of the model.

bind_rows(draws_lb_0, draws_lb_2) |> 
  mutate(lb = rep(c("lb = 0", "lb = 2"), each = n() / 2)) |> 
  # Drop those unwanted draws
  filter(nu > 2) |> 
  
  ggplot(aes(x = nu, y = lb)) +
  stat_histinterval(breaks = seq(from = 0, to = 60, by = 0.1)) +
  scale_x_continuous(expression(nu), breaks = 0:5 * 2) +
  scale_y_discrete(NULL, expand = expansion(mult = 0.1)) +
  coord_cartesian(xlim = c(1, 10))

Is this valid?

Note. I get that setting lb = 0 and then just dropping the unwanted draws is wasteful. That's not my concern here; I'm interested in whether it's valid.

Yes, this is valid. Setting lb = 2 is precisely the same as truncating the posterior at 2, and that truncated posterior is also the distribution you are approximating when you sample by MCMC without the bound and reject values below 2. The latter technique is a form of rejection sampling.

The key intuition here is that samples from a truncated distribution look exactly like samples from the untruncated distribution with all of the samples violating the truncation constraint rejected.
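
Here's a quick way to see that equivalence with the prior alone (a sketch in base R, independent of brms; the two quantile calls should agree up to Monte Carlo error):

# Rejection: sample the untruncated gamma(2, 0.1) prior and drop values <= 2
set.seed(1)
rejected <- rgamma(1e5, shape = 2, rate = 0.1)
rejected <- rejected[rejected > 2]

# Truncation: inverse-CDF sampling restricted to the region above 2
p_lower   <- pgamma(2, shape = 2, rate = 0.1)
truncated <- qgamma(runif(1e5, min = p_lower, max = 1), shape = 2, rate = 0.1)

# Both sets of draws target the same truncated distribution
quantile(rejected,  probs = c(0.05, 0.5, 0.95))
quantile(truncated, probs = c(0.05, 0.5, 0.95))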


Very helpful; thanks!

I’d say this is similar to sampling a parameter value that is statistically valid but doesn’t make sense for the model, e.g. estimating a biological rate that cannot be negative. Ideally the data themselves would preclude nonsensical values, but we know it’s perfectly possible for estimates to deviate from reality even in the best feasible experiments.

The point is what’s more sensible: constraining the parameter beforehand, or dealing with those values post hoc. Maybe both are alright if the sampler doesn’t get stuck at “weird” values, and the latter is helpful for seeing where inference tends to take the Markov chain, as a way of challenging your assumptions about the model.

You could also compute a posterior for the values of \nu that result in finite variance (a distribution that doesn’t integrate to one), complementing it with the posterior mass on values where the variance is infinite or undefined (or, if it’s useful, you could even add that to the plot or summary).
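
For example, something along these lines (a sketch, reusing the draws_lb_0 object from the original post):

# Posterior probability that the variance is infinite or undefined (nu <= 2),
# to report alongside the sd posterior computed from the draws with nu > 2
draws_lb_0 |> 
  summarize(p_var_not_finite = mean(nu <= 2),
            p_var_finite     = mean(nu > 2))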

