Conditional truncation including -Inf

• Operating System: MacOS
• brms Version: 2.18.0

I have a problem where I’ve collected data at 2 conditions. For one condition, the data are truncated. They’re not truncated for the second condition. Ignoring truncation for a moment, my substantive model is:

\begin{align*} y_i & \sim \mathcal N(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 x_i, \end{align*}

where x_i is an 0/1 dummy variable for condition. To handle the truncation, I’d like to a fit a truncated normal version of the model where there is truncation for when x = 0, but no truncation otherwise. Here are some simulated data of that form, and my attempt to fit the model with brm().

# load
library(tidyverse)
library(brms)

# simulate
set.seed(1)
dat <- tibble(y = rnorm(n = 1000)) %>%
# truncate
filter(y > -2) %>%
# subset to just 100 cases from the truncated distribution
slice_sample(n = 100) %>%
bind_rows(tibble(y = rnorm(n = 100))) %>%
mutate(lb = rep(c(-2, -Inf), each = 100),
# add a predictor variable, which is confounded with the truncation periods
x = rep(0:1, each = 100))

# fit the model with weakly regularizing priors
fit <- brm(
data = dat,
family = gaussian,
y | trunc(lb = lb) ~ 0 + Intercept + x,
prior = prior(normal(0, 1), class = b, coef = Intercept) +
prior(normal(0, 1), class = b, coef = x) +
prior(exponential(1), class = sigma),
seed = 1, chains = 1,
# intentionally set small for debugging
iter = 10
)


I’ve tried tricks lke setting init_r = 0.2 or init = 0. I’ve even manually set my init values, like init = list(list(b = c(0, 0), sigma = 1)). No matter what, I always end up with warnings like this:

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.


However, if I give up on the conditional truncation idea and manually set something like

 y | trunc(lb = -Inf) ~ 0 + Intercept + x


or

 y | trunc(lb = -5) ~ 0 + Intercept + x


the model fits just fine. What am I missing?

1 Like

There’s something odd about how truncation, Stan’s normal_lccdf(), and -Inf are interacting. You can make your model work by replacing -Inf by a large negative number.

I extracted the Stan code from your brms model via fit\$model. It looks like this:

// generated with brms 2.18.0
functions {
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
real lb[N];  // lower truncation bounds;
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
vector[K] b;  // population-level effects
real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
real lprior = 0;  // prior contributions to the log posterior
lprior += normal_lpdf(b[1] | 0, 1);
lprior += normal_lpdf(b[2] | 0, 1);
lprior += exponential_lpdf(sigma | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += X * b;
for (n in 1:N) {
target += normal_lpdf(Y[n] | mu[n], sigma) -
normal_lccdf(lb[n] | mu[n], sigma);
}
}
// priors including constants
target += lprior;
}
generated quantities {
}


Here, normal_lpdf(Y[n] | mu[n], sigma) - normal_lccdf(lb[n] | mu[n], sigma); means that the probability density is divided by the valid probability mass (i.e. the total mass that lies above the truncation threshold). This makes sense but I suspect that part of HMC for this model is to calculate gradients of normal_lccdf() w.r.t. mu and sigma, and evaluate them at the current values of mu, sigma and lb. Where lb is -Inf, the gradient is then not finite.

Your model should theoretically work but this problem arises due to how brms translates the model and how Stan interprets it, so this may be considered a bug.

2 Likes

I can confirm that the following works:

fit <- brm(
# recode the lb vector to swap -Inf for -1e5
data = dat %>% mutate(lb = ifelse(lb == -Inf, -1e5, lb)),
family = gaussian,
y | trunc(lb = lb) ~ 0 + Intercept + x,
prior = prior(normal(0, 1), class = b, coef = Intercept) +
prior(normal(0, 1), class = b, coef = x) +
prior(exponential(1), class = sigma),
seed = 1, chains = 1,
iter = 10
)


I agree with @wesley; this seems like a bug. @paul.buerkner, would you agree?

1 Like

Interestingly, when you do y | trunc(lb = -Inf) ~ 0 + Intercept + x, brms is smart enough to skip the normal_lccdf line, which strongly suggests that autodiffing normal_lccdf with -Inf as the input is the problem.

If that’s true, then the bug is at the Stan level, not the brms level, because the gradient of the normal lccdf with respect to either mu or sigma should clearly be zero when the lower bound is negative infinity. My c++ isn’t really up to snuff, but I didn’t see anything obviously untoward in math/normal_lccdf.hpp at 92075708b1d1796eb82e3b284cd11e544433518e · stan-dev/math · GitHub

The only potentially suspect thing that I noticed is at line 81, where Stan math has to be smart enough to understand that exp(-Inf) is zero. Is that not the case? Maybe this is something that needs to be special-cased? @rok_cesnovar @stevebronder

Edit to add: doesn’t look like exp(-Inf) should be a problem… 'scuse me while I learn the barest basics of c++ Compiler Explorer

1 Like

I think that something beyond just the normal_lccdf issues reported previously is at work here (maybe?). If it were just the lccdf stuff, then it should work to negate the response and the bounds and fit a model with upper-bound truncation using normal_lcdf. But that also fails to initialize on my system. Here’s the reprex. I use cmdstanr on the backend to make sure I have the latest version of normal_lcdf.

Generated Stan code follows the reprex. What do you think @spinkney?

# load
library(tidyverse)
library(brms)

# simulate
set.seed(1)
dat <- tibble(y = rnorm(n = 1000)) %>%
# truncate
filter(y > -2) %>%
# subset to just 100 cases from the truncated distribution
slice_sample(n = 100) %>%
bind_rows(tibble(y = rnorm(n = 100))) %>%
mutate(lb = rep(c(-2, -Inf), each = 100),
# add a predictor variable, which is confounded with the truncation periods
x = rep(0:1, each = 100),

y2 = -y,
ub = -lb)

# fit the model with weakly regularizing priors
fit <- brm(
data = dat,
family = gaussian,
y2 | trunc(ub = ub) ~ 0 + Intercept + x,
prior = prior(normal(0, 1), class = b, coef = Intercept) +
prior(normal(0, 1), class = b, coef = x) +
prior(exponential(1), class = sigma),
seed = 1, chains = 1,
# intentionally set small for debugging
iter = 10,
backend = "cmdstanr"
)

// generated with brms 2.18.0
functions {

}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
array[N] real ub; // upper truncation bounds
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int prior_only; // should the likelihood be ignored?
}
transformed data {

}
parameters {
vector[K] b; // population-level effects
real<lower=0> sigma; // dispersion parameter
}
transformed parameters {
real lprior = 0; // prior contributions to the log posterior
lprior += normal_lpdf(b[1] | 0, 1);
lprior += normal_lpdf(b[2] | 0, 1);
lprior += exponential_lpdf(sigma | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += X * b;
for (n in 1 : N) {
target += normal_lpdf(Y[n] | mu[n], sigma)
- normal_lcdf(ub[n] | mu[n], sigma);
}
}
// priors including constants
target += lprior;
}
generated quantities {

}

1 Like

Note that a fix in brms might be possible by special-casing infinite inputs to the truncation bounds and dropping the corresponding cdf terms.

Yeah when you supply a literal -Inf as the truncation value, brms is smart enough to ignore it but when truncation is a variable that doesn’t happen.

Checking the data for -Inf when the Stan code is generated would fix this issue but would still have problems if the truncation was a parameter rather than data (not sure how you’d write that in brms but assuming it could be done).

A more general fix would be to generate an if statement in the Stan code that handles the truncation appropriately based on the value of the truncation variable at the time the gradients are evaluated. Dunno if this would greatly harm Stan’s ability to optimize the model code at compile time.

If truncation is a parameter then it will never be infinite.

Yes. What is needed on the Stan side is implementations of lccdf and lcdf that special-case infinite inputs.

The if statement thing is do-able but I don’t know how get this to work with opecncl because I don’t know opencl and everything is vectorized. Agh. I have an open PR for a similar fix in the weibul_lcdf.

Could you open an issue for this in stan-math?

Also tagging @rok_cesnovar for the opencl piece.

3 Likes

Thank you for the thoughts, @spinkney. Someone else will have to open this issue in stan-math. I don’t have the technical language to express this properly to others.

2 Likes