Conditional truncation including -Inf

Solomon · March 5, 2023, 9:10pm

Operating System: MacOS
brms Version: 2.18.0

I have a problem where I’ve collected data at 2 conditions. For one condition, the data are truncated. They’re not truncated for the second condition. Ignoring truncation for a moment, my substantive model is:

\begin{align*} y_i & \sim \mathcal N(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 x_i, \end{align*}

where x_i is an 0/1 dummy variable for condition. To handle the truncation, I’d like to a fit a truncated normal version of the model where there is truncation for when x = 0, but no truncation otherwise. Here are some simulated data of that form, and my attempt to fit the model with brm().

# load
library(tidyverse)
library(brms)

# simulate
set.seed(1)
dat <- tibble(y = rnorm(n = 1000)) %>% 
  # truncate
  filter(y > -2) %>% 
  # subset to just 100 cases from the truncated distribution
  slice_sample(n = 100) %>% 
  # add in non-truncated cases
  bind_rows(tibble(y = rnorm(n = 100))) %>% 
  # add a truncation index
  mutate(lb = rep(c(-2, -Inf), each = 100),
         # add a predictor variable, which is confounded with the truncation periods
         x = rep(0:1, each = 100))

# fit the model with weakly regularizing priors
fit <- brm(
  data = dat,
  family = gaussian,
  y | trunc(lb = lb) ~ 0 + Intercept + x,
  prior = prior(normal(0, 1), class = b, coef = Intercept) +
    prior(normal(0, 1), class = b, coef = x) +
    prior(exponential(1), class = sigma),
  seed = 1, chains = 1,
  # intentionally set small for debugging
  iter = 10
)

I’ve tried tricks lke setting init_r = 0.2 or init = 0. I’ve even manually set my init values, like init = list(list(b = c(0, 0), sigma = 1)). No matter what, I always end up with warnings like this:

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.

However, if I give up on the conditional truncation idea and manually set something like

 y | trunc(lb = -Inf) ~ 0 + Intercept + x

or

 y | trunc(lb = -5) ~ 0 + Intercept + x

the model fits just fine. What am I missing?

wesley · March 5, 2023, 10:41pm

There’s something odd about how truncation, Stan’s normal_lccdf(), and -Inf are interacting. You can make your model work by replacing -Inf by a large negative number.

I extracted the Stan code from your brms model via fit$model. It looks like this:

// generated with brms 2.18.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  real lb[N];  // lower truncation bounds;
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K] b;  // population-level effects
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b[1] | 0, 1);
  lprior += normal_lpdf(b[2] | 0, 1);
  lprior += exponential_lpdf(sigma | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += X * b;
    for (n in 1:N) {
      target += normal_lpdf(Y[n] | mu[n], sigma) -
        normal_lccdf(lb[n] | mu[n], sigma);
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
}

Here, normal_lpdf(Y[n] | mu[n], sigma) - normal_lccdf(lb[n] | mu[n], sigma); means that the probability density is divided by the valid probability mass (i.e. the total mass that lies above the truncation threshold). This makes sense but I suspect that part of HMC for this model is to calculate gradients of normal_lccdf() w.r.t. mu and sigma, and evaluate them at the current values of mu, sigma and lb. Where lb is -Inf, the gradient is then not finite.

Your model should theoretically work but this problem arises due to how brms translates the model and how Stan interprets it, so this may be considered a bug.

Solomon · March 5, 2023, 11:06pm

I can confirm that the following works:

fit <- brm(
  # recode the lb vector to swap -Inf for -1e5
  data = dat %>% mutate(lb = ifelse(lb == -Inf, -1e5, lb)),
  family = gaussian,
  y | trunc(lb = lb) ~ 0 + Intercept + x,
  prior = prior(normal(0, 1), class = b, coef = Intercept) +
    prior(normal(0, 1), class = b, coef = x) +
    prior(exponential(1), class = sigma),
  seed = 1, chains = 1,
  iter = 10
)

I agree with @wesley; this seems like a bug. @paul.buerkner, would you agree?

jsocolar · March 6, 2023, 2:45am

Interestingly, when you do y | trunc(lb = -Inf) ~ 0 + Intercept + x, brms is smart enough to skip the normal_lccdf line, which strongly suggests that autodiffing normal_lccdf with -Inf as the input is the problem.

If that’s true, then the bug is at the Stan level, not the brms level, because the gradient of the normal lccdf with respect to either mu or sigma should clearly be zero when the lower bound is negative infinity. My c++ isn’t really up to snuff, but I didn’t see anything obviously untoward in math/normal_lccdf.hpp at 92075708b1d1796eb82e3b284cd11e544433518e · stan-dev/math · GitHub

The only potentially suspect thing that I noticed is at line 81, where Stan math has to be smart enough to understand that exp(-Inf) is zero. Is that not the case? Maybe this is something that needs to be special-cased? @rok_cesnovar @stevebronder

Edit to add: doesn’t look like exp(-Inf) should be a problem… 'scuse me while I learn the barest basics of c++ Compiler Explorer

spinkney · March 6, 2023, 2:47am

I believe this is the same issue as Accuracy of normal_lccdf much worse than normal_lcdf · Issue #1985 · stan-dev/math · GitHub and Errors when right-censoring: bug in normal_lccdf? · Issue #2847 · stan-dev/math · GitHub

jsocolar · March 6, 2023, 3:03am

I think that something beyond just the normal_lccdf issues reported previously is at work here (maybe?). If it were just the lccdf stuff, then it should work to negate the response and the bounds and fit a model with upper-bound truncation using normal_lcdf. But that also fails to initialize on my system. Here’s the reprex. I use cmdstanr on the backend to make sure I have the latest version of normal_lcdf.

Generated Stan code follows the reprex. What do you think @spinkney?

# load
library(tidyverse)
library(brms)

# simulate
set.seed(1)
dat <- tibble(y = rnorm(n = 1000)) %>% 
  # truncate
  filter(y > -2) %>% 
  # subset to just 100 cases from the truncated distribution
  slice_sample(n = 100) %>% 
  # add in non-truncated cases
  bind_rows(tibble(y = rnorm(n = 100))) %>% 
  # add a truncation index
  mutate(lb = rep(c(-2, -Inf), each = 100),
         # add a predictor variable, which is confounded with the truncation periods
         x = rep(0:1, each = 100),
         
         y2 = -y,
         ub = -lb)

# fit the model with weakly regularizing priors
fit <- brm(
  data = dat,
  family = gaussian,
  y2 | trunc(ub = ub) ~ 0 + Intercept + x,
  prior = prior(normal(0, 1), class = b, coef = Intercept) +
    prior(normal(0, 1), class = b, coef = x) +
    prior(exponential(1), class = sigma),
  seed = 1, chains = 1,
  # intentionally set small for debugging
  iter = 10,
  backend = "cmdstanr"
)

// generated with brms 2.18.0
functions {
  
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  array[N] real ub; // upper truncation bounds
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  
}
parameters {
  vector[K] b; // population-level effects
  real<lower=0> sigma; // dispersion parameter
}
transformed parameters {
  real lprior = 0; // prior contributions to the log posterior
  lprior += normal_lpdf(b[1] | 0, 1);
  lprior += normal_lpdf(b[2] | 0, 1);
  lprior += exponential_lpdf(sigma | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += X * b;
    for (n in 1 : N) {
      target += normal_lpdf(Y[n] | mu[n], sigma)
                - normal_lcdf(ub[n] | mu[n], sigma);
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  
}

jsocolar · March 6, 2023, 3:06am

Note that a fix in brms might be possible by special-casing infinite inputs to the truncation bounds and dropping the corresponding cdf terms.

wesley · March 6, 2023, 6:11am

Yeah when you supply a literal -Inf as the truncation value, brms is smart enough to ignore it but when truncation is a variable that doesn’t happen.

Checking the data for -Inf when the Stan code is generated would fix this issue but would still have problems if the truncation was a parameter rather than data (not sure how you’d write that in brms but assuming it could be done).

A more general fix would be to generate an if statement in the Stan code that handles the truncation appropriately based on the value of the truncation variable at the time the gradients are evaluated. Dunno if this would greatly harm Stan’s ability to optimize the model code at compile time.

jsocolar · March 6, 2023, 2:53pm

If truncation is a parameter then it will never be infinite.

Yes. What is needed on the Stan side is implementations of lccdf and lcdf that special-case infinite inputs.

spinkney · March 6, 2023, 4:02pm

The if statement thing is do-able but I don’t know how get this to work with opecncl because I don’t know opencl and everything is vectorized. Agh. I have an open PR for a similar fix in the weibul_lcdf.

Could you open an issue for this in stan-math?

Also tagging @rok_cesnovar for the opencl piece.

Solomon · March 6, 2023, 4:23pm

Thank you for the thoughts, @spinkney. Someone else will have to open this issue in stan-math. I don’t have the technical language to express this properly to others.

jsocolar · March 6, 2023, 5:59pm

github.com/stan-dev/math

normal_lcdf and normal_lccdf give infinite gradients for infinite inputs.

opened 05:58PM - 06 Mar 23 UTC

jsocolar

## Description Consider a program: ``` data{ real y; } parameters{ …real mu; real<lower = 0> sigma; } model{ target += normal_lcdf(y | mu, sigma); } ``` When `y` is infinite (and given that `mu` and `sigma` are finite, here ensured by declaring them as parameters), then the gradient is zero. However, it seems that the Stan yields infinite gradients here. This came up "in the wild" here: https://discourse.mc-stan.org/t/conditional-truncation-including-inf/30627 If possible, it would be nice to special-case infinite inputs and return the correct gradients. However, there's some question beyond my expertise of whether this special-casing would play nicely with vectorization in opencl.

Topic		Replies	Views
Trunc() and conditional_effects() brms	4	1980	February 24, 2021
Getting around truncation limitation for mixture models in brms brms	5	2445	June 19, 2018
Understanding truncated lognormal in brms brms	4	1611	November 19, 2020
Prior Truncation in brms [lb,ub] brms	6	3260	March 23, 2019
Zero-truncated count data brms fitting-issues	7	2555	March 7, 2022

Conditional truncation including -Inf

Related topics