Getting around truncation limitation for mixture models in brms


#1

I am aware that truncated distributions are not yet supported in brms formulas for mixture models, but was wondering if there was a strategy for getting around this limitation.

mix <- mixture(gaussian, gaussian)
prior <- c(
  prior(normal(mup1, varp1), Intercept, dpar = mu1),
  prior(normal(mup2, varp2), Intercept, dpar = mu2)
)
fit1 <- brm(bf(response ~ 1, theta1 ~ s(days, k=15)), data = ts, family = mix,
            prior = prior, chains = 2, stanvars = stanvars)

However, the distribution of my response variable looks like this, because the instrument that measures these values has a saturation limit:

11

I suspect this can be done by making mix a mixture of custom distributions, but I’m not sure how to go about it.

  • Operating System: Debian 9
  • brms Version: 2.3.1

#2

Yeah would you could do is to define a truncated normal distribution as a custom family and then put that into your mixture model. You can see how a truncated normal looks like in stan code my running

make_stancode(y | trunc(lb = 0, ub = 5) ~ 1, data = data.frame(y = 1), family = gaussian())

#3

Thanks, @paul.buerkner. Following the fix on the GitHub version, I was able to implement this. Here’s a working example:

library(dplyr)
library(brms) #using GitHub brms, following fix of #453

#make some data
dat <- data_frame(response = c(rnorm(200, 3, 1), rnorm(200, 6, 1))) %>%
  filter(response < 8)

# Define a distribution of a normal with a truncated upper bound
normal_ub <- custom_family("normal_ub", dpars = c("mu", "sigma", "upper"),
                           links = c("identity","identity","identity"),
                           type = "real",
                           lb = c(NA, 0, NA), ub = c(NA, NA, NA))
stan_funs <- "
real normal_ub_lpdf(real y, real mu, real sigma, real upper) {
return normal_lpdf(y | mu, sigma) - normal_lcdf(upper | mu, sigma);
}
real normal_ub_rng(real mu, real sigma, real upper) {
real p = normal_cdf(upper, mu, sigma);  // cdf for bounds
real u = uniform_rng(0, p);
return inv_Phi(u)*sigma + mu;  // inverse cdf for value
}
"

mix <- mixture(gaussian, normal_ub, order = 'mu')
prior <- c(
  prior(normal(3, 1), class="Intercept", dpar = "mu1"),
  prior(normal(6, 1), class="Intercept", dpar = "mu2"),
  prior(normal(8.1, 0.0001), class="upper2") #use get_prior() to find the name
)

trunc_mod <- brm(bf(response ~ 1), data = dat, family = mix,
                 stan_funs = stan_funs, prior = prior,
                 iter = 1000, warmup = 500, chains = 1,
                 control = list(adapt_delta = 0.999, max_treedepth=12),
                 save_model = "test_truncated.stan",
                 save_dso = TRUE,
                 seed = 89)

#Get our distribution functions to use post-fitting
expose_functions(trunc_mod, vectorize = TRUE)

log_lik_normal_ub <- function(i, draws) {
  mu <- draws$dpars$mu[, i]
  sigma <- draws$dpars$sigma
  upper <- draws$dpars$upper
  y <- draws$data$Y[i]
  normal_ub_lpmf(y, mu, sigma, upper)
}

predict_normal_ub <- function(i, draws, ...) {
  mu <- draws$dpars$mu[, i]
  sigma <- draws$dpars$sigma
  upper <- draws$dpars$upper
  normal_ub_rng(mu, sigma, upper)
}

pp_check(trunc_mod)

#4

Continuing this odyssey! I realized that since the effect is saturating rather than cut-off, what I really need is a nonlinear-transformation of the gaussian mixture, which I’ve attempted to formulate like so:

library(dplyr)
library(ggplot2)
library(brms) #current github version

#make some data
gamma_r = 7
eta_r = 1
dat <- data_frame(days = rep(10*(0:10), each=50),
                 frac = 0.5 + 0.5*sin((days*2*pi/50))) %>%
  group_by(days) %>%
  mutate(seropos = runif(n()) > frac) %>%
  group_by() %>%
  mutate(titer = rnorm(n(), mean = if_else(seropos, 2, 6), sd = 1),
         response = gamma_r * exp(-titer/gamma_r)^eta_r)


ggplot(dat, aes(x=days, y = response, group=days)) +
  geom_violin() +
  geom_point()


gmix <- mixture(gaussian, gaussian, order = 'mu')
nl_mix <- bf(
  response ~ 1,
  mu1 ~ gamma * exp(-t1/gamma)^eta,
  mu2 ~ gamma * exp(-t2/gamma)^eta,
  t1 ~ 1,
  t2 ~ 1,
  eta ~ 1,
  gamma ~ 1,
  theta1 ~ s(days,bs="tp", k = 15, m=2),
  nl = TRUE
)

However, when I try to run get_prior() on this to understand what priors to specify, I get an error:

get_prior(formula = nl_mix,
          family = gmix,
          data = dat)
Error in terms.formula(as.formula(x)) : invalid power in formula

I’m not sure if I’m specifying the formula incorrectly or once again stretching the limits of brms’s code-writing machinery. I tried wrapping the power terms in I() as you would a regular R formula, but then I get

Error: The following variables are missing in 'data':
'gamma', 't1', 'eta', 't2'

#5

I am not sure if I should think of this as indented behavior or not, but the problem is the following.

You no longer predict “mu” itself but “mu1” and “mu2”. When you specifiy non-linear formulas for “mu1” and “mu2”, the nl = TRUE doesn’t automatically apply for them and you get this weird error.

If you try

nl_mix <- bf(
  response ~ 1,
  nlf(mu1 ~ gamma * exp(-t1/gamma)^eta),
  nlf(mu2 ~ gamma * exp(-t2/gamma)^eta),
  t1 ~ 1,
  t2 ~ 1,
  eta ~ 1,
  gamma ~ 1,
  theta1 ~ s(days,bs="tp", k = 15, m=2),
  nl = TRUE
)

you see the actual error message telling you what mixture do not (yet) support non-linear formulas. Feel free to open an issue on github if you want me to remember to get this implemented at some point.


#6

Thanks! I’ll open that issue and I guess go to modifying the Stan code for now.