# Getting around truncation limitation for mixture models in brms

#1

I am aware that truncated distributions are not yet supported in brms formulas for mixture models, but was wondering if there was a strategy for getting around this limitation.

``````mix <- mixture(gaussian, gaussian)
prior <- c(
prior(normal(mup1, varp1), Intercept, dpar = mu1),
prior(normal(mup2, varp2), Intercept, dpar = mu2)
)
fit1 <- brm(bf(response ~ 1, theta1 ~ s(days, k=15)), data = ts, family = mix,
prior = prior, chains = 2, stanvars = stanvars)
``````

However, the distribution of my response variable looks like this, because the instrument that measures these values has a saturation limit:

I suspect this can be done by making `mix` a mixture of custom distributions, but I’m not sure how to go about it.

• Operating System: Debian 9
• brms Version: 2.3.1

#2

Yeah would you could do is to define a truncated normal distribution as a custom family and then put that into your mixture model. You can see how a truncated normal looks like in stan code my running

``````make_stancode(y | trunc(lb = 0, ub = 5) ~ 1, data = data.frame(y = 1), family = gaussian())
``````

#3

Thanks, @paul.buerkner. Following the fix on the GitHub version, I was able to implement this. Here’s a working example:

``````library(dplyr)
library(brms) #using GitHub brms, following fix of #453

#make some data
dat <- data_frame(response = c(rnorm(200, 3, 1), rnorm(200, 6, 1))) %>%
filter(response < 8)

# Define a distribution of a normal with a truncated upper bound
normal_ub <- custom_family("normal_ub", dpars = c("mu", "sigma", "upper"),
type = "real",
lb = c(NA, 0, NA), ub = c(NA, NA, NA))
stan_funs <- "
real normal_ub_lpdf(real y, real mu, real sigma, real upper) {
return normal_lpdf(y | mu, sigma) - normal_lcdf(upper | mu, sigma);
}
real normal_ub_rng(real mu, real sigma, real upper) {
real p = normal_cdf(upper, mu, sigma);  // cdf for bounds
real u = uniform_rng(0, p);
return inv_Phi(u)*sigma + mu;  // inverse cdf for value
}
"

mix <- mixture(gaussian, normal_ub, order = 'mu')
prior <- c(
prior(normal(3, 1), class="Intercept", dpar = "mu1"),
prior(normal(6, 1), class="Intercept", dpar = "mu2"),
prior(normal(8.1, 0.0001), class="upper2") #use get_prior() to find the name
)

trunc_mod <- brm(bf(response ~ 1), data = dat, family = mix,
stan_funs = stan_funs, prior = prior,
iter = 1000, warmup = 500, chains = 1,
control = list(adapt_delta = 0.999, max_treedepth=12),
save_model = "test_truncated.stan",
save_dso = TRUE,
seed = 89)

#Get our distribution functions to use post-fitting
expose_functions(trunc_mod, vectorize = TRUE)

log_lik_normal_ub <- function(i, draws) {
mu <- draws\$dpars\$mu[, i]
sigma <- draws\$dpars\$sigma
upper <- draws\$dpars\$upper
y <- draws\$data\$Y[i]
normal_ub_lpmf(y, mu, sigma, upper)
}

predict_normal_ub <- function(i, draws, ...) {
mu <- draws\$dpars\$mu[, i]
sigma <- draws\$dpars\$sigma
upper <- draws\$dpars\$upper
normal_ub_rng(mu, sigma, upper)
}

pp_check(trunc_mod)
``````

#4

Continuing this odyssey! I realized that since the effect is saturating rather than cut-off, what I really need is a nonlinear-transformation of the gaussian mixture, which I’ve attempted to formulate like so:

``````library(dplyr)
library(ggplot2)
library(brms) #current github version

#make some data
gamma_r = 7
eta_r = 1
dat <- data_frame(days = rep(10*(0:10), each=50),
frac = 0.5 + 0.5*sin((days*2*pi/50))) %>%
group_by(days) %>%
mutate(seropos = runif(n()) > frac) %>%
group_by() %>%
mutate(titer = rnorm(n(), mean = if_else(seropos, 2, 6), sd = 1),
response = gamma_r * exp(-titer/gamma_r)^eta_r)

ggplot(dat, aes(x=days, y = response, group=days)) +
geom_violin() +
geom_point()

gmix <- mixture(gaussian, gaussian, order = 'mu')
nl_mix <- bf(
response ~ 1,
mu1 ~ gamma * exp(-t1/gamma)^eta,
mu2 ~ gamma * exp(-t2/gamma)^eta,
t1 ~ 1,
t2 ~ 1,
eta ~ 1,
gamma ~ 1,
theta1 ~ s(days,bs="tp", k = 15, m=2),
nl = TRUE
)
``````

However, when I try to run `get_prior()` on this to understand what priors to specify, I get an error:

``````get_prior(formula = nl_mix,
family = gmix,
data = dat)
``````
``````Error in terms.formula(as.formula(x)) : invalid power in formula
``````

I’m not sure if I’m specifying the formula incorrectly or once again stretching the limits of brms’s code-writing machinery. I tried wrapping the power terms in `I()` as you would a regular R formula, but then I get

``````Error: The following variables are missing in 'data':
'gamma', 't1', 'eta', 't2'
``````

#5

I am not sure if I should think of this as indented behavior or not, but the problem is the following.

You no longer predict “mu” itself but “mu1” and “mu2”. When you specifiy non-linear formulas for “mu1” and “mu2”, the nl = TRUE doesn’t automatically apply for them and you get this weird error.

If you try

``````nl_mix <- bf(
response ~ 1,
nlf(mu1 ~ gamma * exp(-t1/gamma)^eta),
nlf(mu2 ~ gamma * exp(-t2/gamma)^eta),
t1 ~ 1,
t2 ~ 1,
eta ~ 1,
gamma ~ 1,
theta1 ~ s(days,bs="tp", k = 15, m=2),
nl = TRUE
)
``````

you see the actual error message telling you what mixture do not (yet) support non-linear formulas. Feel free to open an issue on github if you want me to remember to get this implemented at some point.

#6

Thanks! I’ll open that issue and I guess go to modifying the Stan code for now.