Hi all,
I’ve been having trouble with fitting some data using a non-linear function with the Beta family. In this case the data are proportional loss in area, so this is a true “proportion” and cannot be validly modeled with a binomial (there are no trials). It works fine a lot of the time, but I have quite a lot of examples where the predicted values for the fitted model do not fit the upper and lower extremes of the observed data at all.
To generate some simulated data that can reproduce this problem I used:
fct <- function(b_top, b_bot, b_ec50, b_beta, x) {
b_top + (b_bot - b_top) /
(1 + exp((b_ec50 - x) * exp(b_beta)))
}
linear_rescale <- function(x, r_out) {
p <- (x - min(x)) / (max(x) - min(x))
r_out[[1]] + p * (r_out[[2]] - r_out[[1]])
}
set.seed(10)
x <- sort(runif(100, 1, 3))
a <- 0.2
y_mean <- fct(b_beta = 3, b_bot = 0.001, b_ec50 = 2, b_top = 0.999, x)
y <- linear_rescale(rbeta(100, a, (a-a*y_mean)/y_mean), c(0.001, 0.999))
dat <- data.frame(x, y)
plot(dat$x, dat$y)
To fit the model in brms I’ve been using the following code - which basically uses exactly the same model formula to generated the simulated data. Note that the priors for the “top” and “bot” parameters are on the logit scale, as I am using a logit link.
library(brms)
bfmod <- brms::bf(y ~ top + (bot - top) / (1 + exp((ec50 - x) * exp(beta))),
bot + ec50 + top + beta ~ 1,
nl = TRUE)
my_prior <- c(prior_string("normal(5, 5)", nlpar = "top"),
prior_string("normal(-5, 5)", nlpar = "bot"),
prior_string("gamma(5, 0.04)", nlpar = "ec50"),
prior_string("normal(0, 3)", nlpar = "beta"))
tt <- brm(formula = bfmod, data = dat, prior = my_prior,
family = Beta(link="logit"))#, sample_prior='only')
tt_ce <- conditional_effects(tt)
plot(tt_ce, points = TRUE)
So you can see that the estimated upper asymptote is well below where it should be, and the lower one is way to high. I did play around with the prior for “beta”. Making this really vague causes it to take a really long time to run and while it did generate a more realistically sharp decline, it had no impact on the “top” and “bot” estimates. Making the priors on top and
Running directly in stan produces the same lack of fit outcome, with estimates for “bot” (transformed using the inverse logit) no where near the simulation values of 0.001 and 0.999.
sink("new1.stan")
cat(stancode(tt, version = FALSE))
sink()
fit <- stan(file = 'new1.stan', data = standata(tt))
summary(fit)$summary
I tried modelling this using a binomial, with the same link function and priors, using an arbitrary number of trials. I get a much better fit (albeit this is probably over-dispersed). So this suggests that the priors are ok on the non-linear model parameters in the context of the logit link function for the binomial. But I really can’t use a binomial here because I can’t justify and given selection of “trials” - and I’m not aware of any other bounded distributions that I can use here.
dat$trials <- 10
dat$success <- as.integer(round(dat$y*dat$trials))
bfmod_binom <- brms::bf(success | trials(trials) ~ top + (bot - top) / (1 + exp((ec50 - x) * exp(beta))),
bot + ec50 + top + beta ~ 1,
nl = TRUE)
ttbinom <- brm(formula = bfmod_binom, data = dat, prior = my_prior,
family = binomial(link="logit"))#, sample_prior='only')
plot(conditional_effects(ttbinom))[[1]] +
geom_point(data = ttbinom$data, mapping = aes(x = x, y = y), inherit.aes = FALSE)
I tried an exremely vague prior for phi, but still with no improvement (see code below). Does anyone have any other suggestions, or know what I am doing wrong here?
a <- 0.2
y_mean <- fct(b_beta = 3, b_bot = 0.001, b_ec50 = 2, b_top = 0.999, x)
y <- linear_rescale(rbeta(100, a, (a-a*y_mean)/y_mean), c(0.001, 0.999))
dat <- data.frame(x, y)
plot(dat$x, dat$y)
library(brms)
bfmod <- brms::bf(y ~ top + (bot - top) / (1 + exp((ec50 - x) * exp(beta))),
bot + ec50 + top + beta ~ 1,
nl = TRUE)
my_prior <- c(prior_string("normal(8, 1)", nlpar = "top"),
prior_string("normal(-8, 1)", nlpar = "bot"),
prior_string("gamma(5, 0.04)", nlpar = "ec50"),
prior_string("uniform(0.0000001, 100000)", nlpar = "beta"),
prior(uniform(0.000001, 10000), class = "phi"))
tt3 <- brm(formula = bfmod, data = dat, prior = my_prior,
family = Beta(link="logit"))#, sample_prior='only')
tt_ce <- conditional_effects(tt)
plot(tt_ce, points = TRUE)