summary

I can tickle brms until it gives me a lognormal regression. However, the result is so weird-looking that I wonder whether it's wrong or needless. Does it look weird to you too?
I have some data that I want to fit with a lognormal distribution: the data are positive, continuous numbers with several extreme values. I want to model the mean of my response.
As with many other distributions, the two parameters of the lognormal are each a function of both the mean \mu and the standard deviation \sigma. I’ll call these two parameters a and b: a is the mean of the normal distribution you’d get if you took the log of these lognormal values (so e^a is the median of the lognormal distribution), and b is the standard deviation of those logged values.
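Matching the lognormal’s mean and standard deviation to \mu and \sigma gives, after a little algebra (this is standard moment matching, written in the same notation as above):

$$
a = \log\!\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right),
\qquad
b = \sqrt{\log\!\left(\frac{\sigma^2}{\mu^2} + 1\right)}
$$

These are exactly the two helper functions defined in the code below.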
faking data
I want to simulate data from a so-called power-law curve, which is a classic in ecology for modelling e.g. how metabolism scales with mass, with known values M = 17, z = 0.3, and \sigma = 11.
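In symbols, the data-generating model is

$$
\mu(x) = M x^{z},
\qquad
y \sim \operatorname{LogNormal}\big(a(\mu(x), \sigma),\; b(\mu(x), \sigma)\big)
$$

so the mean response follows the power law, and the lognormal noise is moment-matched to that mean and to a constant \sigma.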
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidybayes))
suppressPackageStartupMessages(library(brms))
# Convert a target mean and sd into the lognormal's meanlog parameter
calculate_a_lnorm <- function(mean, sd){
  log(mean^2 / sqrt(sd^2 + mean^2))
}

# ...and into its sdlog parameter
calculate_b_lnorm <- function(mean, sd){
  sqrt(log(sd^2 / mean^2 + 1))
}
known_M <- 17
known_z <- .3
known_s <- 11
fake_metabolism_data <- tibble(
  x = 1:500,
  mean_response = known_M * x^(known_z),
  obs_response = rlnorm(
    500,
    meanlog = calculate_a_lnorm(mean = mean_response, sd = known_s),
    sdlog   = calculate_b_lnorm(mean = mean_response, sd = known_s)
  )
)
fake_metabolism_data |>
  ggplot(aes(x = x, y = obs_response)) + geom_point()
Created on 2022-02-24 by the reprex package (v2.0.0)
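As a quick sanity check, the moment matching should round-trip: the lognormal defined by these helpers should have exactly the target mean and sd. This standalone sketch (repeating the helper definitions so it runs on its own) confirms that analytically:

```r
# Sanity check: the moment-matched lognormal recovers the target mean and sd.
calculate_a_lnorm <- function(mean, sd) log(mean^2 / sqrt(sd^2 + mean^2))
calculate_b_lnorm <- function(mean, sd) sqrt(log(sd^2 / mean^2 + 1))

mu <- 17
s  <- 11
a  <- calculate_a_lnorm(mu, s)
b  <- calculate_b_lnorm(mu, s)

# Analytic mean and sd of LogNormal(a, b)
lnorm_mean <- exp(a + b^2 / 2)
lnorm_sd   <- sqrt((exp(b^2) - 1) * exp(2 * a + b^2))

all.equal(c(lnorm_mean, lnorm_sd), c(mu, s))  # TRUE
```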
recovering parameters
We can tickle brms until it fits this equation and beautifully recovers the parameters. However, the result is so wild-looking that it makes me doubt my wisdom, good taste, and sanity:
formula_metabolic_matching <- bf(
  obs_response ~ 2 * log(M * x^z) - 0.5 * log(s^2 + (M * x^z)^2),
  family = lognormal(link = "identity", link_sigma = "identity"),
  nl = TRUE
) +
  nlf(sigma ~ sqrt(log(s^2 / (M * x^z)^2 + 1))) +
  lf(M ~ 1) +
  lf(z ~ 1) +
  lf(s ~ 1)
get_prior(formula = formula_metabolic_matching,
          data = fake_metabolism_data)
metabolism_priors <- c(
  prior(normal(10, 2),  class = "b", nlpar = "M"),
  prior(beta(3, 4),     class = "b", nlpar = "z", lb = 0, ub = 1),
  prior(exponential(2), class = "b", nlpar = "s", lb = 0)
)
matching_model <- brm(
  formula_metabolic_matching,
  prior = metabolism_priors,
  data = fake_metabolism_data,
  backend = "cmdstanr",  # brms expects "cmdstanr", not "cmdstan"
  refresh = 0,
  silent = 2,
  file = here::here("lognormal_demo.rds")
)
big_dataset <- fake_metabolism_data |>
  add_predicted_draws(matching_model, ndraws = 1000)

big_dataset |>
  ggplot(aes(x = x, y = .prediction)) + stat_lineribbon() +
  geom_point(aes(x = x, y = obs_response), data = fake_metabolism_data,
             pch = 21, fill = "orange") +
  scale_fill_brewer(palette = "Greens", direction = 1) +
  theme_dark()
Does this make sense to do? Do most people content themselves with modelling the median of a lognormal, rather than the mean, and just relax about it? Is there another parameterization of the lognormal that would make this pain go away?
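For what it’s worth, the gap between the two targets is easy to see numerically. If I read the brms docs right, a plain lognormal regression models meanlog, i.e. the log of the median; this sketch (reusing the helper from above) shows how far below the mean that median sits at the known values:

```r
# A plain lognormal regression predicts exp(a), the median,
# which sits well below the mean when the sd is large.
calculate_a_lnorm <- function(mean, sd) log(mean^2 / sqrt(sd^2 + mean^2))

a <- calculate_a_lnorm(mean = 17, sd = 11)
exp(a)  # median, roughly 14.3, versus a mean of 17
```

That gap is exactly why the moment-matched formula above is worth the pain if the mean is what you care about.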