# Lognormal regression and moment matching

summary I can tickle brms until it gives me a lognormal regression. However the result is so weird looking I wonder if it’s not just wrong or needless. Does this look weird to you too?

I have some data that I want to fit with a lognormal distribution – the data are positive, continuous numbers that have several extreme values. I want to model the mean of my response

As with many other distributions, the two parameters of the lognormal are each a function of both the mean \mu and standard deviation \sigma. I’ll call these two parameters a and b:

a = \ln\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right)
b = \sqrt{\ln\left(\frac{\sigma^2}{\mu^2} + 1\right)}

a is the median of the lognormal distribution. Its also the mean of the normal distribution you’d get if you took the log of these lognormal values. b is the standard deviation of those values.

## faking data

I want to simulate data from so-called Power Law curve, which is a classic in ecology for modelling e.g. how metabolism scales with mass

\begin{align} \text{metabolism} &\sim \text{LogNormal}\left(\ln\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right),\sqrt{\ln\left(\frac{\sigma^2}{\mu^2} + 1\right)}\right) \\ \mu &= M \times x ^z \\ \end{align}

With known values M = 17, z = .3 and \sigma = 11

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidybayes))
suppressPackageStartupMessages(library(brms))

calculate_a_lnorm <- function(mean, sd){
log(mean^2/sqrt(sd^2 + mean^2))
}

calculate_b_lnorm <- function(mean, sd){
sqrt(log(sd^2 / mean^2 + 1))
}

known_M <- 17
known_z <- .3
known_s <- 11

fake_metabolism_data <- tibble(x = 1:500,
mean_response = known_M*x^(known_z),
obs_response = rlnorm(500,
meanlog = calculate_a_lnorm(mean = mean_response, sd = known_s),
sdlog = calculate_b_lnorm(mean = mean_response, sd = known_s))
)

fake_metabolism_data |>
ggplot(aes(x = x, y = obs_response)) + geom_point()


Created on 2022-02-24 by the reprex package (v2.0.0)

## recovering parameters

We can tickle brms until it fits this equation and beautifully recovers the parameters. However the result is so wild-looking that it makes me doubt my wisdom, good taste, and sanity:

formula_metabolic_matching <- bf(obs_response ~ 2*log(M*x^z) - .5*log(s^2 + (M*x^z)^2),
nl = TRUE) +
nlf(sigma ~ sqrt(log(s^2/(M*x^z)^2 + 1))) +
lf(M ~ 1) +
lf(z ~ 1) +
lf(s ~ 1)

get_prior(formula = formula_metabolic_matching,
data = fake_metabolism_data)

metabolism_priors <- c(prior(normal(10,2), class = "b", nlpar = "M"),
prior(beta(3, 4), class = "b", nlpar = "z", lb = 0, ub = 1),
prior(exponential(2), class = "b", nlpar = "s", lb = 0))

matching_model <- brm(formula_metabolic_matching, prior = metabolism_priors,
data = fake_metabolism_data, backend = "cmdstan", refresh = 0 , silent = 2, file = here::here("lognormal_demo.rds"))

big_dataset <- fake_metabolism_data |>

big_dataset |>
ggplot(aes(x = x, y = .prediction)) + stat_lineribbon() +
geom_point(aes(x = x, y = obs_response), data = fake_metabolism_data, pch = 21, fill = "orange") +
scale_fill_brewer(palette = "Greens", direction = 1) +
theme_dark()


Does this make sense to do? Do most people content themselves with modelling the median, not the mean, of a lognormal, and just relax about it? Is there another parameterization of the LogNormal that would make this pain go away?

This makes sense to me. An alternative is to create a custom family in brms, which I have already done for the Lognormal with natural scale parameter parameterization. It is available here: custom-brms-families/lognormal_natural.R at master · paul-buerkner/custom-brms-families (github.com)

To fit the same model with the custom family you do like this (note that there is a PR for new code for the Stan function and the helper functions, which I use here):

library(brms)

# helper functions for post-processing of the family
log_lik_lognormal_natural <- function(i, prep) {
mu <- prep$dpars$mu[, i]
if(NCOL(prep$dpars$sigma)==1){sigma <- prep$dpars$sigma}else
{sigma <- prep$dpars$sigma[, i]}   ## [, i] if sigma is modelled, without otherwise
y <- prep$data$Y[i]
common_term = log(1+sigma^2/mu^2)
Vectorize(dlnorm)(y, log(mu)-common_term/2, sqrt(common_term), log = TRUE)
}

posterior_predict_lognormal_natural <- function(i, prep, ...) {
mu <- prep$dpars$mu[, i]
if(NCOL(prep$dpars$sigma)==1){sigma <- prep$dpars$sigma}else
{sigma <- prep$dpars$sigma[, i]}   ## [, i] if sigma is modelled, without otherwise
common_term = log(1+sigma^2/mu^2)
rlnorm(n, log(mu)-common_term/2, sqrt(common_term))
}

posterior_epred_lognormal_natural <- function(prep) {
mu <- prep$dpars$mu
return(mu)
}

# definition of the custom family
custom_family(name = "lognormal_natural",
dpars = c("mu", "sigma"),
lb = c(0, 0),
type = "real") ->
lognormal_natural

stan_lognormal_natural <- "
real lognormal_natural_lpdf(real y, real mu, real sigma) {
real common_term = log(1+sigma^2/mu^2);
return lognormal_lpdf(y | log(mu)-common_term/2,
sqrt(common_term));
}
real lognormal_natural_rng(real mu, real sigma) {
real common_term = log(1+sigma^2/mu^2);
return lognormal_rng(log(mu)-common_term/2,
sqrt(common_term));
}
"

brm(formula = obs_response ~ log(x),
family = lognormal_natural,
stanvars = stanvar(scode = stan_lognormal_natural, block = "functions"),
data = fake_metabolism_data) ->
brm_model_natural_parameters


Note that the parameterization of the power-law changes a bit, so that exp(Intercept)=M and x is logged, due to

\begin{aligned} \log (\mu) =\ &\log \left(M \cdot x^{z}\right)=\\ & \log (M)+\log \left(x^{z}\right)=\\ & \log (M)+z \log (x) \end{aligned}

EDIT: Added some new code for the helper functions as well.

thank you very much @StaffanBetner ! I didn’t know about this grimoire of custom response families for brms, and I’m very grateful to learn of it. I hope that the natural parameterization of the lognormal does indeed become more widely available, since this distribution is a favourite of ecologists.