Lognormal regression and moment matching

summary I can tickle brms until it gives me a lognormal regression. However the result is so weird looking I wonder if it’s not just wrong or needless. Does this look weird to you too?

I have some data that I want to fit with a lognormal distribution – the data are positive, continuous numbers that have several extreme values. I want to model the mean of my response

As with many other distributions, the two parameters of the lognormal are each a function of both the mean \mu and standard deviation \sigma. I’ll call these two parameters a and b:

a = \ln\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right)
b = \sqrt{\ln\left(\frac{\sigma^2}{\mu^2} + 1\right)}

a is the median of the lognormal distribution. Its also the mean of the normal distribution you’d get if you took the log of these lognormal values. b is the standard deviation of those values.

faking data

I want to simulate data from so-called Power Law curve, which is a classic in ecology for modelling e.g. how metabolism scales with mass

\begin{align} \text{metabolism} &\sim \text{LogNormal}\left(\ln\left(\frac{\mu^2}{\sqrt{\sigma^2 + \mu^2}}\right),\sqrt{\ln\left(\frac{\sigma^2}{\mu^2} + 1\right)}\right) \\ \mu &= M \times x ^z \\ \end{align}

With known values M = 17, z = .3 and \sigma = 11

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidybayes))
suppressPackageStartupMessages(library(brms))


calculate_a_lnorm <- function(mean, sd){
  log(mean^2/sqrt(sd^2 + mean^2))
}

calculate_b_lnorm <- function(mean, sd){
  sqrt(log(sd^2 / mean^2 + 1))
}



known_M <- 17
known_z <- .3
known_s <- 11

fake_metabolism_data <- tibble(x = 1:500,
       mean_response = known_M*x^(known_z),
       obs_response = rlnorm(500, 
                             meanlog = calculate_a_lnorm(mean = mean_response, sd = known_s),
                             sdlog = calculate_b_lnorm(mean = mean_response, sd = known_s))
       )


fake_metabolism_data |> 
  ggplot(aes(x = x, y = obs_response)) + geom_point()

Created on 2022-02-24 by the reprex package (v2.0.0)

recovering parameters

We can tickle brms until it fits this equation and beautifully recovers the parameters. However the result is so wild-looking that it makes me doubt my wisdom, good taste, and sanity:

formula_metabolic_matching <- bf(obs_response ~ 2*log(M*x^z) - .5*log(s^2 + (M*x^z)^2), 
                                 family = lognormal(link = "identity",
                                                    link_sigma = "identity"),
                                 nl = TRUE) + 
  nlf(sigma ~ sqrt(log(s^2/(M*x^z)^2 + 1))) + 
  lf(M ~ 1) + 
  lf(z ~ 1) + 
  lf(s ~ 1)

get_prior(formula = formula_metabolic_matching,
          data = fake_metabolism_data)

metabolism_priors <- c(prior(normal(10,2), class = "b", nlpar = "M"),
                       prior(beta(3, 4), class = "b", nlpar = "z", lb = 0, ub = 1),
                       prior(exponential(2), class = "b", nlpar = "s", lb = 0))


matching_model <- brm(formula_metabolic_matching, prior = metabolism_priors,
                      data = fake_metabolism_data, backend = "cmdstan", refresh = 0 , silent = 2, file = here::here("lognormal_demo.rds"))


big_dataset <- fake_metabolism_data |> 
  add_predicted_draws(matching_model, ndraws = 1000)

big_dataset |> 
  ggplot(aes(x = x, y = .prediction)) + stat_lineribbon() + 
  geom_point(aes(x = x, y = obs_response), data = fake_metabolism_data, pch = 21, fill = "orange") + 
  scale_fill_brewer(palette = "Greens", direction = 1) + 
  theme_dark()

Does this make sense to do? Do most people content themselves with modelling the median, not the mean, of a lognormal, and just relax about it? Is there another parameterization of the LogNormal that would make this pain go away?

This makes sense to me. An alternative is to create a custom family in brms, which I have already done for the Lognormal with natural scale parameter parameterization. It is available here: custom-brms-families/lognormal_natural.R at master · paul-buerkner/custom-brms-families (github.com)

To fit the same model with the custom family you do like this (note that there is a PR for new code for the Stan function and the helper functions, which I use here):

library(brms)

# helper functions for post-processing of the family
log_lik_lognormal_natural <- function(i, prep) {
  mu <- prep$dpars$mu[, i]
  if(NCOL(prep$dpars$sigma)==1){sigma <- prep$dpars$sigma}else
  {sigma <- prep$dpars$sigma[, i]}   ## [, i] if sigma is modelled, without otherwise
  y <- prep$data$Y[i]
  common_term = log(1+sigma^2/mu^2)
  Vectorize(dlnorm)(y, log(mu)-common_term/2, sqrt(common_term), log = TRUE)
}


posterior_predict_lognormal_natural <- function(i, prep, ...) {
  mu <- prep$dpars$mu[, i]
  if(NCOL(prep$dpars$sigma)==1){sigma <- prep$dpars$sigma}else
  {sigma <- prep$dpars$sigma[, i]}   ## [, i] if sigma is modelled, without otherwise
  common_term = log(1+sigma^2/mu^2)
  rlnorm(n, log(mu)-common_term/2, sqrt(common_term))
}

posterior_epred_lognormal_natural <- function(prep) {
  mu <- prep$dpars$mu
  return(mu)
}

# definition of the custom family
custom_family(name = "lognormal_natural", 
              dpars = c("mu", "sigma"), 
              links = c("log", "log"), 
              lb = c(0, 0), 
              type = "real") ->
  lognormal_natural

# additionally required Stan code
stan_lognormal_natural <- "
  real lognormal_natural_lpdf(real y, real mu, real sigma) {
    real common_term = log(1+sigma^2/mu^2);
    return lognormal_lpdf(y | log(mu)-common_term/2, 
                              sqrt(common_term));
  }
  real lognormal_natural_rng(real mu, real sigma) {
    real common_term = log(1+sigma^2/mu^2);
    return lognormal_rng(log(mu)-common_term/2, 
                            sqrt(common_term));
  }
"


brm(formula = obs_response ~ log(x), 
    family = lognormal_natural, 
    stanvars = stanvar(scode = stan_lognormal_natural, block = "functions"), 
    data = fake_metabolism_data) -> 
  brm_model_natural_parameters

Note that the parameterization of the power-law changes a bit, so that exp(Intercept)=M and x is logged, due to

\begin{aligned} \log (\mu) =\ &\log \left(M \cdot x^{z}\right)=\\ & \log (M)+\log \left(x^{z}\right)=\\ & \log (M)+z \log (x) \end{aligned}

EDIT: Added some new code for the helper functions as well.

thank you very much @StaffanBetner ! I didn’t know about this grimoire of custom response families for brms, and I’m very grateful to learn of it. I hope that the natural parameterization of the lognormal does indeed become more widely available, since this distribution is a favourite of ecologists.