Measurement error models on lognormal distributed data

Hi
I want to model plant mass data for which I have a good idea of the measurement error. Because plant mass can only be positive, I tried to model it with a lognormal distribution in brms, using the “mi(sdy = [measurement error])” syntax to incorporate the measurement error. I include a toy example below. In this simulation, as in my real plant mass data, the Gaussian measurement error produces some negative observed masses.

library(brms)
library(dplyr)  # also re-exports tibble() and %>%

# known measurement error (SD) of 0.5
m_err <- 0.5
# simulate lognormal real plant mass and the observed mass with Gaussian error
sim1_data <- tibble(n = 1:1000) %>%
  mutate(m_err = m_err,
         real_mass = exp(rnorm(n(), -2, 1)),
         obs_mass = rnorm(n(), real_mass, m_err))

bf_sim1 <- bf(obs_mass | mi(sdy = m_err) ~ 1, family="lognormal")

m_sim1 <- brm(data = sim1_data, 
             bf_sim1,
             backend = "cmdstanr",
             prior = c(
               set_prior("normal(0,1)", class = "Intercept"),
               set_prior("cauchy(0,1)", class = "sigma")),
             iter = 500, warmup = 100)

brms does not let me fit this model with negative observed outcomes: “Error: Family ‘lognormal’ requires response greater than 0.” However, the Stan code that brms generates (inspected with make_stancode()) appears to model the observed outcome as a Gaussian draw whose mean is the positive latent “real” plant mass and whose SD is the given measurement error.
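In symbols, the structure described above would be roughly the following. The notation is assumed here for illustration and is not taken from the generated Stan code: $y_i^{\text{obs}}$ is obs_mass, $y_i^{\text{true}}$ is the latent real mass, and $s$ is the known measurement error m_err.

```latex
\begin{aligned}
y_i^{\text{obs}} \mid y_i^{\text{true}} &\sim \mathrm{Normal}\left(y_i^{\text{true}},\, s\right) \\
y_i^{\text{true}} &\sim \mathrm{LogNormal}\left(\mu,\, \sigma\right)
\end{aligned}
```

Under this reading, only the latent $y_i^{\text{true}}$ is constrained to be positive; the observed value has support on the whole real line.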

  1. Does this not mean that negative outcomes should be allowed? After all, negative outcomes have a non-zero probability under any Gaussian distribution, no matter the mean.
  2. Is there a way to define such a model that circumvents this restriction by using the latent variable imputation of brms? I tried fitting a multivariate model with the formulas below, fixing the coefficients for mi(real_mass) and m_err to 1 so that the first formula models the measurement error process. This model has a very hard time converging.
bf_sim1_me <- bf(obs_mass ~ 0 + mi(real_mass), sigma ~ 0 + m_err, family = 'normal')
bf_sim1_dwr <- bf(real_mass | mi() ~ 1, family = brmsfamily("lognormal", link_sigma = "identity"))

I am returning to the Stan/brms forums with a question similar to a previous post that did not yield a working solution (“Model lognormal distributed real values with measurement error that yield negative observed values in brms”). I keep running into this same problem with the data set in question, so I thought I would give it another shot.

Thank you for any help.

From your description, it looks like the mean of your outcome variable is what should be log-normally distributed, right? That is, if I’m understanding it right, your observed variable is a compound distribution: Normal with a log-normal mean and a constant SD. If so, I wonder whether this is something you can address with a non-linear formula; something like:

obs_mass ~ exp(true_mass),
true_mass ~ 1 + (1 | index),
sigma ~ 1,
nl = TRUE,
family = "gaussian"

You would need to add an index variable (one level per observation) so that true_mass can follow a log-normal distribution across observations; the SD of the index random effect would then be the SD of the log-normal component, and the intercept would be its mean on the log scale.
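A fuller version of this suggestion might look like the sketch below. It is untested against the real data and rests on several assumptions: the simulated data mirror the toy example above, the priors are placeholders, and the non-linear parameter is named truemass (no underscore) purely as a precaution, in case the installed brms version is strict about punctuation in non-linear parameter names. With one random-effect level per observation, sampling may well be slow or poorly identified.

```r
library(brms)

# Simulated data mirroring the toy example (assumed values: meanlog -2, sdlog 1)
set.seed(1)
n <- 200
sim_nl <- data.frame(index = factor(seq_len(n)),
                     real_mass = exp(rnorm(n, -2, 1)))
sim_nl$obs_mass <- rnorm(n, sim_nl$real_mass, 0.5)  # can be negative

# Gaussian outcome whose mean is exp() of a latent log-scale mass
bf_nl <- bf(obs_mass ~ exp(truemass),
            truemass ~ 1 + (1 | index),  # intercept = log-scale mean, sd = log-scale SD
            sigma ~ 1,
            nl = TRUE,
            family = gaussian())

# Placeholder priors; adjust to your own prior knowledge
priors_nl <- c(prior(normal(0, 1), nlpar = "truemass"),
               prior(normal(0, 1), class = "sd", nlpar = "truemass"))

m_nl <- brm(bf_nl, data = sim_nl, prior = priors_nl,
            chains = 4, cores = 4)
```

Because the Gaussian family places no positivity constraint on the response, this formulation should accept negative obs_mass values.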

  1. Does this not mean that negative outcomes should be allowed? After all, negative outcomes have a non-zero probability under any Gaussian distribution, no matter the mean.

I certainly think you are right; this seems like a bug.
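To put a rough number on this, the short base-R sketch below reuses the simulation settings from the toy example (lognormal true mass with meanlog -2 and sdlog 1, measurement SD 0.5) and compares the theoretical probability of a negative observation with the share actually simulated. The variable names are made up for illustration.

```r
set.seed(1)
m_err <- 0.5
real_mass <- exp(rnorm(1e5, -2, 1))  # latent true mass, always positive

# P(obs < 0 | real_mass) under Gaussian measurement error
p_neg_given_true <- pnorm(0, mean = real_mass, sd = m_err)

# Marginal probability of a negative observation, averaged over true masses
p_neg <- mean(p_neg_given_true)

# Monte Carlo check: share of simulated observations below zero
obs_mass <- rnorm(length(real_mass), real_mass, m_err)
share_neg <- mean(obs_mass < 0)

c(theoretical = p_neg, simulated = share_neg)
```

Every true mass, however large, leaves some probability of a negative reading, so negative observations are expected under this measurement model rather than a sign of bad data.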

Are you open to building a Stan model for it (outside brms)? That might be the best way forward.
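As a starting point, a hand-written Stan model for the toy example could look roughly like the sketch below, run through cmdstanr. This is only one possible formulation under stated assumptions: the priors are copied from the brms call in the question, the measurement SD is treated as known, and the latent masses use a non-centered parameterization (real_mass = exp(mu + sigma * z)), which is equivalent to a lognormal(mu, sigma) prior and is a choice made here, not something from the thread.

```r
library(cmdstanr)

# Simulated data as in the toy example
set.seed(1)
n <- 1000
real_mass <- exp(rnorm(n, -2, 1))
stan_data <- list(N = n,
                  obs_mass = rnorm(n, real_mass, 0.5),  # may be negative
                  m_err = 0.5)

stan_code <- "
data {
  int<lower=1> N;
  vector[N] obs_mass;      // observed mass, unconstrained
  real<lower=0> m_err;     // known measurement SD
}
parameters {
  real mu;                 // log-scale mean of true mass
  real<lower=0> sigma;     // log-scale SD of true mass
  vector[N] z;             // standardized latent log-masses
}
transformed parameters {
  vector[N] real_mass = exp(mu + sigma * z);  // lognormal(mu, sigma)
}
model {
  mu ~ normal(0, 1);
  sigma ~ cauchy(0, 1);
  z ~ std_normal();
  obs_mass ~ normal(real_mass, m_err);  // Gaussian measurement error
}
"

mod <- cmdstan_model(write_stan_file(stan_code))
fit <- mod$sample(data = stan_data, seed = 1,
                  chains = 4, parallel_chains = 4)
fit$summary(variables = c("mu", "sigma"))
```

With a measurement SD that is large relative to the typical mass, the individual latent masses will be only weakly informed by the data, so it is worth checking the diagnostics for mu and sigma before trusting the estimates.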

I would love to be wrong but I don’t think there is a way around the fact that you lack the weight of the envelope before drying…

Your goal, presumably, is to relate dry mass to some predictors. Even if you get around the “Error: Family ‘lognormal’ requires response greater than 0.” message, your error is not measured per observation; it is a distribution of the weights of the envelopes used, disconnected from any given observation. See also this discussion.