I want to model data of a strictly positive response variable (plant dry mass) in brms using a lognormal error distribution but with a measurement error process that results in negative observed values. However, brms does not accept negative response variables when modelling a lognormal error distribution. Something along the lines of following example:
#simulate lognormal real plant and observed plant mass
sim_data <- tibble(n = 1:1000)%>%mutate(real_mass <- exp(rnorm(n(), -2, 1)), obs_mass = rnorm(n(), real_mass, 1))
bf_sim <- bf(obs_mass | mi(sdy = 1) ~ 1, family="lognormal")
m_sim <- brm(data = sim_data,
bf_sim,
backend = "cmdstanr",
prior = c(
set_prior("normal(0,1)", class = "Intercept"),
set_prior("cauchy(0,1)", class = "sigma")),
iter = 500, warmup = 100)
To add a bit more context on the response variable and where the measurement error comes from: I measured the dry mass of plants that were put into envelopes for drying. The measured mass was that of the envelope + plants. These envelopes, however, are not uniform but have a distribution of masses with a known mean and standard deviation. Rather than modelling the observed mass as a sum of two randomly distributed variables (I would not know how in brms), I wanted to model the measured mass subtracted with the mean envelope mass where the standard deviation of envelope mass is then modelled with the measurement error functionalities of brms on the (strictly positive) plant mass real values.
I am wondering whether this is possible to achieve in brms or not? If not, how I could take into account the envelope mass in another model? Or whether my model does not make sense at all (given lognormal error distribution on negative values).
You could use a shifted lognormal, where the shift is the envelope weight. That way, you can model the envelopes with mean/sd and incorporate it into your model.
Thank you for the response. How would you model the envelope weight as the shift (ndt parameter) in a shifted lognormal model? I have been able to get a shifted lognormal error distribution to work for as long as the observed mass is greater than the mean. Here is an adaptation to the previous example
sim_data <- tibble(n = 1:1000)%>%
#real mass values that are strict positive
mutate(real_mass = exp(rnorm(n(), -2, 0.1)),
#envelope mean and sd
envmean = 1,
envsd = 0.2,
#observed mass: real mass + mass of envelope (which has mean = 1, sd = 0.2, known from seperate measures of envelopes)
obs_mass = real_mass + rnorm(n(), envmean, envsd))
bf_sim <- bf(obs_mass ~ 1, ndt ~ 0 + envmean, family = shifted_lognormal(link_ndt = "identity"))
m_sim <- brm(data = sim_data%>%filter(obs_mass>envmean),
bf_sim, backend = "cmdstanr",
prior = c(
set_prior("normal(0,1)", class = "Intercept"),
set_prior("constant(1)", dpar = "ndt"),
set_prior("cauchy(0,1)", class = "sigma")),
iter = 500, warmup = 100,family = shifted_lognormal(link_ndt = "identity"))
What I have figured out is to specify the identity link on the ndt parameter to have it in the same scale as the envelope mass. Also, a shift equal to the envelope mass is modelled by specifying no intercept and fixing the beta for envmean at 1. I am still scratching my head at how to model the standard deviation of the envelopes. I tried adding a (1|n) to the ndt formula to model a different shift per observation with the prior for standard deviation fixed at 0.2 but that gives an error about rejecting initial values that give negative values somewhere in the calculation, pretty sure when Y - ndt is negative. For it to work, it seems that Y-ndt needs to be constrained to nonnegative values somehow.
How can I have the model work with all observed masses by accounting for standard deviation of the envelopes?
You are setting the shift to a constant with that prior. If you now have observed values below 1, than there is no parameter configuration that could make that possible.
I assume that without any additional predictors your model will have a hard time to identify if mass comes from the envelope or the plant in general. You either need a very strict prior (as you tried here, just make sure it actually is compatible with the data) or additional predictors to inform the model.
I assume that without any additional predictors your model will have a hard time to identify if mass comes from the envelope or the plant in general.
But canāt you make the same argument for when you model a measurement error on the response variable in any simple linear model. There, any change from the expected can be caused by either natural variation on the true value or the additional measurement error. But I always interpreted that it is possible to estimate the natural variation because the measurement error is given.
You could even argue that in the case of a shifted lognormal distribution, the model would have the information that any observation that is lower than the expected shift is, at least in part, due to variation in the envelope. But I guess that is not how it works.
In the meantime, I tried to model a measurement error equal to the envelope standard deviation like in the first example. bf(obs_mass| mi(envsd) ~ 1, ndt ~ 0 + envmean, family = shifted_lognormal(link_ndt = "identity")) Which resulted in the error āVariable ālbmiā is of invalid type.ā that is unclear to me.
If your shift is a constant, you canāt have observations below the shift, similar to how a beta is constrained to (0,1), the shifted lognormal is constrained to (shift, \infty). If you have a strong prior that allows for some variation, that is what I meant here:
You either need a very strict prior (as you tried here, just make sure it actually is compatible with the data) or additional predictors to inform the model.
I donāt have any experience with modeling measurement error so I canāt really help you further with that sadly.