Difference between mi() and se() for response measurement error

Hi,

I am trying to incorporate measurement uncertainty in some models, following the example in Statistical Rethinking (15.1) as shown here. I have read the discussion in this issue, as well as Paul’s note about mi()'s functionality being expanded to handle both measurement error/noise and missing values, and I am fairly confused about the differences (if any) between them. My understanding is that
y.m | se(y.sd, sigma = TRUE) and y.m | mi(y.sd) do the same thing. But if I try to use se(y.sd, sigma = TRUE) for a model with a beta distribution, I get

Error: Argument 'se' is not supported for family 'beta(logit)'.

However if I use mi() instead, brm proceeds without an issue, but I am concerned it does not do what I think it does.

So what is going on here, are there differences or are they interchangeable?

I think it could help people answer your question if you posted the full model.

My understanding is that one would use se() in a meta-analysis context where one gets the error variance for the outcome from the data.

In contrast, x | mi(sdy = se) ~ z is used when you know the error variance of the covariate x (it is in the data). If you have only one predictor z with no missing data, this would be a measurement error model with known error variance for the covariate.

Lastly, x | mi() ~ z would be a simple imputation model (some x are missing) or a measurement error model (no x are missing) with estimated error variance.

To the best of my knowledge, mi() supports only Gaussian modelling.
Given that se() specifies the error variance of the outcome, it also requires a Gaussian model. This could explain the error message when you use the beta family.

Thank you for your reply. My model has the formula

Resp.mu|mi(Resp.sd) ~ 1 + cont + cat + cont:cat + (1 + cat|unit)

where cont and cat are a continuous and a 3 level categorical predictor respectively.

I am a bit unsure what your main question is now.
I hope my earlier response could clarify that se() is not used to model measurement error; this is done with me() or mi(). To better understand where the error is coming from, one would need to see the full model, or even better a reproducible example (with fake data).

Regarding your model formula
This seems to be only part of the model, because (a) there is no family = beta(logit) statement and (b) this is only the imputation model (for Resp.mu); there is no outcome model in which the imputed values are used (e.g. y ~ mi(Resp.mu) + ...).

As an aside, it seems unusual to me to have a measurement error model where the response variable Resp.mu is modeled by two different variables. I would have expected a measurement error model of this form:

bf(
y ~ mi(x.latent),
x.latent | mi(sdy = x.sd) ~ 0 + x.measured
)

which should be the same as

y ~ se(x.measured, x.sd)

Hello,

I’ve been trying to wrap my head around the measurement error options in brms for the response variable, and I’m still unsure about the practical differences between modelling a response with measurement error in brms using, e.g. y | mi(sd) vs. y | se(sd, sigma = TRUE).

For example, some sources seem to suggest (e.g., here and here) that they are equivalent in practice, even if the underlying likelihood statements in the Stan code are different. In fact, the second link seems to suggest that using y | se(sd, sigma = TRUE) can often bypass convergence issues encountered with y | mi(sd) (and I have experienced this recently).

So, imagine a dataset that contains a continuous response y which was obtained from an instrument that also provides uncertainty for each observation, and a continuous predictor x — for simplicity of this example, let’s assume that it is extremely precise and carries no error:

y | mi(sd)

library(tidyverse)
library(brms)
set.seed(10)
dat <- data.frame(x = rnorm(100, 10, 2)) |>
  dplyr::mutate(
    y = rnorm(100, -10 + x * 5, 1), sd_y = rgamma(100, 2, 5)
  )
brms::make_stancode(y | mi(sd_y) ~ x, data = dat)

In this case, where no missing observations exist in the dataset, the key elements in the Stan code are:

data {
....
  // data for measurement-error in the response
  vector<lower=0>[N] noise;
....
}
....
parameters {
  vector[N] Yl;  // latent variable
....
}
....
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Yl | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
  target += normal_lpdf(Y[Jme] | Yl[Jme], noise[Jme]);
}
....

y | se(sd, sigma = TRUE)

brms::make_stancode(y | se(sd_y, sigma = TRUE) ~ x, data = dat)
data {
....
  vector<lower=0>[N] se;  // known sampling error
....
}
transformed data {
  vector<lower=0>[N] se2 = square(se);
....
}
....
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    target += normal_lpdf(Y | mu, sqrt(square(sigma) + se2));
  }
  // priors including constants
  target += lprior;
}
....

The first likelihood declaration with a latent step states

Y_l \sim N(\mu, \sigma)
Y \sim N(Y_l, \sigma_{Y})

and the second states

Y \sim N\left(\mu, \sqrt{\sigma^2 + \sigma_{Y}^2}\right)

Noting that se == noise between the two Stan programs (i.e. they are the same vector, just named differently depending on the input formula syntax), and borrowing the rationale from here, the two statements could be made equivalent if:

U \sim N(0, \sigma_{Y})
Y_l \sim N(\mu, \sigma)
Y = U + Y_l \rightarrow Y \sim N\left(\mu, \sqrt{\sigma^2 + \sigma_{Y}^2}\right)
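This sum-of-independent-normals step can be checked by simulation; a minimal sketch with arbitrary values for \mu, \sigma and \sigma_Y (not taken from the thread):

```python
# Check: if U ~ N(0, sd_y) and Y_l ~ N(mu, sigma) are independent,
# then Y = U + Y_l has mean mu and sd sqrt(sigma^2 + sd_y^2).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, sd_y = 2.0, 1.5, 0.8
n = 1_000_000

u = rng.normal(0.0, sd_y, n)    # measurement noise
y_l = rng.normal(mu, sigma, n)  # latent "true" values
y = u + y_l                     # observed values

print(y.mean())  # close to mu = 2.0
print(y.std())   # close to sqrt(1.5^2 + 0.8^2) = 1.7
```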

But I am not sure if there’s some flawed logic in there. I was hoping someone might be able to provide some definitive clarity on the issue, and whether my interpretation of the equivalence is correct.

Thanks!!


You are interpreting everything correctly. The key point here is that

\mathcal{N}\left(y | \mu, \sqrt{\sigma_1^2 + \sigma_2^2}\right) \propto \int_{-\infty}^\infty \left(\mathcal{N}(y | l, \sigma_1) \times \mathcal{N}(l | \mu, \sigma_2) \right)dl

where \mathcal{N} is the normal probability density function, y is data, and l is a dummy variable equivalent to your Y_l. Thus we say that this normal mixture of normals marginalizes to a normal, where marginalization refers to summing over all of the possible values of l, in this case by integrating over all the reals. You can ask Stan to do this integration for you numerically by declaring l as a parameter and integrating via MCMC, or you can do the integration yourself and give Stan the marginalized version. When possible, it’s almost always more efficient to marginalize, and that will certainly be the case here.
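The identity can also be verified by direct quadrature; a small sketch using SciPy, with arbitrary values for y, \mu, \sigma_1 and \sigma_2 (not taken from the thread):

```python
# Numerical check of the marginalization identity:
# N(y | mu, sqrt(s1^2 + s2^2)) equals the integral over l of
# N(y | l, s1) * N(l | mu, s2), i.e. a convolution of two normals.
from math import sqrt
from scipy.integrate import quad
from scipy.stats import norm

y, mu, s1, s2 = 0.7, 2.0, 1.5, 0.8

marginal = norm.pdf(y, mu, sqrt(s1**2 + s2**2))
integral, _ = quad(lambda l: norm.pdf(y, l, s1) * norm.pdf(l, mu, s2),
                   -10, 10)  # integrand is negligible outside this range

print(marginal, integral)  # the two values agree
```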


Thanks very much. I believe it would be useful (and a time saver) to have your explanation incorporated into the brms documentation of mi() and se() for the cases where the user is not necessarily interested in estimating the latent vector.