Using loo in a linear regression when the data have measurement errors

Hello.

I’m using the loo package to compare linear regression models in which the outcome variable has measurement errors. My problem is that I’m not sure which expression for the likelihood loo requires.

Given a model like this:
y_i \sim \mathrm{normal}(u_i, s_i)
u_i \sim \mathrm{normal}(\beta_0 + \beta_1 x_i, \sigma)

The values y_i, s_i and x_i are observed. s_i is the standard deviation of the measurement error of each point. The value u_i is not observed.

loo requires the pointwise log-likelihood for each observation. In this case I’m not sure whether I should use the joint likelihood of y_i and u_i (a bivariate normal) like this:

generated quantities {
   vector[N] log_like;
   for (i in 1:N) {
      // joint log density of y_i and the latent u_i
      log_like[i] = normal_lpdf(y[i] | u[i], sy[i]) + 
                    normal_lpdf(u[i] | beta0 + beta1*x[i], sigma);
   }
}

or the marginal likelihood of y_i alone (a univariate normal) like this:

generated quantities {
   vector[N] log_like;
   for (i in 1:N) {
      // marginal log density of y_i, with u_i integrated out
      log_like[i] = normal_lpdf(y[i] | beta0 + beta1*x[i],
                                sqrt(square(sigma) + square(sy[i])));
   }
}

I have read the papers, but I’m not a statistician, so I’m not sure which likelihood (if any) is correct in this case.

You need to consider only the log density of y_i. You will get more stable computation if you integrate out u_i, as you have done in your second code block. Because the sum of independent normals is normal, marginalizing over u_i gives y_i \sim \mathrm{normal}(\beta_0 + \beta_1 x_i, \sqrt{\sigma^2 + s_i^2}), which is exactly what your second block computes.
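
For reference, a minimal R sketch of how you could then pass that marginal log_like to loo (assuming a stanfit object called fit; the object name is just illustrative):

library(loo)

# extract the pointwise log-likelihood as an iterations x chains x N array
log_like <- extract_log_lik(fit, parameter_name = "log_like", merge_chains = FALSE)

# relative effective sample sizes, needed for the PSIS computation
r_eff <- relative_eff(exp(log_like))

# run PSIS-LOO on the marginal pointwise log-likelihood
loo_fit <- loo(log_like, r_eff = r_eff)
print(loo_fit)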


Hi Aki,

I’m having a similar issue using loo on a brms object in R, as per this topic here.

Can I apply the loo package in R directly to brms objects for these kinds of measurement-error models, as I am trying to do, or do I need to write code like the Stan code above?

Thanks in advance for your help,
Stuart.

If I read your post correctly, there is no additional question here, so let’s continue the discussion in the topic you mention above.