Hello.
I’m using the `loo` package to compare linear regression models when the outcome variable has measurement error. My problem is that I’m not sure which expression of the likelihood `loo` requires.
Given a model like this:
$$y_i \sim \mathrm{normal}(u_i, s_i)$$
$$u_i \sim \mathrm{normal}(\beta_0 + \beta_1 x_i, \sigma)$$
The values $y_i$, $s_i$ and $x_i$ are observed; $s_i$ (`sy` in the code) is the standard deviation of the measurement error of each point. The value $u_i$ is not observed.
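The measurement model can be read as $y_i = u_i + e_i$ with $e_i \sim \mathrm{normal}(0, s_i)$ independent of $u_i$. Standard results for sums of independent normals then give the joint distribution of $(u_i, y_i)$ and the marginal of $y_i$ with $u_i$ integrated out. Here $m_i = \beta_0 + \beta_1 x_i$ is shorthand introduced only for this derivation:

```latex
\begin{aligned}
\begin{pmatrix} u_i \\ y_i \end{pmatrix}
  &\sim \mathrm{MVN}\!\left(
      \begin{pmatrix} m_i \\ m_i \end{pmatrix},
      \begin{pmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & \sigma^2 + s_i^2 \end{pmatrix}
    \right) \\[4pt]
p(y_i \mid \beta_0, \beta_1, \sigma)
  &= \int \mathrm{normal}(y_i \mid u_i, s_i)\,
          \mathrm{normal}(u_i \mid m_i, \sigma)\, du_i \\
  &= \mathrm{normal}\!\left(y_i \,\middle|\, m_i,\ \sqrt{\sigma^2 + s_i^2}\right)
\end{aligned}
```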
`loo` requires the log-likelihood of each data point, but in this case I’m not sure whether I should use the joint likelihood of $y_i$ and $u_i$ (a bivariate normal), like this:
```stan
generated quantities {
  vector[N] log_like;
  for (i in 1:N) {
    // joint log density of y[i] and the latent u[i] at the current draw
    log_like[i] = normal_lpdf(y[i] | u[i], sy[i])
                  + normal_lpdf(u[i] | beta0 + beta1 * x[i], sigma);
  }
}
```
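For reference, `loo` works on a matrix of pointwise log-likelihood values with one row per posterior draw and one column per observation. The sketch below mirrors the joint `generated quantities` block outside Stan in NumPy, assuming the posterior draws of `beta0`, `beta1`, `sigma` and `u` have already been extracted into arrays. The helper names and the toy numbers are hypothetical, chosen only to show the shapes involved; note that every entry is evaluated at that draw's sampled `u`.

```python
import numpy as np

def normal_logpdf(x, mu, sd):
    """Elementwise log density of normal(mu, sd); sd is a standard deviation, as in Stan."""
    z = (x - mu) / sd
    return -0.5 * z**2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

def joint_log_lik(y, sy, x, beta0, beta1, sigma, u):
    """Hypothetical mirror of the joint generated quantities block.

    y, sy, x            : observed data, shape (N,)
    beta0, beta1, sigma : posterior draws, shape (S,)
    u                   : posterior draws of the latent values, shape (S, N)
    Returns an (S, N) matrix: one row per draw, one column per observation.
    """
    mean_u = beta0[:, None] + beta1[:, None] * x[None, :]  # (S, N) regression mean
    return (normal_logpdf(y[None, :], u, sy[None, :])      # log p(y_i | u_i, s_i)
            + normal_logpdf(u, mean_u, sigma[:, None]))    # log p(u_i | beta0, beta1, sigma)

# Toy, made-up values: S = 2 draws, N = 3 observations.
y = np.array([1.0, 2.0, 3.0])
sy = np.array([0.1, 0.2, 0.3])
x = np.array([0.0, 1.0, 2.0])
beta0 = np.array([0.5, 0.4])
beta1 = np.array([1.0, 1.1])
sigma = np.array([0.3, 0.35])
u = np.array([[0.9, 2.1, 2.8],
              [1.0, 1.9, 3.1]])
log_lik = joint_log_lik(y, sy, x, beta0, beta1, sigma, u)
print(log_lik.shape)  # (2, 3)
```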
or the marginal likelihood of $y_i$ alone (a univariate normal), like this:
```stan
generated quantities {
  vector[N] log_like;
  for (i in 1:N) {
    // marginal log density of y[i], with u[i] integrated out
    log_like[i] = normal_lpdf(y[i] | beta0 + beta1 * x[i],
                              sqrt(sigma^2 + sy[i]^2));
  }
}
```
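A quick numerical sanity check of the marginal expression: for one observation and one set of parameter values, integrating the product of the two normal densities over $u_i$ (trapezoidal rule) should reproduce the closed-form univariate normal with standard deviation $\sqrt{\sigma^2 + s_i^2}$. The parameter values are made up for illustration, and the grid width and number of points are arbitrary choices, not anything `loo` or Stan uses.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of normal(mu, sd) at x; sd is a standard deviation, as in Stan."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def marginal_by_quadrature(y, s, mean_u, sigma, n=20001):
    """Integrate p(y | u, s) * p(u | mean_u, sigma) over u with the trapezoidal rule."""
    half_width = 12.0 * (sigma + s)          # wide enough that the tails are negligible
    lo, hi = mean_u - half_width, mean_u + half_width
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        u = lo + k * h
        w = 0.5 if k in (0, n - 1) else 1.0  # trapezoidal end-point weights
        total += w * normal_pdf(y, u, s) * normal_pdf(u, mean_u, sigma)
    return total * h

def marginal_closed_form(y, s, mean_u, sigma):
    """Marginal density: normal(mean_u, sqrt(sigma^2 + s^2))."""
    return normal_pdf(y, mean_u, math.sqrt(sigma**2 + s**2))

# Illustrative, made-up values for a single observation and a single draw.
beta0, beta1, sigma = 0.5, 2.0, 0.7
x_i, y_i, s_i = 0.3, 1.3, 0.4
mean_u = beta0 + beta1 * x_i
numeric = marginal_by_quadrature(y_i, s_i, mean_u, sigma)
exact = marginal_closed_form(y_i, s_i, mean_u, sigma)
print(numeric, exact)
```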
I have read the papers, but I’m not a statistician, so I’m not sure which likelihood (if either) is correct in this case.