Hi there! I’m using LOO-CV. I know it can be used to compare different models fit to the same data set, but can it also be used to compare models fit to different data sets?
My Stan model for multiple linear regression is:
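data {
  int<lower=1> K;           // number of predictors
  int<lower=0> N;           // number of training observations
  matrix[N, K] x;           // training data matrix
  vector[N] y;              // training outcomes
  int<lower=0> Ntest;       // number of test observations
  matrix[Ntest, K] xtest;   // test data matrix
  vector[Ntest] ytest;      // test outcomes
}
parameters {
  vector[K] beta;           // regression coefficients
  real<lower=0> sigma;      // residual standard deviation
}
model {
  vector[N] mu;
  mu = x * beta;            // linear predictor
  y ~ normal(mu, sigma);
  beta ~ normal(0., 10.);
  sigma ~ cauchy(0., 10.);  // half-Cauchy, since sigma is constrained to be positive
}
generated quantities {
  // pointwise log-likelihood of the test observations
  vector[Ntest] logLikelihood;
  {
    vector[Ntest] mu;
    mu = xtest * beta;
    for (i in 1:Ntest) {
      logLikelihood[i] = normal_lpdf(ytest[i] | mu[i], sigma);
    }
  }
}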
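For context, here is a minimal sketch of how the logLikelihood draws could be summarised into a pointwise log predictive density for the test points. It assumes the draws have been exported as a list of rows, one row per posterior draw; that layout, the function names, and the toy numbers are all hypothetical. As far as I understand, this is a held-out (test set) quantity rather than PSIS-LOO itself, which would need the pointwise log-likelihood of the training observations.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def heldout_lpd(log_lik_draws):
    """Pointwise log predictive density for the held-out points.

    log_lik_draws: list of posterior draws, each a list of Ntest values
    (an assumed layout for the exported logLikelihood draws).
    """
    n_test = len(log_lik_draws[0])
    return [
        log_mean_exp([draw[i] for draw in log_lik_draws])
        for i in range(n_test)
    ]


# Toy draws, made up purely for illustration: 3 draws x 2 test points.
draws = [[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]]
pointwise = heldout_lpd(draws)
total = sum(pointwise)  # summed log predictive density over the test set
```

Averaging on the probability scale (log-mean-exp over draws) rather than averaging the log values directly is what makes this a predictive density instead of a mean log-likelihood.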
I have several variables X_1, X_2, X_3, X_4, … and I want to predict the value of Z using pairs of variables X_i, X_j. I get better predictions of Z from calibrations of the form X_i = f(X_j, Z) = a + b X_j + c X_j^2 + d Z + e X_j Z, which I then solve for Z, than from direct calibrations of the form Z = g(X_i, X_j).
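To make the "solve for Z" step concrete: rearranging the calibration gives Z = (X_i - a - b X_j - c X_j^2) / (d + e X_j), which is only defined when d + e X_j is nonzero. A minimal sketch follows, with hypothetical function names and made-up coefficient values used only for the round-trip check.

```python
def calibration(x_j, z, a, b, c, d, e):
    """Forward calibration: X_i = a + b*X_j + c*X_j^2 + d*Z + e*X_j*Z."""
    return a + b * x_j + c * x_j ** 2 + d * z + e * x_j * z


def solve_for_z(x_i, x_j, a, b, c, d, e):
    """Invert the calibration for Z, given X_i and X_j."""
    denom = d + e * x_j
    if denom == 0:
        raise ValueError("d + e*X_j is zero, so Z cannot be recovered")
    return (x_i - a - b * x_j - c * x_j ** 2) / denom


# Made-up coefficients (a, b, c, d, e), for illustration only.
coeffs = (1.0, 0.5, -0.2, 2.0, 0.3)
x_i = calibration(1.5, 4.0, *coeffs)
z_recovered = solve_for_z(x_i, 1.5, *coeffs)  # should give back Z = 4.0
```

In a Bayesian fit, this inversion would be applied per posterior draw of (a, b, c, d, e), so the predicted Z inherits the uncertainty in the coefficients.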
The problem is that, since these calibrations use different dependent variables (a different X_i in each), it makes no sense to compare them using LOO-CV. But would it be OK to compare different calibrations of the form Z = g(X_i, X_j), where each calibration uses different independent variables but all of them share the same dependent variable Z? For example, comparing the models Z = g(X_1, X_2), Z = g(X_3, X_4), Z = g(X_1, X_3), etc.
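The comparison I have in mind would be paired on the same Z observations, roughly as sketched below. It assumes each model yields one pointwise log predictive density value per Z observation, in the same order; the function name and the toy numbers are hypothetical. The standard error is the usual paired estimate, sqrt(n) times the sample standard deviation of the pointwise differences.

```python
import math
import statistics


def compare_elpd(pointwise_a, pointwise_b):
    """Paired comparison of two models evaluated on the same observations.

    Returns (elpd_diff, se_diff), where elpd_diff = sum(a_i - b_i) and
    se_diff = sqrt(n) * sd(a_i - b_i).
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Toy pointwise values for Z = g(X_1, X_2) and Z = g(X_3, X_4), made up.
model_12 = [-1.0, -1.5, -0.8, -1.2]
model_34 = [-1.3, -1.4, -1.1, -1.6]
elpd_diff, se_diff = compare_elpd(model_12, model_34)
```

The pairing only makes sense because both vectors refer to the same Z values, which is exactly what is missing when the dependent variable changes from one calibration to the next.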
Looking forward to your advice.
Regards,
Christian.