I played around with something similar to this a few years back. Here is the Stan code I used (it is about four years old, so some syntax may need updating, and it could probably be improved):
data {
  int<lower=0> n;
  vector[n] y;                    // outcome
  vector[n] x;                    // observed predictor (x2)
  real<lower=-1, upper=1> rho;    // correlation between x and the unobserved predictor
  real g;                         // slope on the unobserved predictor
}
transformed data {
  matrix[2, 2] sigx;              // covariance of (x, u); u is standardized
  vector[2] mux;                  // mean of (x, u)
  sigx[1, 1] = variance(x);
  sigx[1, 2] = sd(x) * rho;
  sigx[2, 1] = sigx[1, 2];
  sigx[2, 2] = 1.0;
  mux[1] = mean(x);
  mux[2] = 0.0;
}
parameters {
  real b0;                        // intercept
  real b1;                        // slope on the observed predictor
  real<lower=0> sig;              // residual standard deviation
  vector[n] u;                    // unobserved predictor (x1), standardized
}
transformed parameters {
  array[n] vector[2] xx;          // pairs (x[i], u[i]) for the multivariate normal
  for (i in 1:n) {
    xx[i, 1] = x[i];
    xx[i, 2] = u[i];
  }
}
model {
  array[n] real eta;
  for (i in 1:n) {
    eta[i] = b0 + b1 * xx[i, 1] + g * xx[i, 2];
  }
  xx ~ multi_normal(mux, sigx);   // ties u to x through rho
  y ~ normal(eta, sig);
}
x is the observed predictor (x2 in your example, x1 in my thoughts; I don't have an x3, but you could add one).
rho is the correlation between x1 and x2.
g is the slope on the unobserved variable.
I believe that to get anything meaningful you need to specify rho and g exactly. If you put a prior on either (or both), or worse, use the default flat prior, then the posterior will have a flat or nearly flat region in a subspace around the mode (an identifiability problem), which will lead to not very useful results.
The variable u is just the unobserved x variable (x1 in your case), which I force to have mean 0 and standard deviation 1 (you could change this, but since u is unobserved, all transformations of u, including the standardization, are also unobserved).
The transformed data section just creates the variance matrix for the relationship between the x variables and the mean vector of the x variables.
The transformed parameters section creates xx, an array of vectors (which could probably be improved) holding the x variables: the first "column" is the observed predictor (x2) and the second is the unobserved predictor (x1, i.e. u, a parameter in Stan syntax).
The model block then places a multi_normal prior/likelihood on xx (to bring in the information about the correlation between x1 and x2) and a normal likelihood on y based on the regression model.
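One way to see what the multi_normal on xx buys you: conditional on the observed x, it implies u | x ~ Normal(rho*(x - mean(x))/sd(x), 1 - rho^2), so the observed predictor informs where the unobserved one is likely to sit. A quick numeric check of that conditional, with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(7)

rho, sd_x, mu_x, n = 0.6, 2.0, 3.0, 200_000

# Same joint covariance the Stan transformed data block builds.
cov = np.array([[sd_x**2, sd_x * rho],
                [sd_x * rho, 1.0]])
x, u = rng.multivariate_normal([mu_x, 0.0], cov, size=n).T

# Theory: u | x ~ Normal(rho * (x - mu_x) / sd_x, 1 - rho^2).
# Check via least squares: the slope of u on (x - mu_x) should be rho/sd_x,
# and the residual variance should be 1 - rho^2.
slope = np.cov(x, u)[0, 1] / np.var(x)
resid_var = np.var(u - slope * (x - mu_x))

print(round(slope, 2), round(resid_var, 2))   # ~ rho/sd_x = 0.30 and 1 - rho^2 = 0.64
```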
The code should really include priors on b0, b1, and sig, and the likelihood for y could probably be improved with matrix multiplication/vectorization.
The simulations that I tried looked promising and I have been meaning to come back and explore this idea some more (my thought is to add something along these lines to the obsSens R package). Please let me know if this works for you, and/or what improvements you make.
Hope this helps get you started.