Consider the following errors-in-variables regression, where we are interested in the linear effect of an integer-valued predictor with known misclassification rates:
Y ~ Normal (a + b*X, sigma)
W | X = x_j ~ Discrete with support 1:k and probabilities P_j \in [0,1]^k s.t. \sum_{l=1}^k P_{jl} = 1
Y and W are both observed, but X is not. However, the unconditional (discrete) distribution of X is known, as are the k probability vectors P_1,…,P_k giving the distribution of W conditional on each possible value of X.
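For reference, the likelihood of one observed pair (y_i, w_i) implied by this setup, with the unobserved X summed out, can be sketched as follows. Here \pi_j = Pr(X = x_j) is a symbol introduced only for this illustration (the known distribution of X), and P_{j w_i} denotes Pr(W = w_i | X = x_j):

```latex
p(y_i, w_i \mid a, b, \sigma)
  = \sum_{j=1}^{k} \pi_j \, P_{j w_i} \,
    \mathrm{Normal}(y_i \mid a + b\,x_j, \sigma)
```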
I’m struggling to implement this model because (1) declaring X as an integer-valued parameter is illegal in Stan, and (2) I can’t see how to estimate the slope b if I marginalize over the possible values of X, as discussed in chapter 13 of the Stanual.
That is, the model I’d like to be able to fit is:
data {
  int<lower=0> J;
  real y[J];
  int w[J];
}
parameters {
  int x[J];               // uncontaminated predictor (integer parameter: not legal in Stan)
  real mu_y;              // mean of Y
  real<lower=0,upper=1> maf;
  real<lower=0> sigma_y;  // residual standard deviation
  real alpha;             // intercept
  real beta;              // slope
}
model {
  for (i in 1:J) {
    w[i] ~ imputationError(x[i], .2, .3, .5, .2, .4, .4, .1, .8, .9);
    x[i] ~ binomial(2, maf);
    y[i] ~ normal(alpha + beta*x[i], sigma_y);
  }
  alpha ~ cauchy(0,10);
  beta ~ cauchy(0,3);
}
where imputationError is a user-defined integer-valued probability mass function.
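A minimal sketch of what the marginalized alternative might look like, offered as an illustration under stated assumptions rather than a tested solution. It assumes x takes the values 0, 1, 2, that w is coded 1..3 with category j corresponding to x = j - 1, and that the known misclassification probabilities are passed in as a hypothetical data input P (not part of the model above) in place of imputationError. It uses the same array syntax as the model above; newer Stan releases write `int w[J]` as `array[J] int w`.

```stan
data {
  int<lower=0> J;
  real y[J];
  int<lower=1,upper=3> w[J];   // observed category, assumed coded 1..3
  simplex[3] P[3];             // hypothetical input: P[j] = distribution of W given X = j - 1
}
parameters {
  real<lower=0,upper=1> maf;
  real<lower=0> sigma_y;
  real alpha;
  real beta;
}
model {
  alpha ~ cauchy(0, 10);
  beta ~ cauchy(0, 3);
  for (i in 1:J) {
    vector[3] lp;              // log joint of (x = j - 1, w[i], y[i]) for each candidate x
    for (j in 1:3) {
      lp[j] = binomial_lpmf(j - 1 | 2, maf)
              + log(P[j, w[i]])
              + normal_lpdf(y[i] | alpha + beta * (j - 1), sigma_y);
    }
    target += log_sum_exp(lp); // sum over the unobserved x[i] on the log scale
  }
}
```

In this form beta still appears inside every term of the sum, so the marginal likelihood continues to depend on the slope even though x[i] is never sampled.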
Am I missing something? Would love to know if this is within Stan’s capabilities!