# Latent Categorical Predictor Variable Model

Hello,

I am working on a measurement error model with a latent categorical predictor. It is very similar to a continuous measurement error model, but instead of a known standard error for each observation, I have a known simplex giving the probability that each observation belongs to each category of the latent variable. For illustration, my data look like:

```r
data.frame(y = c(1, 1, 0, 1, 0),
           pr_Xa = c(0.3, 0.2, 0.4, 0.7, 0.3),
           pr_Xb = c(0.1, 0.1, 0.5, 0.1, 0.4),
           pr_Xc = c(0.6, 0.7, 0.1, 0.2, 0.3),
           X_obs = c("c", "c", "b", "a", "b"))
```

A simple model that ignores measurement error would be `y ~ X_obs`, where `X_obs` is the “observed” category of the latent variable, obtained by taking the category with the highest probability in each simplex.
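To make the construction of `X_obs` concrete, here is a small plain-Python sketch (outside my Stan workflow; the names `pr`, `categories` and `most_probable_category` are only illustrative) that recovers `X_obs` from the example simplexes by picking the most probable category in each row:

```python
# Example simplexes from the data frame above: one row per observation,
# columns are Pr(X = a), Pr(X = b), Pr(X = c).
categories = ["a", "b", "c"]
pr = [
    [0.3, 0.1, 0.6],
    [0.2, 0.1, 0.7],
    [0.4, 0.5, 0.1],
    [0.7, 0.1, 0.2],
    [0.3, 0.4, 0.3],
]

def most_probable_category(row, labels):
    """Return the label with the largest probability (the first one wins on ties)."""
    best = max(range(len(row)), key=lambda k: row[k])
    return labels[best]

# Sanity check: every row should be a simplex (sums to 1 up to rounding).
assert all(abs(sum(row) - 1.0) < 1e-9 for row in pr)

x_obs = [most_probable_category(row, categories) for row in pr]
print(x_obs)  # ['c', 'c', 'b', 'a', 'b']
```

The fifth row shows how much information this throws away: `b` is chosen with probability only 0.4, while `a` and `c` each still have 0.3.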

But how can I include each observation's measurement error simplex in a logistic regression model in Stan? I have tried adapting the code from this example, but if I understand it correctly, that example assumes the same misclassification probabilities for every observation.

Here is an example of the kind of model I'm trying to write, except that it uses a continuous predictor with a known standard error for each observation:

```stan
data {
  int<lower=0> N;
  array[N] real x_meas;
  array[N] real x_se;
  vector[N] y;
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  array[N] real x;    // unknown true value
}

model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  sigma ~ cauchy(0, 5);
  for (i in 1:N) {
    x[i] ~ normal(0, 2);
    x_meas[i] ~ normal(x[i], x_se[i]);
    y[i] ~ normal(alpha + beta * x[i], sigma);
  }
}
```
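To spell out the generative process this continuous model encodes, here is a minimal plain-Python simulation sketch. The function name `simulate_continuous_me` and the parameter values passed to it are hypothetical; only `x_meas` and `y` would be observed, while `x` plays the role of the latent true value.

```python
import random

def simulate_continuous_me(n, alpha, beta, sigma, x_se, seed=1):
    """Draw one synthetic data set from the continuous measurement-error model."""
    rng = random.Random(seed)
    # Latent true values: x[i] ~ normal(0, 2)
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    # Noisy measurements with a known, per-observation standard error
    x_meas = [rng.gauss(x[i], x_se[i]) for i in range(n)]
    # The outcome depends on the true value, not on the measurement
    y = [rng.gauss(alpha + beta * x[i], sigma) for i in range(n)]
    return x, x_meas, y

x, x_meas, y = simulate_continuous_me(5, alpha=0.5, beta=1.0, sigma=0.3,
                                      x_se=[0.1, 0.2, 0.1, 0.5, 0.3])
```

Fitting the Stan program amounts to running this process backwards: given only `x_meas`, `x_se` and `y`, infer `alpha`, `beta`, `sigma` and every `x[i]`.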

My idealized model, with a categorical predictor whose measurement error is given by a simplex for each observation, looks like this:

```stan
data {
  int<lower=0> N;
  int<lower=1> K;
  array[N] int<lower=1, upper=K> x_meas;  // observed categorical predictor
  array[N] simplex[K] p;                  // simplex for each observation
  array[N] int<lower=0, upper=1> y;       // binary outcome for logistic regression
}

parameters {
  real alpha;
  real beta;
  array[N] int x;    // unknown true value (not valid Stan: parameters cannot be integers)
}

model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  for (i in 1:N) {
    ...
  }
}
```
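The categorical analogue can be simulated the same way, under the assumption (mine, not something the data frame states) that `p[i]` is the known probability of each true category for observation `i`, and with one coefficient per category instead of the single `real beta` above. Categories are 0-based here, unlike Stan's 1-based indexing, and `simulate_categorical_me` is a hypothetical name. Under this reading `x_meas` is just the most probable category of `p[i]` and carries no extra information.

```python
import math
import random

def inv_logit(eta):
    """Logistic function 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_categorical_me(p, alpha, beta, seed=1):
    """p[i] is the known simplex for observation i; beta[k] is the effect of category k."""
    rng = random.Random(seed)
    n_categories = len(beta)
    x, y = [], []
    for p_i in p:
        # Latent true category: x[i] ~ categorical(p[i])
        k = rng.choices(range(n_categories), weights=p_i)[0]
        x.append(k)
        # Binary outcome: y[i] ~ bernoulli_logit(alpha + beta[x[i]])
        y.append(1 if rng.random() < inv_logit(alpha + beta[k]) else 0)
    return x, y

p_example = [[0.3, 0.1, 0.6], [0.2, 0.1, 0.7], [0.4, 0.5, 0.1],
             [0.7, 0.1, 0.2], [0.3, 0.4, 0.3]]
x_true, y_sim = simulate_categorical_me(p_example, alpha=0.0, beta=[-1.0, 0.0, 1.0])
```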

I am a little lost about what to put in the `model` block to mimic the measurement error process in the continuous example above. I think it would involve a categorical distribution and a Dirichlet prior? I am also struggling with the fact that the unknown true value of the predictor is an integer and therefore can't be declared in the `parameters` block.
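Because Stan cannot sample integer parameters, the usual route for a discrete latent variable is to sum it out of the likelihood. Assuming that `p[i, k]` is the known probability that observation $i$ truly belongs to category $k$, and that each category has its own coefficient $\beta_k$, the marginal likelihood of one observation would be

$$p(y_i \mid \alpha, \beta) = \sum_{k=1}^{K} p_{ik}\,\operatorname{Bernoulli}\!\big(y_i \mid \operatorname{logit}^{-1}(\alpha + \beta_k)\big).$$

The plain-Python sketch below evaluates this on the log scale with a log-sum-exp. The helper names are hypothetical; `bernoulli_logit_lpmf` only mirrors the name of the Stan function.

```python
import math

def log_inv_logit(eta):
    """Numerically stable log(1 / (1 + exp(-eta)))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))

def bernoulli_logit_lpmf(y, eta):
    """log Pr(y | eta) for y in {0, 1} with success probability inv_logit(eta)."""
    return log_inv_logit(eta) if y == 1 else log_inv_logit(-eta)

def log_sum_exp(values):
    """log(sum(exp(v))) computed without overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_log_lik(y, p, alpha, beta):
    """Sum over i of log sum_k p[i][k] * Pr(y[i] | alpha + beta[k])."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        # Zero-probability categories contribute nothing and would make log() fail.
        terms = [math.log(p_ik) + bernoulli_logit_lpmf(y_i, alpha + b_k)
                 for p_ik, b_k in zip(p_i, beta) if p_ik > 0.0]
        total += log_sum_exp(terms)
    return total
```

In Stan, the loop body would become something like `lp[k] = log(p[i, k]) + bernoulli_logit_lpmf(y[i] | alpha + beta[k]);` followed by `target += log_sum_exp(lp);`, both of which are built-in functions. If the simplexes are treated as known data, this formulation needs no Dirichlet prior, and draws of the true category could be recovered by recomputing `lp` in `generated quantities` and calling `categorical_rng(softmax(lp))`.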