Hello,

I am still very new to the world of Stan and Bayesian modeling, so please excuse in advance my ignorance!

I am attempting to build a measurement error model in Stan where the main predictor variable is categorical, but where there is uncertainty whether the observed category matches the true category. Attached to each observation is a vector of probabilities for all potential categories. The observed categories are generated from the highest of these probabilities per observation. My goal is to incorporate this uncertainty when predicting some binary outcome Y, which is a function of the observation’s true category. The simulated data below shows the structure of what I am working with:

```
library(MCMCpack)
N <- 1000
B_a <- 0.8
B_b <- 0.5
B_c <- 0.2
cat_true <- rep(NA, N)
probs <- data.frame(prob_a = rep(NA, N),
prob_b = rep(NA, N),
prob_c = rep(NA, N))
Y <- rep(NA, N)
for (i in 1:N) {
cat_true[i] <- sample(c("a", "b", "c"), 1)
if (cat_true[i] == "a") {
probs[i, ] <- rdirichlet(1, c(3, 1, 1))
Y[i] <- rbinom(1, 1, B_a)
} else if (cat_true[i] == "b") {
probs[i, ] <- rdirichlet(1, c(1, 3, 1))
Y[i] <- rbinom(1, 1, B_b)
} else if (cat_true[i] == "c") {
probs[i, ] <- rdirichlet(1, c(1, 1, 3))
Y[i] <- rbinom(1, 1, B_c)
}
}
cat_obs <- colnames(probs)[max.col(probs)]
cat_obs <- gsub("prob_", "", cat_obs)
```

My main question is how to write a logit model in Stan with this data in the absence of `cat_true`

. As I understand it, it is tricky working with discrete values in Stan? I have been attempting to adapt Richard McElreath’s example in 15.3 with weighted averages, but without luck so far. Any guidance would be greatly appreciated!