# Missing data in binary outcome model

I have a model that predicts a binary decision given current costs and biases built up over previous trials. However, some of my outcome data is missing. Currently I pass these data points as they are useful for updating my biases, but ignore them in the model block. I’m wondering what the correct way to handle these data points are in the model block, and also when generating my log liklihood. See my model below, thanks.

``````data {
int<lower=0> N_trial;
int<lower=0> trial_num[N_trial];
int<lower=0, upper=2> resp[N_trial];
int<lower=0, upper=1> error[N_trial];
vector[2] costs[N_trial];
}

parameters {
real<lower=0.0001> temp;
real<lower=0> max_bias;
real<lower=0, upper=1> bias_rate;
}

transformed parameters {
simplex[2] p[N_trial];
vector[2] bias;
vector[2] offset_inv_costs;

for (i in 1:N_trial) {
int t = trial_num[i];

if (t == 1) {
bias[1] = 0;
bias[2] = 0;
} else {
if(error[i-1] == 1) {
bias[1] = bias[1] * (1 - bias_rate);
bias[2] = bias[2] * (1 - bias_rate);
} else if (resp[i-1] == 1) {
bias[1] = bias[1] * (1 - bias_rate);
bias[2] = bias[2] + (max_bias - bias[2]) * bias_rate;
} else {
bias[1] = bias[1] + (max_bias - bias[1]) * bias_rate;
bias[2] = bias[2] * (1 - bias_rate);
}
}

offset_inv_costs[1] = 1 / (costs[i, 1] * (1 - bias[1]));
offset_inv_costs[2] = 1 / (costs[i, 2] * (1 - bias[2]));

p[i] = softmax(offset_inv_costs / temp);
}
}

model {
temp ~ normal(0.03, 0.01);
max_bias ~ normal(0.05, 0.015);
bias_rate ~ normal(0.5, 0.1);

for (i in 1:N_trial) {
if (resp[i] != 2) {
resp[i] ~ bernoulli(p[i, 2]);
}
}
}

generated quantities {
vector[N_trial] log_lik;
// log likelihood
for (i in 1:N_trial){
if (resp[i] != 2) {
log_lik[i] = bernoulli_lpmf(resp[i] | p[i, 2]);
}
}
}
``````

Do you thing missingness is associated with the value of bias at any point?

If one response is missing, does that mean the next response is missing too?

The way to figure this out is write down a probabilistic model that includes a probabilistic mechanism for the missing data (coinflips, or whatever).

Once you have that, then you can crank the math gears and decide if the inference is possible (if you can safely leave the data out or if you need to do something special).