Missing data in binary outcome model

mwarb · July 4, 2019, 12:11pm

I have a model that predicts a binary decision given current costs and biases built up over previous trials. However, some of my outcome data are missing. Currently I still pass these data points in, as they are useful for updating my biases, but I ignore them in the model block. I’m wondering what the correct way to handle these data points is in the model block, and also when generating my log likelihood. See my model below, thanks.

data {
  int<lower=0> N_trial;
  int<lower=0> trial_num[N_trial];
  int<lower=0, upper=2> resp[N_trial];
  int<lower=0, upper=1> error[N_trial];
  vector[2] costs[N_trial];
}

parameters {
  real<lower=0.0001> temp;
  real<lower=0> max_bias;
  real<lower=0, upper=1> bias_rate;
}

transformed parameters {
  simplex[2] p[N_trial];
  vector[2] bias;
  vector[2] offset_inv_costs;
  
  for (i in 1:N_trial) {
    int t = trial_num[i];
    
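    if (t == 1) {
      // first trial of a sequence: reset both biases
      bias[1] = 0;
      bias[2] = 0;
    } else {
      if(error[i-1] == 1) {
        // previous trial was an error: both biases decay
        bias[1] = bias[1] * (1 - bias_rate);
        bias[2] = bias[2] * (1 - bias_rate);
      } else if (resp[i-1] == 1) {
        // previous response was 1: bias 2 grows toward max_bias, bias 1 decays
        bias[1] = bias[1] * (1 - bias_rate);
        bias[2] = bias[2] + (max_bias - bias[2]) * bias_rate;
      } else {
        // reached when resp[i-1] is 0, and also when it is 2 (missing)
        bias[1] = bias[1] + (max_bias - bias[1]) * bias_rate;
        bias[2] = bias[2] * (1 - bias_rate);
      }
    }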
    
    offset_inv_costs[1] = 1 / (costs[i, 1] * (1 - bias[1]));
    offset_inv_costs[2] = 1 / (costs[i, 2] * (1 - bias[2]));
    
    p[i] = softmax(offset_inv_costs / temp);
  }
}

model {
  temp ~ normal(0.03, 0.01);
  max_bias ~ normal(0.05, 0.015);
  bias_rate ~ normal(0.5, 0.1);
  
  for (i in 1:N_trial) {
    // resp == 2 marks a missing response; it adds no likelihood term
    if (resp[i] != 2) {
      resp[i] ~ bernoulli(p[i, 2]);
    }
  }
}

generated quantities {
  vector[N_trial] log_lik;
  // pointwise log likelihood; entries for missing responses (resp == 2) are never assigned
  for (i in 1:N_trial){
    if (resp[i] != 2) {
      log_lik[i] = bernoulli_lpmf(resp[i] | p[i, 2]);
    }
  }
}

bbbales2 · July 6, 2019, 4:19pm

Do you think missingness is associated with the value of bias at any point?

If one response is missing, does that mean the next response is missing too?

The way to figure this out is to write down a probabilistic model that includes a probabilistic mechanism for the missing data (coin flips, or whatever).
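
As a rough sketch of what that could look like here (the notation is my own assumption, not something from the model above): let y_i in {0,1} be the response on trial i, let m_i = 1 mark a missing response (coded resp = 2), let theta = (temp, max_bias, bias_rate), let pi_i stand for p[i, 2], and let phi be hypothetical parameters of the missingness mechanism. Costs and errors are treated as fixed data.

```latex
% Assumed notation: y_i response, m_i missingness indicator,
% \theta model parameters, \phi hypothetical missingness parameters,
% \pi_i = p[i, 2], which depends on earlier responses through the bias.
\begin{align*}
p(y, m \mid \theta, \phi)
  &= \prod_{i=1}^{N} \mathrm{Bernoulli}\bigl(y_i \mid \pi_i(\theta, y_{<i})\bigr)\,
     p(m_i \mid y, m_{<i}, \phi) \\
% If missingness does not depend on the unobserved responses:
p(y_{\mathrm{obs}}, m \mid \theta, \phi)
  &= p(m \mid y_{\mathrm{obs}}, \phi)
     \sum_{y_{\mathrm{mis}}} \prod_{i=1}^{N}
     \mathrm{Bernoulli}\bigl(y_i \mid \pi_i(\theta, y_{<i})\bigr) \\
% A missing trial j whose value feeds into no later \pi_i sums out:
\sum_{y_j \in \{0,1\}} \mathrm{Bernoulli}(y_j \mid \pi_j)
  &= \pi_j + (1 - \pi_j) = 1
\end{align*}
```

The last line is why skipping a missing trial can be harmless: its term contributes log 1 = 0. That only holds when the missing value does not change any later pi_i. In the model as posted, the bias update uses resp[i-1] unless error[i-1] == 1, so a missing response with error == 0 is currently treated like resp == 0 when updating the bias, and the sum over its two possible values would not collapse to 1 in general.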

Once you have that, you can crank the math gears and decide whether the inference is possible (whether you can safely leave the data out or need to do something special).
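
If it does turn out that the missing trials can be left out, one possible way to keep log_lik free of unassigned entries is to size it by the number of observed responses. This is only a sketch under that assumption, and under the assumption that resp == 2 always codes a missing response. N_obs and k are names introduced here; the two blocks would sit alongside the data, parameters, transformed parameters and model blocks of the posted program.

```stan
transformed data {
  // N_obs (hypothetical name): number of trials with an observed response
  int<lower=0> N_obs = 0;
  for (i in 1:N_trial) {
    if (resp[i] != 2) {
      N_obs += 1;
    }
  }
}

generated quantities {
  // one entry per observed response, in trial order
  vector[N_obs] log_lik;
  {
    int k = 1;  // position in log_lik (hypothetical local counter)
    for (i in 1:N_trial) {
      if (resp[i] != 2) {
        log_lik[k] = bernoulli_lpmf(resp[i] | p[i, 2]);
        k += 1;
      }
    }
  }
}
```

With this layout every element of log_lik corresponds to a response that actually entered the model block, so downstream tools that expect a complete pointwise log likelihood do not see unassigned values.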
