Assigning bernoulli prior to missing entries in covariate

Neodyme · August 14, 2019, 4:50pm

I am fitting a logistic model in STAN (rstan library). My response variable does not have any missing values, however one of my covariates “HB” is binary and has missing entries.

Thus the goal is to impute at each iteration the missing entries in the binary vector, using a Bernoulli prior (with parameter, say, 0.5).

However, I am running into issues:

The missing data needs to be declared either as real or vector in the parameters or transformed parameters block;
Realizations from a Bernoulli distribution in the model block need to be integers;
As far as I am aware there is no function in STAN to convert a real or vector to integer.

I used the guidelines provided in section 3.3 of the STAN user guide. With the model below, the parser gives me an error at the bernoulli assignment line (penultimate line in the model block), saying it needs integers. Note: I also tried defining HB_miss as a real in the parameters block and getting the same error.

m2 <- '
data {                          
int<lower=0> N;                // total number of observations
int<lower=0,upper=1> y[N];     // setting the dependent variable y as binary
vector[N] X;                   // independent variable 1

int<lower=0> N_obs; 
int<lower=0> N_miss; 
int<lower=1, upper=N> ii_obs[N_obs]; 
int<lower=1, upper=N> ii_miss[N_miss]; 

vector[N_obs] HB_obs;         // independent variable 2 (observed) 

}
parameters {
real b_0;                      // intercept
real b_X;                      // beta 1,2, ...
real b_HB;
vector[N_miss] HB_miss;
}
transformed parameters {
vector[N] HB;
HB[ii_obs] = HB_obs;
HB[ii_miss] = HB_miss;
}
model {
b_0 ~ normal(0,100);           
b_X ~ normal(0,100);           
b_HB ~ normal(0,100); 
HB_miss ~ bernoulli(0.5); // This is where the parser gives me an error
y ~ bernoulli_logit(b_0 + b_X * X + b_HB * HB); // model
}

Any ideas how I can assign a bernoulli prior to HB_miss effectively in STAN?

stijn · August 15, 2019, 2:49am

You basically can’t do this. Stan does not work with discrete parameters because it cannot calculate the gradients. You probably can solve this by using a mixture model for the missing observations with two components and a mixture probability .5.

Component 1 for HB_miss = 0

bernoulli_logit(b_0 + b_X * X)

Component 2 for HB_miss = 1

bernoulli_logit(b_0 + b_X * X + b_HB)

Chapter 5 and 7 of the Stan User Guide might give more explanation.

Neodyme · August 15, 2019, 8:51pm

Got it!
Indeed I found out I can use log_mix for the mixture of the 2 bernoulli distributions you suggest above. Also need to marginalize over the parameters using bernoulli_logit_lpmf:

for (i in 1:N) {
if (HB[i] == -1) {
target += log_mix(lambda, bernoulli_logit_lpmf(y[i]| b_0 + b_X * X[i] + b_HB), bernoulli_logit_lpmf(y[i]| b_0 + b_X * X[i]));
} else {
HB[i] ~ bernoulli(lambda);
y[i] ~ bernoulli_logit(b_0 + b_X * X[i] + b_HB * HB[i]);
}

Topic		Replies	Views
Missing data in categorical data models Modeling rstan	7	1299	August 12, 2023
Bernoulli process with observational error Modeling rstan , techniques	4	482	December 8, 2023
Bernoulli_logit with different number of predictors Modeling rstan	3	980	February 5, 2022
Question about logistic regression RStan	5	1858	June 5, 2018
Real return type required error for binomial General rstan	4	424	February 1, 2023

Assigning bernoulli prior to missing entries in covariate

Related topics