Hi all,
In the Stan manual 2.17.0, page 180, Section 11.1 (Missing Data), there is this code:
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  real y_obs[N_obs];
}
parameters {
  real mu;
  real<lower=0> sigma;
  real y_mis[N_mis];
}
model {
  y_obs ~ normal(mu, sigma);
  y_mis ~ normal(mu, sigma);
}
What is the role of y_mis ~ normal(mu, sigma); ? I am thinking that statement is just a prior for y_mis, but that it does not contribute to the likelihood (I mean, to the posterior density).
So what happens conceptually if I remove that line from the Stan code?
Thanks for reading my question!
Trung Dung.
If you remove that line and the declaration of y_mis, then you should get the same posterior for mu and sigma. So this model would be implemented more efficiently with y_mis in the generated quantities block.
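That is, something like this (an untested sketch, using normal_rng to draw the missing values as posterior predictive quantities):

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  real y_obs[N_obs];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y_obs ~ normal(mu, sigma);
}
generated quantities {
  // y_mis is no longer a parameter; each draw is a posterior
  // predictive sample, which gives the same marginal distribution
  // as sampling y_mis ~ normal(mu, sigma) in the model block.
  real y_mis[N_mis];
  for (n in 1:N_mis)
    y_mis[n] = normal_rng(mu, sigma);
}
```

Because y_mis drops out of the parameter space, the sampler has fewer dimensions to explore, which is where the efficiency gain comes from.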
However, some missing data problems do affect the log density, so they can’t be removed. If the log density is the same up to a proportionality constant once the missing data are marginalized out, then it won’t affect the posterior.
Thanks Bob,
So I understand that when the missingness is ignorable (e.g., MAR), we do not need y_mis or its declaration.
When the missingness is non-ignorable, we need to model y_mis, otherwise we will get biased results?
Do I understand you correctly?
Kind regards,
Trung Dung.
If my understanding above is correct, then I think the following code is correct.
In R I create a missing indicator misIn, which is 1 if y[i] is missing and 0 otherwise.
In Stan code I write
for (i in 1:N) {
  if (misIn[i] == 0)
    y[i] ~ N(mu, sigma);
}
What do you think, @Bob_Carpenter — is this code correct, and equivalent to your suggestion that “If you remove that and the declaration of y_mis, then you should get the same posterior for mu and sigma”?
Thank you for your time!
Trung Dung.
I’d think you probably just want to write

y ~ normal(mu, sigma);

It’s not N, and you need the observed versions to help estimate mu and sigma for the missing versions.
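A minimal sketch of what that could look like (my assumption of the setup, not code from the thread: the observed values are filtered in R before the data are passed to Stan, so no indicator variable is needed in the model):

```stan
data {
  int<lower=0> N_obs;
  real y[N_obs];  // observed values only, filtered before passing to Stan
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // vectorized sampling statement; the distribution is normal, not N,
  // and the loop over observed indices becomes unnecessary
  y ~ normal(mu, sigma);
}
```

The vectorized statement is both simpler and faster than the indicator loop, and gives the same posterior for mu and sigma.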