Bob, thank you very much for your reply.

I meant my question as more of a behind-the-scenes question as opposed to a model question.

This may make my question a little more clear. Below are two model codes. The first uses the method shown in the manual for bivariate normal data with missing data. Additionally, I have added a generated quantities block with the missing values imputed using the normal_rng function and the appropriate conditional mean and standard deviation for each missing data point (it’s a little messy, my apologies). The second uses “index fiddling” to create a data matrix with missing values treated as parameters and also returns what I would think would be imputed values.

model1 <- ’

data{

int<lower = 1> N;

vector[2] y[N];

int<lower = 0, upper = 1> y_observed[N, 2];

int<lower = 0, upper = N> TotalMissing;

}

parameters{

vector[2] mu;

cov_matrix[2] Sigma;

}

model{

for (n in 1:N) {

if (y_observed[n, 1] && y_observed[n, 2])

y[n] ~ multi_normal(mu, Sigma);

else if (y_observed[n, 1])

y[n, 1] ~ normal(mu[1], sqrt(Sigma[1, 1]));

else if (y_observed[n, 2])

y[n, 2] ~ normal(mu[2], sqrt(Sigma[2, 2]));

}

}

generated quantities{

real y_mis[TotalMissing];

y_mis[1] = normal_rng(mu[1] + (Sigma[1,1]/Sigma[2,2])*(Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))*(y[1,2] - mu[2]), (1 - (Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))^2)*Sigma[1,1]);*

y_mis[2] = normal_rng(mu[2] + (Sigma[2,2]/Sigma[1,1])(Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))*(y[4,1] - mu[1]), (1 - (Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))^2)*Sigma[2,2]);*

y_mis[3] = normal_rng(mu[2] + (Sigma[2,2]/Sigma[1,1])(Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))*(y[10,1] - mu[1]), (1 - (Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))^2)*Sigma[2,2]);*

y_mis[4] = normal_rng(mu[1] + (Sigma[1,1]/Sigma[2,2])(Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))*(y[22,2] - mu[2]), (1 - (Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))^2)*Sigma[1,1]);*

y_mis[5] = normal_rng(mu[1] + (Sigma[1,1]/Sigma[2,2])(Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))*(y[43,2] - mu[2]), (1 - (Sigma[1,2]/(sqrt(Sigma[1,1])*sqrt(Sigma[2,2])))^2)*Sigma[1,1]);

}

’

model2 <- ’

data{

int<lower = 1> N;

vector[2] y[N];

int<lower = 0, upper = 1> y_observed[N, 2];

int<lower = 0, upper = N> TotalMissing;

int<lower = 0, upper = N> NumMissing[N,2];

}

parameters{

vector[2] mu;

cov_matrix[2] Sigma;

real y_mis[TotalMissing];

}

transformed parameters{

vector[2] y_star[N];

for(n in 1:N){

for(j in 1:2){

if(y_observed[n, j])

y_star[n,j] = y[n,j];

else

y_star[n,j] = y_mis[NumMissing[n,j]];

}

}

}

model{

y_star ~ multi_normal(mu, Sigma);

}

’

N <- 50

y <- rmvnorm(N, mean = rep(200, 2), sigma = diag(rep(100^2, 2)))

y_observed <- matrix(c(0, rep(1, 20), 0, rep(1, 20), 0, rep(1, 10), 0,

rep(1, 5), 0, rep(1, 40)), nrow = N, ncol = 2)

y[which(y_observed == 0)] <- 9999999

TotalMissing <- sum(y_observed == 0)

NumMissing <- matrix(0, nrow = N, ncol = 2)

k <- 1

for(i in 1:N){

for(j in 1:2){

if(y_observed[i,j] == 0){

NumMissing[i, j] <- k

k <- k + 1

}

}

}

Missing <- apply(y_observed, 1, function(x) sum(x) < 2) + 0.0

ContainsMissing <- which(Missing == 1)

fit1 <- stan(model_code = model1,

data = c(“N”,

“y”,

“y_observed”,

“TotalMissing”),

iter = 1e4,

warmup = 1e3,

chains = 1,

seed = 315315)

fit2 <- stan(model_code = model2,

data = c(“N”,

“y”,

“y_observed”,

“TotalMissing”,

“NumMissing”),

iter = 1e4,

warmup = 1e3,

chains = 1,

seed = 315315)

The resulting estimates for the mean vector and covariance matrix are VERY similar. However, the second model provides a much more narrow 95% credible intervals which do all contain the missing values and posterior means which are more accurate. While the second model appears messier than necessary for a bivariate normal, once you get to larger dimensions it is MUCH faster!

This is all to lead up to the question: How is the joint distribution being estimated behind-the-scenes given that missing data is present in the data matrix and the data model? Are the values of the missing data imputed from the posterior predictive distribution of the missing data values given the observed data and parameter values from the previous iteration and then the “complete” data are used to update the parameter estimates? Or the other way around?

I would like to feel confident about the model that was built, but I lack the understanding of what is going on behind the scenes to defend this model with absolute certainty.

Thank you in advance for all of your help and time.