Can't understand an example for handling missing value in rstan

I found this instructive post on github on how to deal with missing data but I am having a bit trouble understanding one detail:Multivariate normal fit with missing values in STAN · GitHub
I pasted the code below so we can see better, not sure if this is the right way to quote a github post, if anything, feel free to delete this post.

My question is regarding y_miss parameter, it seems like it’s making all missing value into a long vector, treat it as unknown parameter and impute values in. But how does Stan know to match the missing location, for instance if th only missing location is row 4 column 3, to the 3rd variable generated from multivariate normal whose mean is -5.

To be more specific, if the missing location is at row 4 column 3, the y_miss vector will be of length 1, and in generated parameter section, we will be assigning y[4,3] ← y_miss where y_miss is a scalar, in reality the missing value is for the thrid variable whose mean is -5, but how does Stan guarentee that the y_miss value will be a random variable with mean -5 but not 10 or 5, in his example?


library(rstan)
library(MASS)

# Simulate some data
mu <- c(10, 5, -5)
Sig <- matrix(c(1, 0.7, 0.8,
                0.7, 2, 0.2,
                0.8, 0.2, 1.5), 
              3, 3)
N <- 100
dat <- mvrnorm(N, mu, Sig)

# Randomly remove some data
nmiss <- 115
miss <- sample.int(prod(dim(dat)), size = nmiss)
dat[miss] <- NA

# Extract the missing values into a VECTOR
dat_complete <- dat[!is.na(dat)]

# Extract the missing and present values as MATRICES
ind_pres <- which(!is.na(dat), arr.ind = TRUE)
ind_miss <- which(is.na(dat), arr.ind = TRUE)

# STAN model
mod.nomiss <- '
data {
    int<lower=0> Nrow;
    int<lower=0> Ncol;
    int<lower=0> Ncomp; // Number of non-missing values
    int<lower=0> Nmiss; // Number of missing values
    real dat_complete[Ncomp];   // Vector of non-missing values
    int ind_pres[Ncomp, 2];     // Matrix (row, col) of non-missing value indices
    int ind_miss[Nmiss, 2];     // Matrix (row, col) of missing value indices
}
parameters {
    // Multivariate normal distribution parameters
    vector[Ncol] mu;
    cov_matrix[Ncol] Sigma;
    // Vector containing "stochastic" nodes (for filling missing values
    real ymiss[Nmiss];      
}
transformed parameters {
    vector[Ncol] y[Nrow];   // The "data" with interpolated missing values
    // Fill y with non-missing values 
    for(n in 1:Ncomp) {
        y[ind_pres[n,1]][ind_pres[n,2]] <- dat_complete[n];
    }
    // Fill the rest of y with missing value "parameters"
    for(n in 1:Nmiss){
        y[ind_miss[n,1]][ind_miss[n,2]] <- ymiss[n];
    }
}
model {
    for(i in 1:Nrow){
        y[i] ~ multi_normal(mu, Sigma);
    }
}
'

mod.data <- list(Nrow = nrow(dat),
                 Ncol = ncol(dat),
                 Ncomp = length(dat_complete),
                 Nmiss = sum(is.na(dat)),
                 dat_complete = dat_complete,
                 ind_pres = ind_pres,
                 ind_miss = ind_miss)

fit <- stan(model_code = mod.nomiss, data = mod.data,
            iter = 1000, chains = 3, verbose=TRUE,
            pars = c("mu", "Sigma"))
****

Kat, I think the key idea is that, in the model block, the missing values are modeled jointly with the observed values. So, in your example, the posterior of y[4,3] is informed by y[4,1] and y[4,2]. It is also informed by mu and Sigma, which are based on all the observed data.

Another way to think about it is by writing the joint distribution p(y1,y2,y3) as p(y1,y2) * p(y3 | y1,y2). So, in your example with y[4,3], you can view the joint distribution of (y1,y2,y3) as modeling the observed data, along with the conditional distribution of missing given observed. Also, from the properties of the multivariate normal, you can work out the analytic distribution of y[4,3] given y[4,1], y[4,2]. You could verify that the posterior of y[4,3] is close to the analytic distribution.