… where the multi_normal is applied for rows with both outcomes present, and the normals applied when only outcome 1 or 2 present respectively.
My question is, if you are using the cholesky decomposition form of the multivariate normal, can this subsetting by done in an analgous fashion ? In other words, if instead of Sigma I have an L_Sigma formed by cholesky decomposition, is it legitimate to do this:
It does not consider the case where both margins are missing, in which case you would need latents
Here is a more general version using latents, and R is a two-dimensional binary array indicating whether a cell of the N \times K data matrix \mathbf{X} is observed or missing:
data {
int<lower=1> N;
int<lower=1> N_obs;
int<lower=1> K;
int<lower=0, upper=1> R[N, K];
vector[N_obs] x_o; // only values where R[i,k] == 1
} // x_o must be in “row-major” order!!!
parameters {
// missing values we want the distribution of
vector[N * K - N_obs] x_m;
// common but not very interesting parameters
vector[K] mu;
cholesky_factor_cov[K, K] L;
}
model {
row_vector[K] x_c[N]; // better than a matrix
int obs_mark = 1; int miss_mark = 1;
for (i in 1:N) for (k in 1:K) { // row-major
if (R[i,k] == 1) {
x_c[i,k] = x_o[obs_mark];
obs_mark += 1;
} // observed case above, missing below
else {
x_c[i,k] = x_m[miss_mark];
miss_mark += miss_mark + 1;
}
} // ignore warning about Jacobian
target += multi_normal_cholesky_lpdf(x_c | mu, L);
// priors on mu and L
}
Thanks Ben. What I’m trying to do is not actually a missing values problem but something where I was using the missing values code as a template to work from. However, I think the same basic argument you made for missing values probably also applies so I will rethink things! Many thanks for the informative answer