Stan beginner here, trying to understand why two models attempting to solve the same problem give different results. Thanks to everyone for their patience and generosity of time!

The problem: Suppose I have two independent, bivariate-normal distributed random variables whose means and covariance matrices I want to estimate. I have observations of just the first (base) variable, and I have observations of the first (base) + second (condition) variable (sum), but I do not have any observations of the second variable alone.

The model that works estimates the covariance of the first (base) variable and the covariance of the sum of the variables separately, then computes the covariance of the unobserved second (cond) variable as a generated quantity. The full code is at the bottom of this post; here is the relevant portion (unsure if I need a Jacobian in the generated quantities):

```
model {
...
for (i in 1:N) {
if (cond[i]) // y is observation of base + cond variables
y[i] ~ multi_normal_cholesky(beta_sum, L_Sigma_sum);
else // y is observation of base variable only
y[i] ~ multi_normal_cholesky(beta_base, L_Sigma_base);
}
}
generated quantities {
// Convert cholesky factors back to correlation and variance-covariance.
matrix[J, J] Omega_base = multiply_lower_tri_self_transpose(L_corr_base);
matrix[J, J] Omega_sum = multiply_lower_tri_self_transpose(L_corr_sum);
matrix[J, J] Sigma_base = quad_form_diag(Omega_base, sigma_base);
matrix[J, J] Sigma_sum = quad_form_diag(Omega_sum, sigma_sum);
// Infer correlation matrix of unobserved, condition-only variable.
matrix[J, J] Sigma_cond = Sigma_sum - Sigma_base;
matrix[J, J] Omega_cond = cov2cor(Sigma_cond);
}
```

I thought, this is Stan and we are Bayesians, so why not try and model the covariance of the 2nd variable directly? This seems to underestimate the covariance/correlation of the 2nd variable, yielding 0.06 instead of 0.20 (the simulated value). I’m happy the first model (above) works, but it bothers me that the model below does not. It makes me think I do not understand something important about Stan, or maybe just am bad at thinking in terms of Cholesky parameterization for covariance.

```
model {
...
for (i in 1:N) {
if (cond[i])
// Is this wrong?
y[i] ~ multi_normal_cholesky(beta + beta_cond, L_Sigma + L_Sigma_cond);
else
y[i] ~ multi_normal_cholesky(beta, L_Sigma);
}
}
```

First model (2.4 KB)

Second model (1.3 KB)

R script to generate data (972 Bytes)