Multiple-outcomes model: posterior predictions fit one outcome well but not the other

I’m working with a multiple-outcomes model using the multi_normal_cholesky distribution. While doing some posterior predictive checks I’ve come across something puzzling. To illustrate, I made a very simple dataset as follows:

library(MASS)

# joint covariance of (y1, y2, x1, x2, x3)
sigma = matrix(c(1.0, 0.5, 0.3, 0.4, 0.2,
                 0.5, 1.0, 0.6, 0.7, 0.3,
                 0.3, 0.6, 1.0, 0.0, 0.0,
                 0.4, 0.7, 0.0, 1.0, 0.0,
                 0.2, 0.3, 0.0, 0.0, 1.0), nrow = 5)
colnames(sigma) <- c("y1", "y2", "x1", "x2", "x3")
rownames(sigma) <- c("y1", "y2", "x1", "x2", "x3")

Data = mvrnorm(n = 800, mu = c(20, 20, 0, 0, 0), Sigma = sigma, empirical = FALSE)

datalist = list(y = Data[, 1:2],
                x = Data[, 3:5],
                N = nrow(Data),
                NvarsY = 2,
                NvarsX = 3)

So I’ve fixed the means of outcomes y1 and y2 both at 20, and the only difference between them should be governed by the sigma matrix. I then fit that data with a linear MVN model in Stan and generated y_ppc predictions for both outcome variables:

data{
    int<lower=0> NvarsX;   // num independent variables
    int<lower=0> NvarsY;   // num dependent variables
    int<lower=0> N;        // Total number of rows
    vector[NvarsX] x [N];  // data for independent vars
    vector[NvarsY] y [N];  // data for dependent vars
}

parameters{
    cholesky_factor_corr[NvarsY] L_Omega; // L form of correlation matrix
    vector<lower=0>[NvarsY] L_sigma;      // Scales (residual SDs) for each Y

    vector[NvarsY] alpha;         // Intercepts for each Y
    matrix[NvarsY, NvarsX] beta ; // Coefficients for each X vs each Y
}

transformed parameters{
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma;

    L_Sigma = diag_pre_multiply(L_sigma, L_Omega);    // L form of Sigma covariance matrix

    for (n in 1:N){
            mu[n] = alpha + beta * x[n];
    }
}

model{
    //Priors
    L_Omega ~ lkj_corr_cholesky(1);
    L_sigma ~ normal(0, 5);
    
    alpha ~ normal(0, 3);
    to_vector(beta) ~ normal(0, 2);

    //likelihood
    y ~ multi_normal_cholesky(mu, L_Sigma);
}

generated quantities{
    matrix[NvarsY, NvarsY] Omega;   // Square form of correlation matrix
    matrix[NvarsY, NvarsY] Sigma;   // Square form of covariance matrix
    
    vector[NvarsY] mu_ppc[N];       // linear predictor for posterior predictive check
    vector[NvarsY] y_ppc[N];        // simulated outcomes for posterior predictive check
    
    // Recover correlation and covariance matrices
    Omega = multiply_lower_tri_self_transpose(L_Omega);
    Sigma = quad_form_diag(Omega, L_sigma);

    // Generate y_ppc    
    for (n in 1:N){
        mu_ppc[n] = alpha + beta * x[n];
    }
    y_ppc = multi_normal_cholesky_rng(mu_ppc, L_Sigma);
}
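
For context, here's roughly how I run the model and build the res_fit data frame used below (a sketch, not the exact script; in particular, I summarise each y_ppc[n] by its posterior mean across draws, and the Stan file name is just illustrative):

library(rstan)

# fit the model above
fit <- stan(file = "mvn_mult_outcomes.stan", data = datalist,
            chains = 4, iter = 2000)

# draws of y_ppc: iterations x N x NvarsY
y_ppc_draws <- rstan::extract(fit, pars = "y_ppc")$y_ppc

# one row per observation: the observed y's plus the posterior mean
# of the matching y_ppc
res_fit <- data.frame(
  y.y1    = datalist$y[, 1],
  y.y2    = datalist$y[, 2],
  y_ppc.1 = colMeans(y_ppc_draws[, , 1]),
  y_ppc.2 = colMeans(y_ppc_draws[, , 2])
)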

Ok, so where things get weird is when I plot y_ppc against the observed y:

[plot: observed y vs y_ppc for y1 and y2]

Hmm. y1 is predicted more poorly than y2. And if I look at the SDs of the true and predicted y's, I see:

> sd(res_fit$y.y1)
[1] 1.0166
> sd(res_fit$y.y2)
[1] 0.9842127
> sd(res_fit$y_ppc.1)
[1] 0.5446957
> sd(res_fit$y_ppc.2)
[1] 0.9545147

Huh. Can anyone help me understand why the Stan model is predicting y_ppc.1 with a much tighter SD than the true data has? Again, assuming my Stan code is correct, the only difference between y1 and y2 should be governed by the sigma matrix at the top.
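
For reference, here's a quick back-of-the-envelope of what the generating sigma implies per outcome (the x's are independent with unit variance, so the implied regression coefficients of each y on x are just that y's covariances with the x's):

# split the generating covariance into y and x blocks
Sigma_yx <- sigma[1:2, 3:5]
Sigma_xx <- sigma[3:5, 3:5]                      # identity here
B <- Sigma_yx %*% solve(Sigma_xx)                # implied regression coefficients

# SD of the conditional mean E[y|x], and the residual SD, per outcome
sqrt(diag(B %*% Sigma_xx %*% t(B)))              # ~0.539 (y1), ~0.970 (y2)
sqrt(diag(sigma[1:2, 1:2] - B %*% t(Sigma_yx)))  # ~0.843 (y1), ~0.245 (y2)

Interestingly, 0.539 is very close to the SD I'm seeing for y_ppc.1.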

Edit: adding this R file, which runs the model, checks parameters, and draws the graphs, in case it is helpful: run_forum_question.R (1.3 KB)

I explored this a bit further by making two different datasets with 3 Y and 3 X variables each, but with the covariance vectors swapped between Y1 and Y3.

So here is sigma for the first dataset:

sigma=matrix(c(1.0, 0.5, 0.5, 0.3, 0.4, 0.2,
               0.5, 1.0, 0.5, 0.6, 0.7, 0.3,
               0.5, 0.5, 1.0, 0.2, 0.1, 0.9,
               0.3, 0.6, 0.2, 1.0, 0.0, 0.0,
               0.4, 0.7, 0.1, 0.0, 1.0, 0.0,
               0.2, 0.3, 0.9, 0.0, 0.0, 1.0), nrow=6)
colnames(sigma)<-c("y1","y2", "y3", "x1","x2", "x3")
rownames(sigma)<-c("y1","y2", "y3", "x1","x2", "x3")

And the resultant plot:

[plot: observed y vs y_ppc for y1, y2, y3]

And the SDs of the true and ppc y's:

> sd(res_fit$y.y1)
[1] 1
> sd(res_fit$y_ppc.1)
[1] 0.5395692
> sd(res_fit$y.y2)
[1] 1
> sd(res_fit$y_ppc.2)
[1] 0.9693972
> sd(res_fit$y.y3)
[1] 1
> sd(res_fit$y_ppc.3)
[1] 0.9275423

Sigma for the second dataset:

sigma=matrix(c(1.0, 0.5, 0.5, 0.2, 0.1, 0.9,
               0.5, 1.0, 0.5, 0.6, 0.7, 0.3,
               0.5, 0.5, 1.0, 0.3, 0.4, 0.2,
               0.2, 0.6, 0.3, 1.0, 0.0, 0.0,
               0.1, 0.7, 0.4, 0.0, 1.0, 0.0,
               0.9, 0.3, 0.2, 0.0, 0.0, 1.0), nrow=6)
colnames(sigma)<-c("y1","y2", "y3", "x1","x2", "x3")
rownames(sigma)<-c("y1","y2", "y3", "x1","x2", "x3")

Second dataset plots:

[plot: observed y vs y_ppc for y1, y2, y3, second dataset]

And SDs:

> sd(res_fit$y.y1)
[1] 1
> sd(res_fit$y_ppc.1)
[1] 0.9266365
> sd(res_fit$y.y2)
[1] 1
> sd(res_fit$y_ppc.2)
[1] 0.9699625
> sd(res_fit$y.y3)
[1] 1
> sd(res_fit$y_ppc.3)
[1] 0.538183

So the only difference behind the two sets of results above is the covariance matrix fed into mvrnorm to generate the toy datasets; otherwise I ran the same Stan code and the same results code.
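
Applying the same back-of-the-envelope calculation to these two sigmas (sigma1 and sigma2 here are just the two matrices above, renamed so both can live in one session):

# implied SD of the conditional mean E[y|x] for each outcome; the x's
# are independent with unit variance, so this is the root sum of
# squares of each y's covariances with the x's
sqrt(rowSums(sigma1[1:3, 4:6]^2))   # dataset 1: ~0.539, ~0.970, ~0.927
sqrt(rowSums(sigma2[1:3, 4:6]^2))   # dataset 2: ~0.927, ~0.970, ~0.539

That tracks the y_ppc SDs above almost exactly, with the small-SD outcome following the swapped covariance vector.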