I’m working with a multiple outcomes model using the multi_normal_cholesky
distribution. I’m doing some posterior predictive checks and I’ve come across something puzzling. To illustrate I made a very simple dataset as follows:
library(MASS)
sigma=matrix(c(1.0, 0.5, 0.3, 0.4, 0.2,
0.5, 1.0, 0.6, 0.7, 0.3,
0.3, 0.6, 1.0, 0.0, 0.0,
0.4, 0.7, 0.0, 1.0, 0.0,
0.2, 0.3, 0.0, 0.0, 1.0), nrow=5)
colnames(sigma)<-c("y1","y2","x1","x2", "x3")
rownames(sigma)<-c("y1","y2","x1","x2", "x3")
Data=mvrnorm(n = 800, mu = c(20, 20,0,0,0),Sigma = sigma,empirical = F)
datalist = list(y=Data[,1:2],
x=Data[,3:5],
N=nrow(x),
NvarsY=2,
NvarsX=3)
So I’ve fixed the means of outcomes y1 & y2 both = 20, and the only thing that should be different is governed by the sigma matrix. Ok. I then fit that data with a linear MVN model in Stan and predicted y_ppc for both outcome variables:
data{
int<lower=0> NvarsX; // num independent variables
int<lower=0> NvarsY; // num dependent variables
int<lower=0> N; // Total number of rows
vector[NvarsX] x [N]; // data for independent vars
vector[NvarsY] y [N]; // data for dependent vars
}
parameters{
cholesky_factor_corr[NvarsY] L_Omega; // L form of correlation matrix
vector<lower=0>[NvarsY] L_sigma; // Vector of coefficient scales for correlation matrix
vector[NvarsY] alpha; // Intercepts for each Y
matrix[NvarsY, NvarsX] beta ; // Coefficients for each X vs each Y
}
transformed parameters{
vector[NvarsY] mu[N];
matrix[NvarsY, NvarsY] L_Sigma;
L_Sigma = diag_pre_multiply(L_sigma, L_Omega); // L form of Sigma covariance matrix
for (n in 1:N){
mu[n] = alpha + beta * x[n];
}
}
model{
//Priors
L_Omega ~ lkj_corr_cholesky(1);
L_sigma ~ normal(0, 5);
alpha ~ normal(0, 3);
to_vector(beta) ~ normal(0, 2);
//likelihood
y ~ multi_normal_cholesky(mu, L_Sigma);
}
generated quantities{
matrix[NvarsY, NvarsY] Omega; // Square form of correlation matrix
matrix[NvarsY, NvarsY] Sigma; // Square form of covariance matrix
vector[NvarsY] mu_ppc[N]; // model predictions for posterior predictive check
vector[NvarsY] y_ppc [N]; // model predictions for posterior predictive check
// Generate correlation matrix
Omega = multiply_lower_tri_self_transpose(L_Omega);
Sigma = quad_form_diag(L_Omega, L_sigma);
// Generate y_ppc
for (n in 1:N){
mu_ppc[n] = alpha + beta * x[n];
}
y_ppc = multi_normal_cholesky_rng(mu_ppc, L_Sigma);
}
Ok so where things get weird is when I plot the y_ppc vs the data y:
Hmm. y1 is predicted more poorly than y2. Also if I look at the sd of true and predicted y’s I see:
> sd(res_fit$y.y1)
[1] 1.0166
> sd(res_fit$y.y2)
[1] 0.9842127
> sd(res_fit$y_ppc.1)
[1] 0.5446957
> sd(res_fit$y_ppc.2)
[1] 0.9545147
Huh. Can anyone help me understand why the Stan model is predicting y_ppc.1 with a much tighter SD than the true data has? Again, assuming my Stan code is correct, the only difference in y1 and y2 should be governed by the design matrix at the top.
Edit: adding this R file to run the model, check parameters and draw the graphs in case it is helpful: run_forum_question.R (1.3 KB)