I wrote a short piece of code to generate two variables from a multivariate normal but the covariances of the generated distributions are not what I expect. Hoping someone can help me to understand where I’m going wrong. Here is my stan code:
data{
int<lower=0> N; // Total number of rows
}
generated quantities{
matrix[2, 2] Sigma; // Square form of covariance matrix
vector[2] zeroes[N];
vector[2] y_ppc [N]; // model predictions for posterior predictive check
// Specify covariance matrix
Sigma[1,1] = 1;
Sigma[2,1] = 0.05;
Sigma[1,2] = 0.05;
Sigma[2,2] = 0.15;
// Generate y_ppc
for (n in 1:N) {
zeroes[n] = rep_vector(0, 2);
}
y_ppc = multi_normal_rng(zeroes, Sigma);
}
Ok I run that through R and extract the generated variables:
N = 1000
datalist <- list(N)
fit1 <- stan(file="Stan models/gen_mvn.stan",
data = datalist, chains = 3,
seed=408, refresh=25, iter = 500 , algorithm = "Fixed_param"
)
params = extract(fit1)
y <- data.frame(t(sapply(1:datalist$N, function(x) colMeans(params$y_ppc[, x, 1:2]) )))
Great. Runs without hitch. However when I look at the covariance of the generated variables:
You want to apply the ‘cov’ function once for each iteration, that gets you one covariance matrix per iteration, then you want to summarise that (e.g.-using mean, or quantiles, or whatever
I don’t see why you write: apply(params$y_ppc1, 1, c), that should be cov not c.
I’m trying to extract the two estimates y1, y2 so this is c() not cov().
My own code for this was:
y <- data.frame(t(sapply(1:datalist$N, function(x) colMeans(params$y_ppc[, x, 1:2]) )))
This is perhaps the crux of the issue. I actually don’t care about the covariance matrix per iteration. I care about the overall covariance for the mean extracted y vectors. This is is what the mvrnorm gives me. The Stan code seems to be doing something different - or i’m extracting things wrong ?
Arrrgghhh. I think I understand the problem. It’s not the way I’m summarising the data. The covariance structure holds per iteration. But averaged over all the iterations it reduces to the mean - i.e. zero, for any given y1 or y2. Ehhh, thats not really what I wanted.