# Generated MVN not what I expect

I wrote a short piece of code to generate two variables from a multivariate normal but the covariances of the generated distributions are not what I expect. Hoping someone can help me to understand where I’m going wrong. Here is my stan code:

``````data{
int<lower=0> N;        // Total number of rows
}

generated quantities{
matrix[2, 2] Sigma;   // Square form of covariance matrix
vector[2] zeroes[N];
vector[2] y_ppc [N];       // model predictions for posterior predictive check

// Specify covariance matrix
Sigma[1,1] = 1;
Sigma[2,1] = 0.05;
Sigma[1,2] = 0.05;
Sigma[2,2] = 0.15;

// Generate y_ppc
for (n in 1:N) {
zeroes[n] = rep_vector(0, 2);
}

y_ppc = multi_normal_rng(zeroes, Sigma);
}

``````

Ok I run that through R and extract the generated variables:

``````N = 1000
datalist <- list(N)

fit1 <- stan(file="Stan models/gen_mvn.stan",
data = datalist, chains = 3,
seed=408, refresh=25, iter = 500 , algorithm = "Fixed_param"
)
params = extract(fit1)
y <- data.frame(t(sapply(1:datalist\$N, function(x) colMeans(params\$y_ppc[, x, 1:2]) )))

``````

Great. Runs without hitch. However when I look at the covariance of the generated variables:

``````> format( cov(y), digits = 2)
X1        X2
X1 "0.00145" "0.00007"
X2 "0.00007" "0.00019"
``````

This is different to covariance matrix I specified in the stan code which was:

``````1.0  0.05
0.05 0.15
``````

Umm… why is this different?

Wrong calculation, try this instead (I don’t recall if the `byrow` is right or not, easy to check…):

``````library(rstan)
library(magrittr)
N = 1000
datalist <- list(N)

fit1 <- rstan::stan(file="mvr.stan",
data = datalist, chains = 3,
seed=408, refresh=25, iter = 500 , algorithm = "Fixed_param"
)
params = rstan::extract(fit1)

est_cov = matrix(
data = apply(params\$y_ppc, 1, cov) %>% apply(1, mean),
ncol=2, byrow=TRUE)
``````
1 Like

Thanks @sakrejda. Now I’d best check I’ve been extracting my other parameters across the correct dimensions…eep

yeah, don’t we all :/ also: transform before you summarize! :)

1 Like

Hey again. Sorry I’ve gotten myself confused now. I adapted your code to try extract the estimates:

``````y <- data.frame( matrix(
data = apply(params\$y_ppc1, 1, c) %>% apply(1, mean),
ncol=2, byrow=TRUE) )
``````

but the distributions are again much tighter thasn i expect:

``````> summary(y)
X1                  X2
Min.   :-0.116121   Min.   :-1.040e-01
1st Qu.:-0.013724   1st Qu.:-1.340e-02
Median : 0.000992   Median :-1.292e-03
Mean   : 0.001208   Mean   :-3.142e-05
3rd Qu.: 0.016030   3rd Qu.: 1.289e-02
Max.   : 0.121384   Max.   : 1.104e-01
``````

Perhaps I am more fundamentally confused in some way. For clarity what I actually want is the equivalent of this:

``````library(MASS)
M <- matrix(c(1, 0.05, 0.05, 0.15), nrow=2, ncol=2)
Data=mvrnorm(n = 800, mu = c(0, 0),Sigma = M,empirical = F)

> cov(Data)
[,1]       [,2]
[1,] 1.00669272 0.01918344
[2,] 0.01918344 0.14648667

> summary(Data)
V1                 V2
Min.   :-4.80409   Min.   :-1.153805
1st Qu.:-0.70604   1st Qu.:-0.285007
Median :-0.06849   Median :-0.008895
Mean   :-0.05267   Mean   :-0.014432
3rd Qu.: 0.65333   3rd Qu.: 0.227168
Max.   : 2.81348   Max.   : 1.202469
>
``````

I’m sure I’m going wrong somehow but I don’t get where ?

Your margins are: 1) iteration, 2) N, 3) 1:2

You want to apply the ‘cov’ function once for each iteration, that gets you one covariance matrix per iteration, then you want to summarise that (e.g.-using mean, or quantiles, or whatever

I don’t see why you write: `apply(params\$y_ppc1, 1, c)`, that should be `cov` not `c`.

I’m trying to extract the two estimates y1, y2 so this is `c()` not `cov()`.
My own code for this was:

``````y <- data.frame(t(sapply(1:datalist\$N, function(x) colMeans(params\$y_ppc[, x, 1:2]) )))
``````

This is perhaps the crux of the issue. I actually don’t care about the covariance matrix per iteration. I care about the overall covariance for the mean extracted y vectors. This is is what the `mvrnorm` gives me. The Stan code seems to be doing something different - or i’m extracting things wrong ?

With ‘c’ you’re losing the matrix structure so no covariance, maybe either cbind or rbind depending on which margin is which

Arrrgghhh. I think I understand the problem. It’s not the way I’m summarising the data. The covariance structure holds per iteration. But averaged over all the iterations it reduces to the mean - i.e. zero, for any given y1 or y2. Ehhh, thats not really what I wanted.

… It should be smaller but not zero.

Should’ve fired the second neuron before posting, you’re completely correct.

1 Like

ha i know the feelin i’ve been puzzled by this for days 🤦‍♂️

1 Like