# Multivariate normal data covariance estimation

Hi all,

I try to fit the covariance matrix for multivariate normal data. My naive code using `multi_normal_cholesky_lpdf` is apparently working but according to what I read here and there, a reparameterization relying on a standard multivariate normal distribution may dramatically speed up the process (e.g. there or there).

However, these examples try to fit covariance on sampled parameters rather than inputted data and I can’t find a way to code so. Maybe my question is just nonsensical is I don’t really need Stan to fit this matrix…

Below are a toy dataset and my current code. Thanks in advance for any hint!

Stan code:

``````data {
int<lower=0> N;
matrix[N, 3] x;
}

parameters {
cholesky_factor_corr[3] chol;
simplex[3] prop;
real<lower = 0> scale;
vector[3] mus;
}

transformed parameters {
vector<lower=0>[3] sigma;
matrix[3, 3] chol_cov;
sigma = sqrt(prop) * sqrt(3) * scale;
chol_cov = diag_pre_multiply(sigma, chol);
}

model {
chol ~ lkj_corr_cholesky(1);
scale ~ cauchy(0, 5);
mus ~ cauchy(0, 2);
for (i in 1:N) {
target += multi_normal_cholesky_lpdf(to_vector(x[i]) | mus, chol_cov);
}
}
``````

R code to generate data:

``````library(mvtnorm)

r12 <- .3; r13 <- .5; r23 <- .7
s1 <- 2; s2 <- 4; s3 <- 6
mus <- c(-1, 2, -3)

cov_mat <- matrix(c(
s1 ^ 2,        r12 * s1 * s2, r13 * s1 * s3,
r12 * s1 * s2, s2 ^ 2,        r23 * s2 * s3,
r13 * s1 * s3, r23 * s2 * s3, s3 ^ 2
), 3)

mv <- rmvnorm(
n = 1e3,
mean = mus,
sigma = cov_mat
)

temp_data <- list(
x = mv,
N = nrow(mv)
)
``````
``````data {
int<lower=0> N;
vector[3] x[N];
}

parameters {
cholesky_factor_corr[3] chol;
vector<lower=0>[3] sigma;
vector[3] mus;
}

transformed parameters {
matrix[3, 3] chol_cov = diag_pre_multiply(sigma, chol);
}

model {
chol ~ lkj_corr_cholesky(1);
sigma ~ cauchy(0, 5);
mus ~ cauchy(0, 2);
x ~ multi_normal_cholesky(mus, chol_cov);
}
``````

The code can be rewritten, see above. The cases you are mentioned are for drawing samples from a multi normal distribution, see here for the math behind: https://stats.stackexchange.com/questions/61719/cholesky-versus-eigendecomposition-for-drawing-samples-from-a-multivariate-norma?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Thanks!
So this confirms this approach is useful for drawing samples and not to fit multi normal data.
Thanks for your modifications with vectorization, I planned to try so when the code with loops was already functional but this saved me some time!

https://discourse.mc-stan.org/t/centered-or-non-centered-parametrization-for-random-effects/728

“… for `y` being data; the built in `y ~ normal(mu, sigma)` will be better”