Multivariate normal data covariance estimation

Hi all,

I try to fit the covariance matrix for multivariate normal data. My naive code using multi_normal_cholesky_lpdf is apparently working but according to what I read here and there, a reparameterization relying on a standard multivariate normal distribution may dramatically speed up the process (e.g. there or there).

However, these examples try to fit covariance on sampled parameters rather than inputted data and I can’t find a way to code so. Maybe my question is just nonsensical is I don’t really need Stan to fit this matrix…

Below are a toy dataset and my current code. Thanks in advance for any hint!

Stan code:

data {
  int<lower=0> N;
  matrix[N, 3] x;
}

parameters {
  cholesky_factor_corr[3] chol;
  simplex[3] prop;
  real<lower = 0> scale;
  vector[3] mus;
}

transformed parameters {
  vector<lower=0>[3] sigma;
  matrix[3, 3] chol_cov;
  sigma = sqrt(prop) * sqrt(3) * scale;
  chol_cov = diag_pre_multiply(sigma, chol);
}

model {
  chol ~ lkj_corr_cholesky(1);
  scale ~ cauchy(0, 5);
  mus ~ cauchy(0, 2);
  for (i in 1:N) {
    target += multi_normal_cholesky_lpdf(to_vector(x[i]) | mus, chol_cov);
  }
}

R code to generate data:

library(mvtnorm)

r12 <- .3; r13 <- .5; r23 <- .7
s1 <- 2; s2 <- 4; s3 <- 6
mus <- c(-1, 2, -3)

cov_mat <- matrix(c(
  s1 ^ 2,        r12 * s1 * s2, r13 * s1 * s3,
  r12 * s1 * s2, s2 ^ 2,        r23 * s2 * s3, 
  r13 * s1 * s3, r23 * s2 * s3, s3 ^ 2
), 3)

mv <- rmvnorm(
  n = 1e3,
  mean = mus,
  sigma = cov_mat
)

temp_data <- list(
  x = mv,
  N = nrow(mv)
)
data {
  int<lower=0> N;
  vector[3] x[N];
}

parameters {
  cholesky_factor_corr[3] chol;
  vector<lower=0>[3] sigma;
  vector[3] mus;
}

transformed parameters {
  matrix[3, 3] chol_cov = diag_pre_multiply(sigma, chol);
}

model {
  chol ~ lkj_corr_cholesky(1);
  sigma ~ cauchy(0, 5);
  mus ~ cauchy(0, 2);
  x ~ multi_normal_cholesky(mus, chol_cov);
}

The code can be rewritten, see above. The cases you are mentioned are for drawing samples from a multi normal distribution, see here for the math behind: https://stats.stackexchange.com/questions/61719/cholesky-versus-eigendecomposition-for-drawing-samples-from-a-multivariate-norma?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Thanks!
So this confirms this approach is useful for drawing samples and not to fit multi normal data.
Thanks for your modifications with vectorization, I planned to try so when the code with loops was already functional but this saved me some time!

https://discourse.mc-stan.org/t/centered-or-non-centered-parametrization-for-random-effects/728

Right. See Bob’s answers in above link.
“… for y being data; the built in y ~ normal(mu, sigma) will be better”

Thanks for closing the topic.