Tip for Fitting Multivariate Normal

I have had a lot of difficulty with fitting some time series data with a multivariate normal distribution in Stan over the years. Sometimes it worked well, but not always. In my recent project, the sample covariance matrix tends to have a very low determinant, which tends to throw all kinds of problems for Stan’s algorithms.

I have found that one of my issues is with the lkj_corr_cholesky prior. My data doesn’t have anything like an identity matrix for a correlation matrix, hence why the determinant is so low.

However, there is a very simple solution. One recommendation from the documentation is to center and scale variables so that they have a mean of zero and standard deviation of 1. One can also multiply the scaled data by some matrix so that the resulting matrix has an identity matrix, which corresponds well to what the lkj_corr_cholesky prior expects. I used the cholesky because that is rather strongly connected to the structural VAR literature (except no lags) and I could do it all within Stan. One could probably also use PCA.

I provide some R code and two different multivariate normal implementations in Stan below. The results when using the cholesky adjustment results in much faster performance from Stan, no divergences, high N_eff, rhats around 1. The original version has divergences, is slow, and poor metrics.

One thing I noticed from the pairs plots for L_Omega parameters (corresponding to the cholesky correlation matrix) below is that some of the parameters seem to trace out a curve instead of looking like a random cloud. I notice that these variables tend to be the ones with the lower N_eff compared to the others. The original approach that I had taken was more indirect. I was using regressions where the first variable was just univariate normal and then the second was regressed on the first variable and the third was regressed against the second and third. This was a big positive for me initially, but I found that it could get a little confusing when I tried to make my model more complicated. Nevertheless, it could be a technique applied again to the cholesky adjusted data if needing to boost N_eff. I’ll have to check if it make sense.

//multivariate_normal.stan
data {
	int<lower=1> T;
	int<lower=1> N;
	matrix[N, T] X;
}
transformed data {
	matrix[N, T] X_scaled;
	matrix[N, N] X_scaled_cov;
	matrix[N, N] X_scaled_cov_chol;
	matrix[N, T] Z;
	vector[N] Z_vec[T];

	for (i in 1:N) {
		X_scaled[i] = (X[i] - mean(X[i])) / sd(X[i]);
	}
	for (i in 1:T) {
		Z_vec[i] = X_scaled[, i];
	}
}
parameters {
	vector[N] mu;
	
	cholesky_factor_corr[N] L_Omega;
	vector<lower=0>[N] L_sigma;
}
model {
	matrix[N, N] L_Sigma;
	
	mu ~ normal(0, 1);

	L_Omega ~ lkj_corr_cholesky(1);
	L_sigma ~ cauchy(0, 2.5);
	L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
	
	Z_vec ~ multi_normal_cholesky(mu, L_Sigma);
}

//multivariate_normal_adj_cholesky.stan
data {
	int<lower=1> T;
	int<lower=1> N;
	matrix[N, T] X;
}
transformed data {
	matrix[N, T] X_scaled;
	matrix[N, N] X_scaled_cov;
	matrix[N, N] X_scaled_cov_chol;
	matrix[N, T] Z;
	vector[N] Z_vec[T];

	for (i in 1:N) {
		X_scaled[i] = (X[i] - mean(X[i])) / sd(X[i]);
	}
	X_scaled_cov = (X_scaled * X_scaled') / (T - 1);
	X_scaled_cov_chol = cholesky_decompose(X_scaled_cov);
	Z = X_scaled_cov_chol \ X_scaled;
	for (i in 1:T) {
		Z_vec[i] = Z[, i];
	}
}
parameters {
	vector[N] mu;
	
	cholesky_factor_corr[N] L_Omega;
	vector<lower=0>[N] L_sigma;
}
model {
	matrix[N, N] L_Sigma;
	
	mu ~ normal(0, 1);

	L_Omega ~ lkj_corr_cholesky(4);
	L_sigma ~ cauchy(0, 2.5);
	L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
	
	Z_vec ~ multi_normal_cholesky(mu, L_Sigma);
}

The following R code provides an example:

library(MASS)
library(rstan)
rstan_options(auto_write = TRUE)

# Basic parameters
N_sim <- 100
N <- 2

# Stan parameters
N_iter <- 500
N_chain <- 4

# Set Working Directory
wd_orig <- ***enter your own***
setwd(wd_orig)

# Simulate Data
C0 <- rep(0.5, N ^ 2)
C0 <- matrix(C0, N, N)
diag(C0) <- 1

X <- mvrnorm(N_sim, rep(0, N), C0)
X <- cbind(X, 0.5 * X[, 1] + 0.5 * X[, 2] + rnorm(N_sim, 0, 0.000001))

# Stan

stan_data <- list(T=N_sim, N=3, X=t(X))

fit_1 <- stan(file = 'multivariate_normal.stan',
			  data = stan_data, verbose = FALSE,
			  iter=N_iter, chain=N_chain)

fit_2 <- stan(file = 'multivariate_normal_adj_cholesky.stan',
			  data = stan_data, verbose = FALSE,
			  iter=N_iter, chain=N_chain)
			  
print(fit_1)
plot(fit_1, pars = c('mu', 'L_sigma'), include = TRUE)
plot(fit_1, pars = c('L_Omega'), include = TRUE)
pairs(fit_1, pars = c('L_Omega'), include = TRUE)

print(fit_2)
plot(fit_2, pars = c('mu', 'L_sigma'), include = TRUE)
plot(fit_2, pars = c('L_Omega'), include = TRUE)
pairs(fit_2, pars = c('L_Omega'), include = TRUE)
5 Likes