Feasibility of a two-layer “stacked” multivariate normal model?

Hi Stan community,

I’m just getting into Stan modeling, and really appreciate all the great documentation.

I’m hoping to conduct a meta-analysis of N data items, each with K regression coefficients \hat{\beta}_n and their standard errors \hat{s}_n. C is the correlation matrix of the errors, which at this point I’m assuming is given.

I currently have a “stacked” multivariate normal model, and was hoping to get some feedback on fitting it and speeding it up.

My data-generating process is the following:

\mu_n \sim \textsf{N}(0, 1)
\tau_k \sim \textsf{Gamma}(2, 0.1)
\Omega \sim \textsf{LKJCorr}(2)
\Sigma = \texttt{diag\_matrix}(\tau) \times \Omega \times \texttt{diag\_matrix}(\tau)
\beta_n \sim \textsf{N}(\mu_n \vec{1}, \Sigma)
\widehat{\beta}_n \sim \textsf{N}(\beta_n, V_n)
V_n = \texttt{diag\_matrix}(\hat{s}_n) \times C \times \texttt{diag\_matrix}(\hat{s}_n)

The primary latent parameter of interest is the base mean for each row, \mu_n.

After reading through the documentation, my current implementation of this model in Stan is the following:
two_mult_normal.stan (1.3 KB)

data {
  int<lower=0> N; // number of data items
  int<lower=0> K; //number of features
  vector[K] beta_hat[N]; // measured effects
  vector<lower=0>[K] s_hat[N]; // standard error of effects
  matrix[K,K] c_beta; // correlation matrix of errors across features
}

transformed data{
  cholesky_factor_cov[K] L_v[N];  // cholesky factorized, covariance of error of effects for each row
  cholesky_factor_corr[K] cholesky_c_beta; 
  matrix<lower=1,upper=1>[N,K] one_mat; // matrix of ones
  
  cholesky_c_beta = cholesky_decompose(c_beta); // cholesky factorized, correlation matrix of error of effects
 
  for(n in 1:N){
    L_v[n] = diag_pre_multiply(s_hat[n], cholesky_c_beta); // convert to row-wise covariance matrix for error of effects
  }

  one_mat = rep_matrix(1,N,K);
}

parameters {
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0>[K] tau; // Beta prior scale
  matrix[K,N] z_beta; // Unscaled beta
  vector[N] mu; // Row-wise baseline effects
}

transformed parameters {
  matrix[N,K] beta; // feature true effects
  beta = diag_pre_multiply(mu,one_mat) + (diag_pre_multiply(tau,L_Omega) * z_beta)';
}

model {
  mu ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ gamma(2,10);

  to_vector(z_beta) ~ std_normal();

  for(n in 1:N)
    beta_hat[n] ~ multi_normal_cholesky(beta[n], L_v[n]);
}

When I tried the current Stan model on a smaller simulated dataset (N=100, K=50, R code shown below), I can recover the \mu_n pretty well. However, the real data set that I ultimately want to fit has dimensions of roughly N=35000, K=400. I’m hitting serious speed issues: even after running it for multiple days, it’s nowhere close to getting past the warm-up stage.

I’ve tried to optimize it as much as I can (e.g. vectorization, Cholesky decompositions). I was wondering if there are any other suggestions people may be able to offer me? Or are two layers of multivariate normals with dimension K > 400 just not really feasible at this point?

As a bit of a side note, I’ve noticed that if I don’t set a rough initial value for \Omega, I get some rejected-initial-value problems:

Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.

However, if I set the initial value to something approximating Cor(\hat{\beta}), it runs fine.

Would really appreciate the help!

Data simulation run.simulate.R (1.3 KB)

library(MASS)       # mvrnorm()
library(rethinking) # rlkjcorr()
library(rstan)      # stan()

set.seed(1)

N=100
K=50

sampled.beta = matrix(NA, nrow = N, ncol = K)
sampled.beta_hat = matrix(NA, nrow = N, ncol = K) ## input data

sampled.mu = rnorm(N, mean=0, sd=1)

sampled.s_beta = rgamma(K, 2, 10) # feature-level scales; gamma draws are already positive, so abs() is unnecessary
sampled.u = diag(sampled.s_beta) %*% rlkjcorr(n=1, K=K, eta=2) %*% diag(sampled.s_beta)

sampled.s_hat = matrix(rgamma(N*K, 2, 10), nrow = N, ncol = K) ## input data; one standard error per item/feature cell
sampled.c_beta = rlkjcorr(n=1, K=K, eta=2) ## input data

for(n in 1:N){
    if(n %% 10 == 0){print(n)}

    sampled.beta[n,] = mvrnorm(n=1, mu=rep(sampled.mu[n], K), Sigma=sampled.u)

    sigma.error_cov = diag(sampled.s_hat[n,]) %*% sampled.c_beta %*% diag(sampled.s_hat[n,])
    sampled.beta_hat[n,] = mvrnorm(n=1, mu=sampled.beta[n,], Sigma=sigma.error_cov)
}

data.simulate = list(N=N, K=K, beta_hat=sampled.beta_hat, beta=sampled.beta, s_hat=sampled.s_hat, c_beta=sampled.c_beta, 
sigma=sampled.u, mu=sampled.mu ## adding these two for evaluation
)

## create some approx. initial values
approx.omega = cor(data.simulate$beta_hat)
init_fn <- function() {
    list(L_Omega=t(chol(approx.omega)))
}
## fit stan model w/ simulated data and initial values for omega
fit.slim = stan(file = 'two_mult_normal.stan', data = data.simulate, init=init_fn, save_warmup = FALSE)

With K = 400 and N = 35000, there’s kind of a crazy number of parameters.

cholesky_factor_corr[K] L_Omega; // K*(K-1)/2 = 79800 free parameters
vector<lower=0>[K] tau; // Beta prior scale
matrix[K,N] z_beta; // Unscaled beta // 14000000 parameters
vector[N] mu; // Row-wise baseline effects
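
Adding those up: 79,800 for L_Omega, 400 for tau, 400 × 35,000 = 14,000,000 for z_beta, and 35,000 for mu, i.e. roughly 14.1 million unconstrained parameters, almost all of them in z_beta.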

I’m kind of surprised N = 100, K = 50 worked. I think you’ll need to figure out a way to simplify this model somehow. I think the largest Stan model I know of was like 10-20k parameters.

Thank you, Ben, for your feedback!
Yes, I guess since I’m not directly interested in the \beta s, I should try to simplify the model and not estimate each one of them. I’ll give it a try.

Oh yeah I guess that’s a normal-normal model. Try to integrate out the \beta s.
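
To make that concrete: since \widehat{\beta}_n \mid \beta_n \sim \textsf{N}(\beta_n, V_n) and \beta_n \mid \mu_n \sim \textsf{N}(\mu_n \vec{1}, \Sigma), the \beta_n can be integrated out analytically, giving \widehat{\beta}_n \mid \mu_n \sim \textsf{N}(\mu_n \vec{1}, \Sigma + V_n). A rough, untested sketch of that marginalized model (reusing the data names from the original post):

data {
  int<lower=0> N; // number of data items
  int<lower=0> K; // number of features
  vector[K] beta_hat[N]; // measured effects
  vector<lower=0>[K] s_hat[N]; // standard error of effects
  matrix[K,K] c_beta; // correlation matrix of errors across features
}

parameters {
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0>[K] tau;
  vector[N] mu; // the only per-row parameters left
}

model {
  // Sigma = diag(tau) * Omega * diag(tau)
  matrix[K,K] Sigma = multiply_lower_tri_self_transpose(diag_pre_multiply(tau, L_Omega));

  mu ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ gamma(2, 10);

  for (n in 1:N)
    // marginal likelihood with beta[n] integrated out;
    // V_n = diag(s_hat[n]) * C * diag(s_hat[n])
    beta_hat[n] ~ multi_normal(rep_vector(mu[n], K),
                               Sigma + quad_form_diag(c_beta, s_hat[n]));
}

Each likelihood term still needs a K \times K factorization of \Sigma + V_n, so this is not cheap for K = 400, but it removes all N \times K latent \beta parameters while leaving the posterior for the \mu_n unchanged.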
