Hi Stan community,
I’m just getting into Stan modeling, and really appreciate all the great documentation.
I’m hoping to conduct a meta-analysis for N data items each with k regression coefficients \hat{\beta_n}, s.e. of beta \hat{s_n}. C is the correlation matrix of error, which at this point I’m assuming is given.
I currently have a “stacked” multivariate normal model, and was hoping if I could get some feedback into the fitting and speeding this thing up.
My data generating process is the following,
\mu_n\sim N(0,1)
\tau_k \sim \textsf{Gamma}(2, 0.1)
\Omega \sim \textsf{LKJCorr}(2)
\Sigma = \texttt{diag}\mathtt{\_}\texttt{matrix}(\tau) \times \Omega \times \texttt{diag}\mathtt{\_}\texttt{matrix}(\tau)
\beta_n\sim N(\vec{1}*\mu_n, \Sigma)
\widehat{\beta_n}\sim N(\beta_n, V_n)
V_n = \texttt{diag}\mathtt{\_}\texttt{matrix}(\hat{s_{n}}) \times C \times \texttt{diag}\mathtt{\_}\texttt{matrix}(\hat{s_n})
The primary latent parameter of interest is the base mean for each row, \mu_n.
After reading through the documentation, my current implementation of this model in stan is the following.
two_mult_normal.stan (1.3 KB)
data {
int<lower=0> N; // number of data items
int<lower=0> K; //number of features
vector[K] beta_hat[N]; // measured effects
vector<lower=0>[K] s_hat[N]; // standard error of effects
matrix[K,K] c_beta; // correlation matrix of errors across features
}
transformed data{
cholesky_factor_cov[K] L_v[N]; // cholesky factorized, covariance of error of effects for each row
cholesky_factor_corr[K] cholesky_c_beta;
matrix<lower=1,upper=1>[N,K] one_mat; // matrix of ones
cholesky_c_beta = cholesky_decompose(c_beta); // cholesky factorized, correlation matrix of error of effects
for(n in 1:N){
L_v[n] = diag_pre_multiply(s_hat[n], cholesky_c_beta); // convert to row-wise covariance matrix for error of effects
}
one_mat = rep_matrix(1,N,K);
}
parameters {
cholesky_factor_corr[K] L_Omega;
vector<lower=0>[K] tau; // Beta prior scale
matrix[K,N] z_beta; // Unscaled beta
vector[N] mu; // Row-wise baseline effects
}
transformed parameters {
matrix[N,K] beta; // feature true effects
beta = diag_pre_multiply(mu,one_mat) + (diag_pre_multiply(tau,L_Omega) * z_beta)';
}
model {
mu ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);
tau ~ gamma(2,10);
to_vector(z_beta) ~ std_normal();
for(n in 1:N)
beta_hat[n] ~ multi_normal_cholesky(beta[n], L_v[n]);
}
When I tried the current stan model with a smaller simulated data (N=100, K=50, R code shown bellow), I can recover the \mu_n pretty well. However, my real data set that I want to ultimately fit has the dimensions of roughly N=35000, K=400. I’m hitting into serious speed issue, even after running it for multiple days, its no where close to be even getting past the warm-up stage.
I’ve tried to optimize it the most I can (e.g. vectorization, cholesky decompose), I was wondering if there was any other suggestion people may be able to offer me? Or is two layers of multivariate normal with dimension K > 400, just not really feasible at this point?
A bit of a side note, I’ve noticed that if I don’t set a rough estimate initial value for \Omega, I get some rejected initial value problems.
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Although, if I set initial value to something approx. Cor(\hat{\beta}), it runs fine.
Would really appreciate the help!
Data simulation run.simulate.R (1.3 KB)
library(MASS)
library(rethinking)
set.seed(1)
N=100
K=50
sampled.beta = matrix(, nrow = N, ncol = K)
sampled.beta_hat = matrix(, nrow = N, ncol = K) ## input data
sampled.mu = rnorm(N, mean=0, sd=1)
sampled.s_beta = abs(rgamma(K, 2,10))
sampled.u = diag(sampled.s_beta) %*% rlkjcorr(n=1, K=K, eta=2) %*% diag(sampled.s_beta)
sampled.s_hat = matrix(abs(rgamma(K, 2,10)), nrow = N, ncol = K) ## input data
sampled.c_beta = rlkjcorr(n=1, K=K, eta=2) ## input data
for(n in 1:N){
if(n %% 10 == 0){print(n)}
sampled.beta[n,] = mvrnorm(n=1, mu=rep(sampled.mu[n], K), Sigma=sampled.u)
sigma.error_cov = diag(sampled.s_hat[n,]) %*% sampled.c_beta %*% diag(sampled.s_hat[n,])
sampled.beta_hat[n,] = mvrnorm(n=1, mu=sampled.beta[n,], Sigma=sigma.error_cov)
}
data.simulate = list(N=N, K=K, beta_hat=sampled.beta_hat, beta=sampled.beta, s_hat=sampled.s_hat, c_beta=sampled.c_beta,
sigma=sampled.u, mu=sampled.mu ## adding these two for evaluation
)
## create some approx. initial values
approx.omega = cor(data.simulate$beta_hat)
init_fn <- function() {
list(L_Omega=t(chol(approx.omega)))
}
## fit stan model w/ simulated data and initial values for omega
fit.slim = stan(file = 'two_mult_normal.stan', data = data.simulate, init=init_fn, save_warmup = FALSE)