Hi there,

I’m applying a high-dimensional multivariate Gaussian mixture model to some epidemiological data and need to improve its computational speed. Essentially all of the computation time is spent in the multi_normal_lpdf loop, which I would ideally like to vectorize. However, in my case I think vectorization would only work if multi_normal_lpdf returned the vector of individual log densities rather than their sum, because I need to weight each value by another parameter before combining them. I’d really appreciate any suggestions for workarounds or other ways to speed this up.
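
To be explicit about why the summed return value doesn’t help me: the mixture likelihood needs the per-observation, per-component terms inside a log-sum-exp before anything is summed over observations:

$$
\log p(y_n) \;=\; \log \sum_{c=1}^{4} \theta_c \, \mathcal{N}(y_n \mid \mu_c, \Sigma_c)
\;=\; \operatorname{log\_sum\_exp}_{c}\big( \log \theta_c + \log \mathcal{N}(y_n \mid \mu_c, \Sigma_c) \big),
$$

so a single summed lpdf over all \(n\) (or all \(c\)) can’t be reweighted by \(\theta_c\) after the fact.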

Here is a much-simplified 2-dimensional example of the code. For context, I use the model in up to 10 dimensions with thousands of data points, and I impose some rules on the relationships between the many Gaussian mean parameters to aid identifiability.

```
data {
  int<lower=1> N;
  vector[2] y[N];
  vector<lower=0>[4] alpha;
}
parameters {
  simplex[4] theta;              // mixture weights
  vector[2] mu[4];               // Gaussian means
  cholesky_factor_corr[2] L[4];  // Cholesky factors of correlation matrices
  vector<lower=0>[2] sigma[4];   // standard deviations
}
transformed parameters {
  corr_matrix[2] R[4];       // correlation matrices
  matrix[2, 2] Sigma[4];     // covariance matrices
  //--- generate covariance matrices ---//
  for (c in 1:4) {
    R[c] = multiply_lower_tri_self_transpose(L[c]);  // R = L * L'
    Sigma[c] = quad_form_diag(R[c], sigma[c]);
  }
}
model {
  matrix[N, 4] pC;
  // priors
  theta ~ dirichlet(alpha);
  mu[1] ~ normal(0, 0.2);
  mu[2] ~ normal(1.5, 0.2);
  for (c in 1:4) sigma[c] ~ normal(0, 0.1);
  for (c in 1:4) L[c] ~ lkj_corr_cholesky(1);
  // likelihood
  for (n in 1:N) {
    for (c in 1:4) {
      pC[n, c] = log(theta[c]) + multi_normal_lpdf(y[n] | mu[c], Sigma[c]);
    }
    target += log_sum_exp(pC[n]);
  }
}
```
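
For reference, one partial speedup I’ve been considering (just a sketch, I haven’t verified it changes the posterior or timings) is to skip building `Sigma` entirely: pass the scaled Cholesky factor `diag_pre_multiply(sigma[c], L[c])` straight to `multi_normal_cholesky_lpdf`, which avoids the covariance decomposition inside each lpdf call, and cache `log(theta)` outside the data loop. The hypothetical local names `log_theta`, `L_Sigma`, and `lp` are mine, not from the model above:

```
model {
  vector[4] log_theta = log(theta);  // cache: computed once, not N*4 times
  matrix[2, 2] L_Sigma[4];
  for (c in 1:4) {
    // Cholesky factor of the covariance: diag(sigma) * L
    L_Sigma[c] = diag_pre_multiply(sigma[c], L[c]);
  }
  // priors as before
  theta ~ dirichlet(alpha);
  mu[1] ~ normal(0, 0.2);
  mu[2] ~ normal(1.5, 0.2);
  for (c in 1:4) sigma[c] ~ normal(0, 0.1);
  for (c in 1:4) L[c] ~ lkj_corr_cholesky(1);
  // likelihood: same mixture, but no dense-covariance solve per term
  for (n in 1:N) {
    vector[4] lp;
    for (c in 1:4) {
      lp[c] = log_theta[c] + multi_normal_cholesky_lpdf(y[n] | mu[c], L_Sigma[c]);
    }
    target += log_sum_exp(lp);
  }
}
```

With this, the transformed parameters block is only needed if I want `R` or `Sigma` reported in the output; otherwise it could move to generated quantities. The inner loop over components still can’t be vectorized for the reason above, but each lpdf call should be cheaper.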