Speeding up a multi_normal model

Hello!

I’ve recently started using Stan and I’m loving it!

I developed a multivariate regression model, and it’s taking quite some time to run with 4 outcomes, 1500 observations, and 6 independent variables. I’ve already set up the code to benefit from matrix operations and the Cholesky decomposition.

However, in the final step, I need to use a for loop to properly state my model. Here is my Stan code:

data 
{
    int<lower=1> N; // Number of observations
    int<lower=1> J; // Number of dependent variables, i.e. the number of regression equations
    int<lower=0> K; // Number of independent variables
    matrix[N,K] x;  // Matrix of independent-variable data
    matrix[N,J] y;  // Matrix of dependent-variable data
}

parameters 
{
    matrix[K,J] beta;                 // regression coefficients
    vector<lower=0>[J] st_devs;       // residual standard deviations
    cholesky_factor_corr[J] L_corr;   // Cholesky factor of the residual correlation matrix
}

model 
{
    matrix[N,J] xbeta = x * beta;   // linear predictor for all observations at once

    for (j in 1:J)
        beta[, j] ~ normal(0, 10);

    st_devs ~ cauchy(0, 2.5);
    L_corr ~ lkj_corr_cholesky(1.0);

    // the loop I'd like to get rid of; note it also rebuilds the Cholesky
    // factor of the covariance matrix on every iteration
    for (i in 1:N)
        y[i] ~ multi_normal_cholesky(xbeta[i], diag_pre_multiply(st_devs, L_corr));
}

generated quantities
{
    corr_matrix[J] cor_mtx = multiply_lower_tri_self_transpose(L_corr);   // residual correlation matrix
    cov_matrix[J]  cov_mtx = quad_form_diag(cor_mtx, st_devs);            // residual covariance matrix
}

My question here is actually pretty simple: is there any way I could get rid of that for loop in the model block (the one with the multi_normal_cholesky statement)?

The multi_normal_cholesky distribution can’t take the whole xbeta matrix as is, but it is already vectorized to accept array inputs. So I thought about transforming the N×J xbeta matrix into an N-long array containing a J-long row_vector in each element. Something like this:

matrix[N,J] xbeta = x * beta;
row_vector[J] xbeta_new[N] = transform_into_array_of_vectors(xbeta);

I’m hoping there’s a magical function like this that doesn’t involve a for loop internally; otherwise it would defeat the purpose of speeding up the whole process.

Is there any way of doing this? I tried some of the “to_array” and “to_vector” functions, but Stan kept giving me errors.
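For reference, here’s roughly what I tried. As far as I can tell, both functions flatten the matrix into a single dimension rather than producing an array of J-long vectors, which I’m guessing is why the types never matched:

matrix[N,J] xbeta = x * beta;
vector[N*J] xbeta_flat = to_vector(xbeta);   // flattens to one long vector
real xbeta_arr[N*J] = to_array_1d(xbeta);    // flattens to a 1-D real array
// neither has the row_vector[J] xbeta_new[N] shape that the vectorized
// multi_normal_cholesky signature expects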

Thank you so much!

PS: Sorry for the poor choice of posting category. I wasn’t really sure where this question fits.


For x and y, you should just declare them as

row_vector[K] x[N];
row_vector[J] y[N];

but for the conditional mean, you have to do

row_vector[J] xbeta[N];
for (n in 1:N) xbeta[n] = x[n] * beta;
y ~ multi_normal_cholesky(xbeta, diag_pre_multiply(st_devs, L_corr));

which is all still much faster than calling multi_normal_cholesky N times.
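Putting it together, a minimal sketch of the whole program under that setup (same priors as yours; the prior on beta is written with to_vector, which is equivalent to your column-wise loop):

data 
{
    int<lower=1> N;
    int<lower=1> J;
    int<lower=0> K;
    row_vector[K] x[N];   // one row_vector of predictors per observation
    row_vector[J] y[N];   // one row_vector of outcomes per observation
}

parameters 
{
    matrix[K,J] beta;
    vector<lower=0>[J] st_devs;
    cholesky_factor_corr[J] L_corr;
}

model 
{
    row_vector[J] xbeta[N];
    for (n in 1:N)
        xbeta[n] = x[n] * beta;

    to_vector(beta) ~ normal(0, 10);   // equivalent to looping over columns
    st_devs ~ cauchy(0, 2.5);
    L_corr ~ lkj_corr_cholesky(1.0);

    y ~ multi_normal_cholesky(xbeta, diag_pre_multiply(st_devs, L_corr));   // one vectorized call
}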

Thanks for the heads up, @bgoodri!

Here are the speed-up results:

  • using my original formulation: sampling time = 1899 seconds
  • using the formulation you suggested: sampling time = 1716 seconds

Note that these timings are with n_jobs = 1 in the PyStan call.

I know this is even more memory intensive, but I’ve gotten some pretty great results using both matrix and array variables, like this:

In data block:

matrix[N,K] x;

In model block:

matrix[N,J] xbeta_temp = x * beta;
row_vector[J] xbeta[N];
for (n in 1:N)
    xbeta[n] = xbeta_temp[n];

So here I’m benefiting from the quick matrix multiplication and just doing a row-wise (from 1 to N) assignment.
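For completeness, here’s roughly what the relevant blocks look like now (the parameters and generated quantities blocks are unchanged; note that y has to stay an array of row_vectors for the vectorized sampling statement to work):

data 
{
    int<lower=1> N;
    int<lower=1> J;
    int<lower=0> K;
    matrix[N,K] x;        // keep x as a matrix for the fast multiply
    row_vector[J] y[N];   // y stays an array for the vectorized call
}

model 
{
    matrix[N,J] xbeta_temp = x * beta;   // one fast matrix multiplication
    row_vector[J] xbeta[N];
    for (n in 1:N)
        xbeta[n] = xbeta_temp[n];        // cheap row-wise copy into the array

    for (j in 1:J)
        beta[, j] ~ normal(0, 10);
    st_devs ~ cauchy(0, 2.5);
    L_corr ~ lkj_corr_cholesky(1.0);

    y ~ multi_normal_cholesky(xbeta, diag_pre_multiply(st_devs, L_corr));
}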


Exactly what you need to do: the matrix multiplication is faster, but the vectorized signatures are geared to arrays to avoid ambiguity about observation boundaries in multivariate distributions.