Multivariate normal too slow

annptr · November 21, 2017, 4:47pm

Hi there,
I’m very new to Stan and I have been trying a model with multivariate outcomes.
Quick details about the data: I have a set of subjects and groups and for each subject-group pair there is a measure which is 6-dimensional. Because of the way these measures are taken, I assume diagonal covariance matrix.
It takes more than 10 hours to run 500 iterations (250 warmup 250 sampling) with 2 chains on 2 cores.
Below is my code and any helpful look into that to speed up would be really great. Thanks!

data {
int<lower=1> K; //dimension size (6)
int<lower=1> J; // num subjects (183)
int<lower=1> S; // num groups (65)  
int<lower=0> N; // num observations (1000)
vector[J] x[N]; // for subjects indicators
vector[S] u[N]; // for group indicators
vector[K] y[N]; // outcome measure (numeric)
}

parameters {
matrix[K, J] beta; //subject coefficients
matrix[K, S] eta; // group coefficients
vector<lower=0>[K] sigma; // std for dimesnions
}

model {
vector[K] mu[N];
matrix[K, K] Sigma;
Sigma = diag_matrix(square(sigma)); // assuming diagonal covariance matrix

for (n in 1:N)
mu[n] = beta * x[n] + eta * u[n];
y ~ multi_normal(mu , Sigma);
}

bgoodri · November 21, 2017, 6:04pm

Never use multi_normal with a diagonal covariance matrix, use the univariate normal distribution on the stacked observations

annptr · November 21, 2017, 6:21pm

Thanks Ben!

I tried that version too (code below) and it took about 2.5 hours for 500 iterations with warning about increasing max_treedepth.

data {
int<lower=1> J; // num subjects (183)
int<lower=1> S; // num groups (65)  
int<lower=0> N; // num observations (6000)
matrix[N,J] x;  // for subjects indicators
matrix[N,S] u; // for group indicators
real y[N]; // outcome measure (numeric)
}

parameters {
vector[J] beta;
vector[S] eta;
real<lower=0> sigma;
}

model {
y ~ normal(x* beta + u * eta, sigma);
}

Is there any settings that I need to do in RStan to speed this up?

bgoodri · November 21, 2017, 6:50pm

You probably need to use the QR reparameterization.

annptr · November 21, 2017, 8:49pm

Thanks so much Ben! QR reparameterization reduced the time to 16 mins from 2.5 hours.

One question, is QR reparameterization still possible when I model the multivariate outcomes with covariance parameter (similar to my initial code in the question)? Instead of a matrix for the predictors, I’ll have a set of vectors (vector[J] x[N];)

Any pointers will be very useful.

bgoodri · November 21, 2017, 9:23pm

You are going to need a matrix to be doing the QR decomposition. I think in your case, it would be several QR decompositions.

annptr · November 28, 2017, 3:41pm

Following up on this question. I tried a complicated model with a larger dataset as follows:

data{
int<lower=1> J; // num subjects (5,000)
int<lower=1> S; // num groups (90)
int<lower=0> N; // num observations (25,000)
int<lower=1> K; // num dimensions (6)
matrix[N,J] x;  // for subjects indicators
matrix[N,S] u; // for group indicators
vector[K] y[N];  // outcome measure (numeric)
int<lower=1,upper=90> group[N]; // group for subject
}


transformed data {
  // QR reparameterization of predictors
  matrix[N, J] Qx = qr_Q(x)[, 1:J] * N;
  matrix[J, J] R = qr_R(x)[1:J, ] / N;
  matrix[J, J] R_inv = inverse(R);

  matrix[N, S] Qu = qr_Q(u)[, 1:S] * N;
  matrix[S, S] R2 = qr_R(u)[1:S, ] / N;
  matrix[S, S] R2_inv = inverse(R2);
}

parameters {
  matrix[K,J] beta_t; // for subject
  matrix[K,S] eta_t; // for group

  cholesky_factor_corr[K] L_Omega;
  vector<lower=0>[K] L_sigma;
  vector<lower=0,upper=1>[S] alpha;  // random intercept for each subject-group
}

transformed parameters {
  matrix[K,J] beta =  beta_t * R_inv ;
  matrix[K,S] eta =  eta_t * R2_inv;
}


model {
vector[K] mu[N];
matrix[K, K] L_Sigma;

alpha ~ beta(2,2);
L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

to_vector(beta) ~ normal(0, 5);
to_vector(eta) ~ normal(0, 5);
L_Omega ~ lkj_corr_cholesky(4);
L_sigma ~ cauchy(0, 2.5);

for (n in 1:N){
for (k in 1:K){
mu[n][k] = alpha[group[n]] * dot_product(beta_t[k], Qx[n]) + dot_product(eta_t[k], Qu[n]); // to avoid transpose
}
}
y ~ multi_normal_cholesky(mu, L_Sigma);
}

Based on the manual, I tried few speed up trick such as removing transpose and using do_product.
Initial estimation for two chains in two cores is as follows:

Gradient evaluation took 18.4891 seconds
1000 transitions using 10 leapfrog steps per transition would take 184891 seconds.
Adjust your expectations accordingly!

Gradient evaluation took 20.4548 seconds
1000 transitions using 10 leapfrog steps per transition would take 204548 seconds.
Adjust your expectations accordingly!

However, this code is running for more than four days now and it has only finished the warmpup state (250/500 iterations) and started sampling (251/500).

Any suggestions for further improving the speed would be helpful!

Bob_Carpenter · December 2, 2017, 1:07am

The next thing to do is use matrix algebra to define mu. Somethign like

mu[n] = alpha(group[n]) + beta_t[k] * Qx + eta_t[k] * Qu;

Or the other way around to vectorize on K if that’s bigger.

Where do your estimates fall? Those hard interval constraints can be bad news, as I just blogged about:

http://andrewgelman.com/2017/11/28/computational-statistical-issues-uniform-interval-priors/

So the best thing to do at this point might be adding tighter priors. For instance, those Cauchy can also be problematic if there’s not enough data to determine the scale of L_sigma.

Thanks for sharing the detailed example and reporting back.

Topic		Replies	Views
Speeding up multivariate normal model Modeling techniques	7	2975	September 9, 2017
Slow Stan Code for Hierarchical Multivariate Model Modeling	8	1082	July 24, 2018
Improving Hierarchical Model Efficiency Modeling rstan , fitting-issues , specification	7	398	February 14, 2023
Stan efficient! Modeling	7	944	February 17, 2022
Assessing speed of multilevel model Modeling	1	328	March 25, 2020

Multivariate normal too slow

Related Topics