Slow Stan Code for Hierarchical Multivariate Model

yyj0810 · June 27, 2018, 8:43pm

Hi everyone! I am trying to fit a hierarchical multivariate model. However, the model takes long long hours to run. Below are the model, Rstan code and simulation data. I hope to get some help. Thank you!

The model I am trying to fit is the following:

\left( \begin{array}{c} Y_i \\ \log s_i \\ \end{array} \right) \sim N_2 \biggl[ \left( \begin{array}{c} \theta_i \\log \sigma_i \\ \end{array} \right), \left( \begin{array}{cc} \sigma_i^2 & \rho_1\sigma_i \\\rho_1\sigma_i & 1 \\ \end{array} \right) \biggr]
Hierarchical model for \theta_i and \sigma^2_i,
\left( \begin{array}{c} \theta_i \\ \log \sigma_i \\ \end{array} \right) \sim N_2 \biggl[ \left( \begin{array}{c} \mu_\theta \\\mu_\sigma \\ \end{array} \right), \left( \begin{array}{cc} r_\theta^2 & \rho_2 r_{\theta}r_{\sigma} \\\rho_2 r_{\theta}r_{\sigma} & r_\sigma^2 \\ \end{array} \right) \biggr]

Rstan code

data {
      int<lower=1> N;
      vector[2] x[N];
    }
    parameters {
      vector[2] mu;
      vector<lower=0>[2] lambda;
      vector[2] theta[N];
      real<lower=-1,upper=1> r1;
      real<lower=-1,upper=1> r2;
    }
    transformed parameters {
      vector<lower=0>[2] sigma;
      cov_matrix[2] T;
      cov_matrix[2] M[N];
      //Reparameterization
      sigma[1] = inv_sqrt(lambda[1]);
      sigma[2] = inv_sqrt(lambda[2]);
      T[1,1] = square(sigma[1]);
      T[1,2] = r2 * sigma[1] * sigma[2];
      T[2,1] = r2 * sigma[1] * sigma[2];
      T[2,2] = square(sigma[2]);
      //Reparameterization
      for (i in 1:N){
      M[i][1,1] = square(exp(theta[i,2]));
      M[i][1,2] = r1 * exp(theta[i,2]);
      M[i][2,1] = r1 * exp(theta[i,2]);
      M[i][2,2] = 1;
      }
    }
    model {
    lambda[1] ~ gamma(1,1);
    lambda[2] ~ gamma(1,1);
    for (i in 1:N){
    theta[i] ~ multi_normal(mu, T);
    x[i] ~ multi_normal(theta[i], M[i]);}
    }

Data generation

i <- 999
n <- 50
mu <-100
tau <- 50

set.seed(i)
y=vector()
theta=vector()
s2=vector()
sigma2=vector()

theta=rnorm(n=n,mean=mu,sd=tau)
sigma2=rlnorm(n=n,mean=5+0.1*theta,sd=1)
y=rnorm(n,theta,sqrt(sigma2))
s2=rlnorm(n=n,mean=log(sigma2),sd=1)
data_simu=cbind(y,theta,sigma2, s2)
data <- data.frame(data_simu)[,c("theta","y","s2")]
x <- cbind(data$y,log(sqrt(data$s2)))
dat=list(N=n,x=x)

martinmodrak · July 9, 2018, 12:03pm

Not completely understanding your code, but here are some quick thoughts:

It seems like the lambda parameters represent precision of the normal distribution, with a gamma prior. Andrew and others seem to be of the opinion that this is not a great parametrization - working with sigma directly and putting a Half-normal prior on sigma might improve things.
exp(theta[i,2]) can be computed once and then reused
multi_normal_cholesky tends to be a faster version. For the T matrix, which is reused, you should get a speed up by calling cholesky_decompose once and then using the result. You could be able to do some algebra and compute the Cholesky factor of the covariance matrix directly from the parameters which would be useful for the M matrices.
The “folk theorem” states that poor performance usually signals some problems with the model. Does the model converge? Does it recover simulated data well? Are the Rhat and n_eff statistics reasonable?
lambda[1] ~ gamma(1,1); lambda[2] ~ gamma(1,1); could be vectorized as lambda ~ gamma(1,1);

Bob_Carpenter · July 20, 2018, 12:16pm

You’re doing a lot of redundant work here.

First, I wouldn’t recommend parameterizing in terms of precision (lambda). Just use the scales sigma as parameters directly—it’s much easier to reason about priors that way.

Second, don’t recompute repeated terms. For example, just use T[2,1] = T[1,2] in the definition of the covariance matrices.

Third, add priors. They help immensely with controlling both the statistical and computational priors of models.

Overall, we tend to factor our covariance priors as you have done, but we tend to do it with an LKJ prior on the correlation matrix and then normal or maybe student-t priors on the scales. There’s a discussion in the regression chapter of the manual. This approach is computationally more efficient and tends to fit better, too.

Vectorize everything, so that it’s just lambda ~ gamma(1, 1) (though I’d really encourage something more like sigma ~ normal(0, 1).

Then take use theta ~ multi_normal(mu, T) outside of the loop so that it only needs to solve the matrix T once. Not much you can do to make the second multi-normal faster as you have a different value for each entry.

pchiu · July 20, 2018, 10:46pm

Hi! I’m trying to improve my hierarchical multivariate model as well, and hope it’s okay to post the model here:

There are IRT models on top of a hierarchical multivariate model (with diagonal level 1 error variances and unstructured level 2 error variances ). Any suggestions will be greatly appreciated!

data {
  int<lower=1> N0;
  int<lower=1> jj0[N0];
  int<lower=0,upper=1> y0[N0];
  real beta0[N0];
  real alpha0[N0];

  int<lower=1> N1;
  int<lower=1> jj1[N1];
  int<lower=0,upper=1> y1[N1];
  real beta1[N1];
  real alpha1[N1];

  int<lower=1> N2;
  int<lower=1> jj2[N2];
  int<lower=0,upper=1> y2[N2];
  real beta2[N2];
  real alpha2[N2];

  int<lower=1> J;
  matrix[3,2] X;
  matrix[3,2] X_new;
  matrix[2,2] R;
  matrix[3,3] Omega;
  cov_matrix[2] wishart_sigma;
  int<lower=1> wishart_df;
}

parameters {
  vector[3] THETA[J];

  //level2/////////////////////////////////////////////////////////////////////
  vector[2] beta;
  cov_matrix[2] L2_cov;
  vector[2] pi_i[J];

  //level1////////////////////////////////////////////////////////////////////
  vector<lower=0>[3] L1_sigma;
}

transformed parameters{
  real theta0[J];
  real theta1[J];
  real theta2[J];

  matrix[3, 3] L1_cov;
  L1_cov= quad_form_diag(Omega, L1_sigma);

  for (n in 1:J)
    theta0[n]=THETA[n,1];
  for (n in 1:J)
    theta1[n]=THETA[n,2];
  for (n in 1:J)
    theta2[n]=THETA[n,3];
}

model {
  vector[N0] atb_0;
  vector[N1] atb_1;
  vector[N2] atb_2;

  // priors://////////////////////////////////////////////////////////////////
  theta0 ~ normal(0, 1);
  theta1 ~ normal(0, 1);
  theta2 ~ normal(0, 1);

  to_vector(beta) ~ normal(0, 5);
  L2_cov ~ inv_wishart(wishart_df,wishart_sigma);
  L1_sigma~uniform (0, 20);

  for (n in 1:N0)
    atb_0[n]=alpha0[n]*(theta0[jj0[n]] - beta0[n]);
  y0 ~ bernoulli_logit(atb_0);

  for (n in 1:N1)
    atb_1[n]=alpha1[n]*(theta1[jj1[n]] - beta1[n]);
  y1 ~ bernoulli_logit(atb_1);

  for (n in 1:N2)
    atb_2[n]=alpha2[n]*(theta2[jj2[n]] - beta2[n]);
  y2 ~ bernoulli_logit(atb_2);

  //random effects:////////////////////////////////////////////////////////////
  pi_i ~ multi_normal(beta,L2_cov);

  //likelihood:////////////////////////////////////////////////////////////////

  for (n in 1:J)
    THETA[n]~ multi_normal(X * pi_i[n],L1_cov);
}

generated quantities {
  vector[3] THETA_PRED[J];
  for (n in 1:J)
    THETA_PRED[n]=multi_normal_rng(X_new * pi_i[n],L1_cov);
}

[edit: escaped and indented code]

Bob_Carpenter · July 21, 2018, 10:46am

Not only OK, but encouraged. I escaped it with

```
code
```

so it’d be easier to read.

Bob_Carpenter · July 21, 2018, 10:54am

I’d recommend reading the manual chapter on regression—it goes over our recommended priors and also provides efficiency and programming tips. The main takeaways to look for are:

use a scaled LKJ rather than inverse Wishart prior
work with Cholesky factor parameterizations for efficiency (huge savings in more than a couple dimensions)
vectorize the multivariate normals (even bigger efficiency savings because you only solve once)
don’t use interval-constrained priors (like on L1_sigma), and if you do insist on going against our recommendations, you should declare L1_sigma with <lower = 0, upper = 20>—Stan requires the support to match the constraints and the onstraints allow L1_sigma = 30, but that is out of support for the log density
vectorize assignments, e.g.,
atb_1 = alpha1 .* (theta1[jj1] - beta1); and
and
theta0 = THETA[ , 1];

yyj0810 · July 24, 2018, 3:25am

Thanks, Bob. These comments are very helpful!

yyj0810 · July 24, 2018, 3:26am

Thanks so much!

yyj0810 · July 24, 2018, 3:32am

Actually, this model does converge when I do the model checking. I believe it is the loop that makes this model need more time to run. The LKJ prior on the correlation matrix is quite helpful and I also simplified some redundant codes. Thank you so much!

Topic		Replies	Views
Improving Hierarchical Model Efficiency Modeling rstan , fitting-issues , specification	7	533	February 14, 2023
How to speed up my Stan code? Modeling rstan , fitting-issues	8	744	June 14, 2021
(Another) Slow Hierarchical Model in Stan Modeling fitting-issues , specification , performance	3	1766	September 29, 2017
How to speed up model? Modeling rstan , techniques , performance	7	807	September 9, 2020
Slow hierarchical ODE model Modeling rstan	3	525	January 25, 2023

Slow Stan Code for Hierarchical Multivariate Model

Related topics