multi_normal_cholesky dramatically slower when the Cholesky factor is scaled with variances

I’m working with a model that has multiple pairs of correlated observations, and so am estimating multiple multi_normal_cholesky distributions. I’m also estimating the variances for these observations, and so am using the diag_pre_multiply(variance, cholesky) approach.
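For context on what the scaling step does mathematically, here is a minimal NumPy sketch (an illustration, not from the model): `diag_pre_multiply(sigma, L_corr)` computes `diag(sigma) * L_corr`, which is exactly the Cholesky factor of the covariance matrix `diag(sigma) * Corr * diag(sigma)`:

```python
import numpy as np

# Hypothetical 2x2 example: sigma holds two residual SDs and
# L_corr is the Cholesky factor of a 2x2 correlation matrix.
rho = 0.4
sigma = np.array([1.5, 0.8])
corr = np.array([[1.0, rho], [rho, 1.0]])
L_corr = np.linalg.cholesky(corr)

# Stan's diag_pre_multiply(sigma, L_corr) is diag(sigma) @ L_corr ...
L_cov = np.diag(sigma) @ L_corr

# ... and that product is the Cholesky factor of the scaled covariance.
cov = np.diag(sigma) @ corr @ np.diag(sigma)
assert np.allclose(L_cov @ L_cov.T, cov)
```

So the scaled-factor model is mathematically equivalent to sampling from the full covariance; the slowdown has to come from the parameterisation, not the algebra.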

When using just the Cholesky factor (without scaling by the variances), the chains mix well enough and the model runs in a reasonable time (~30 secs).

However, when using the scaled Cholesky factors, the model runs about 10x slower (~350 secs) and the chains are almost completely stationary.

I’ll paste the full model below, as well as attach some data that reproduces the issue. If anyone has any insight into where this is coming from, it would be greatly appreciated.

data.R (13.1 KB) NonCentRes.stan (2.8 KB)

The relevant part is this section:

  for(c in 1:(T*(J-1))) {
    matrix[2,2] y_cov[J*(T-1)];
    y_cov[c] = diag_pre_multiply(resvar[{c,J+c}], y_Lcorr[c]);
    y[,{c,J+c}] ~ multi_normal_cholesky(eta_vec[,{c,J+c}], y_cov[c]);
  }

The model works without any issues if I use y_Lcorr[c] instead of y_cov[c] in the multi_normal_cholesky call.

The full model is:

data {
  int N;              //Number of individuals
  int J;              //Number of items
  int T;              //Number of timepoints
  vector[J*T] y[N];
}
parameters {
  row_vector[J*T] intercepts;

  /* 'Raw' variables for non-centered parameterisation */
  matrix<lower=0.0>[J,T] load_r;
  matrix[N,T] eta_r;

  /* Hyperparameters for loadings */
  vector<lower=0>[T] load_m;
  vector<lower=0>[T] load_sd;

  /* Correlations between factors over time,
       item residual variances, and residual correlations */
  cholesky_factor_corr[T] e_Lcorr;
  vector<lower=0>[J*T] resvar;
  cholesky_factor_corr[2] y_Lcorr[J*(T-1)];
}

transformed parameters{
  matrix[T,J] load;

  /* Transform 'raw' loadings to required distribution */
  for(j in 1:J)
    load[,j] = (load_m[j] + load_r[j] * load_sd[j])';
}

model {
  /* These are temporary variables, used for holding
       functions of other parameters. It's more memory-efficient
       to declare them here, as their posteriors aren't saved */
  matrix[J*T, T] lam;
  matrix[N,T] eta_t;
  matrix[N,J*T] lam_eta;
  row_vector[J*T] eta_vec[N];

  /* Priors for factor and residual correlations */
  e_Lcorr ~ lkj_corr_cholesky(1);

  for(c in 1:(T*(J-1)))
    y_Lcorr[c] ~ lkj_corr_cholesky(1);

  /* Priors for intercepts and loading hyperparameters */
  intercepts ~ normal(0,5);
  load_m ~ normal(0,5);
  load_sd ~ normal(0,0.1);

  /* Priors for 'raw' loadings for non-centered parameterisation */
  to_vector(load_r) ~ std_normal();

  /* Pack loadings into a factor loading matrix with zeros
     for cross-loadings */
  lam[,1] = append_row(load[1]', rep_vector(0, J*(T-1)));
  lam[,T] = append_row(rep_vector(0, J*(T-1)), load[T]');

  for(t in 2:(T-1))
    lam[,t] = append_row(append_row(rep_vector(0, J*(t-1)), load[t]'),
                         rep_vector(0, J*(T-t)));

  /* Priors for residual variances */
  resvar ~ std_normal();
  /* Priors for 'raw' factor scores and transform with multivariate
     non-centered parameterisation */
  to_vector(eta_r) ~ std_normal();
  for(n in 1:N)
    eta_t[n] = (e_Lcorr*eta_r[n]')';

  /* Create the E[Y] for each individual */
  lam_eta = eta_t*lam';
  for(n in 1:N)
    eta_vec[n] = lam_eta[n] + intercepts;

  /* Link to observed data, with a multivariate normal distribution
       for each pair of correlated observations */
  for(c in 1:(T*(J-1))) {
    matrix[2,2] y_cov[J*(T-1)];
    y_cov[c] = diag_pre_multiply(resvar[{c,J+c}], y_Lcorr[c]);
    y[,{c,J+c}] ~ multi_normal_cholesky(eta_vec[,{c,J+c}], y_cov[c]);
  }
}

generated quantities {
  /* Transform factor correlations from Cholesky to Correlation matrix */
  corr_matrix[T] corr = multiply_lower_tri_self_transpose(e_Lcorr);

  /* Put variance parameters into variance scale (from SD) */
  vector[J*T] vars = square(resvar);
  vector[T] loadvars = square(load_sd);
}

I don’t fully understand your model, but there is some weird stuff going on, which might just be me not understanding it, or might be real problems.

Most notably, the resvar variable is weird: some of its elements get reused across multiple y_cov elements while others don’t (the last J elements appear in only one matrix). Is this intended? I can imagine that the double use of each element induces difficult geometry.
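To make the overlap concrete, here is a small Python sketch (with hypothetical J and T, not taken from the attached data) that counts how often each resvar index appears across the pairwise multi_normal statements, using the index pair {c, J+c} from the model:

```python
from collections import Counter

# Hypothetical small case matching the loop bound 1:(T*(J-1))
# and the index pair {c, J+c} from the model.
J, T = 3, 3

counts = Counter()
for c in range(1, T * (J - 1) + 1):
    counts[c] += 1        # first element of the pair
    counts[J + c] += 1    # second element of the pair

# 'Middle' elements of resvar are used by two multi_normal statements,
# while the first and last J elements each appear only once.
print(sorted(counts.items()))
```

Each doubly-used element enters the likelihood through two separate covariance factors, which is exactly the kind of coupling that can produce the difficult geometry described above.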

Also, why is y_cov an array of matrices? Wouldn’t the following work just as well?

for(c in 1:(T*(J-1))) {
    matrix[2,2] y_cov;
    y_cov = diag_pre_multiply(resvar[{c,J+c}], y_Lcorr[c]);
    y[,{c,J+c}] ~ multi_normal_cholesky(eta_vec[,{c,J+c}], y_cov);
}

Generally, the strategy going forward would be to strip everything except the covariances from the model and see if the problem still appears.


Oh good catch, I was getting too clever with my looping.

The model is a repeated measures factor analysis with autocorrelated residuals, so y1_t1 is correlated with y1_t2, and y1_t2 is correlated with y1_t3, etc. The aim was to specify pairwise multi-normal distributions, but I didn’t realise that some of the y’s and their variances were appearing in multiple multi_normal calls (e.g. y1_t2 is separately specified as multi_normal with both y1_t1 and y1_t3).

The working solution that I found was to ‘pack’ the individual cholesky factors into a single larger cholesky factor and use that in a single multi_normal call:

transformed parameters{
  cholesky_factor_corr[J*T] y_corr = diag_matrix(rep_vector(1,J*T));

  for(c in 1:(T*(J-1))) {
    y_corr[c+J,c] = y_Lcorr[c][2,1];
    y_corr[c+J,c+J] = y_Lcorr[c][2,2];
  }
}

model {
  y ~ multi_normal_cholesky(eta_vec, diag_pre_multiply(resvar,y_corr));
}

That model runs exceedingly well and recovers the parameters nicely.
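As a sanity check on the packing idea, here is a Python sketch (with made-up correlation values and a hypothetical J = T = 3, not the author's code). Each rho stands in for the 2x2 correlation Cholesky y_Lcorr[c], whose factor is [[1, 0], [rho, sqrt(1 - rho^2)]]; the packed matrix stays lower triangular and implies a proper correlation matrix with unit diagonal, so a single multi_normal_cholesky call over all J*T outcomes is valid:

```python
import numpy as np

# Hypothetical small case mirroring the packing loop.
J, T = 3, 3
rhos = [0.3, -0.2, 0.5, 0.1, 0.4, -0.3]   # one per c in 1:(T*(J-1))

y_corr = np.eye(J * T)                     # diag_matrix(rep_vector(1, J*T))
for c, rho in enumerate(rhos):             # c is 0-based here
    y_corr[c + J, c] = rho                         # y_Lcorr[c][2,1]
    y_corr[c + J, c + J] = np.sqrt(1.0 - rho**2)   # y_Lcorr[c][2,2]

# The packed factor stays lower triangular and the implied matrix
# has a unit diagonal, i.e. it is a valid correlation Cholesky.
implied = y_corr @ y_corr.T
assert np.allclose(y_corr, np.tril(y_corr))
assert np.allclose(np.diag(implied), 1.0)
```

Note that for chained pairs the implied marginal correlations at lags beyond the first are products of these terms rather than the raw rhos, which is consistent with the lag-1 autocorrelation structure the model intends.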

Thanks for catching that looping oddity, it’s always the little things that trip me up!