Help to speed up a hierarchical MVN model

I’m trying to speed up the model below. I’ve already tried a QR reparameterisation of the data and a non-centred parameterisation of the level 2 effects, to no avail. I even tried a map_rect approach that calculated the right answer but made things considerably slower overall.

Can anyone suggest any other tips or tricks to speed things up? I was particularly wondering if there is any matrix magic that could improve on the nested loops. Alternatively, what would be a smart way to implement map_rect in this model?

Thanks in advance.

    data {
        int<lower=0> NvarsX;     // num independent variables
        int<lower=0> NvarsY;     // num dependent variables
        int<lower=0> N;          // total number of rows
        int<lower=0> N_ind;      // number of individuals/participants
        int<lower=1> indiv[N];   // data to identify individuals
        matrix[N, NvarsX] x;     // data for independent vars
        vector[NvarsY] y[N];     // data for dependent vars
    }

    parameters {
        cholesky_factor_corr[NvarsY] L_Omega;     // correlations between outcome variables
        vector<lower=0>[NvarsY] L_sigma;          // residual SD of outcome variables
        matrix[NvarsY, N_ind] Zbeta0_ind;         // intercepts at individual level
        matrix[NvarsX, N_ind] Zbeta_ind[NvarsY];  // coefficients at the individual level
        vector<lower=0>[NvarsY] Tau0;             // random effect for individual intercepts
        matrix<lower=0>[NvarsY, NvarsX] Tau;      // random effect for individual coefficients

        vector[NvarsY] Beta0;          // level 2 intercepts for each Y
        vector[NvarsX] Beta[NvarsY];   // level 2 coefficients for each X vs each Y
    }

    transformed parameters {
        vector[NvarsY] mu[N];
        matrix[NvarsY, NvarsY] L_Sigma;
        matrix[NvarsY, N_ind] beta0_ind;
        matrix[NvarsX, N_ind] beta_ind[NvarsY];

        L_Sigma = diag_pre_multiply(L_sigma, L_Omega);

        // Define individual-level effects - non-centred specification
        for (s in 1:N_ind){
            for (dv in 1:NvarsY){
                beta0_ind[dv, s] = Zbeta0_ind[dv, s] * Tau0[dv] + Beta0[dv];
                for (iv in 1:NvarsX){
                    beta_ind[dv, iv, s] = Zbeta_ind[dv, iv, s] * Tau[dv, iv] + Beta[dv, iv];
                }
            }
        }

        // Define the linear predictor for each row and outcome
        for (i in 1:N){
            for (dv in 1:NvarsY){
                mu[i, dv] = beta0_ind[dv, indiv[i]] +
                                dot_product( to_vector( beta_ind[dv, 1:NvarsX, indiv[i]] ), x[i, 1:NvarsX] );
            }
        }
    }

    model {
        // Priors
        L_Omega ~ lkj_corr_cholesky(1);
        L_sigma ~ normal(0, 1);
        Tau0 ~ normal(0, 1);
        to_vector(Tau) ~ normal(0, 1);
        for (s in 1:N_ind){
            for (dv in 1:NvarsY){
                Zbeta0_ind[dv, s] ~ normal(0, 1);
                Zbeta_ind[dv, 1:NvarsX, s] ~ normal(0, 1);
            }
        }
        to_vector(Beta0) ~ cauchy(0, 5);
        //for (dv in 1:NvarsY){
        //    to_vector(Beta[dv, 1:NvarsX]) ~ normal(0, 2);
        //}

        // Likelihood
        for (i in 1:N){
            y[i, 1:NvarsY] ~ multi_normal_cholesky(mu[i, 1:NvarsY], L_Sigma);
        }
    }

    generated quantities {
        matrix[NvarsY, NvarsY] Omega;
        matrix[NvarsY, NvarsY] Sigma;
        Omega = multiply_lower_tri_self_transpose(L_Omega);
        Sigma = quad_form_diag(L_Omega, L_sigma);
    }

You can replace the likelihood loop with

    y ~ multi_normal_cholesky(mu, L_Sigma);

which should cut down on the automatic differentiation in Stan, because L_Sigma is shared by every observation in the loop.

You might be able to replace some of the other loops with matrix algebra in the transformed parameters block, but that should not lead to a noticeable speed-up, because loops are quite efficient in Stan and those loops do not affect the automatic differentiation. Same for the priors on Zbeta, I think.
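For what it's worth, one way those transformed-parameters loops could be collapsed into matrix operations looks roughly like this. This is only an untested sketch using the thread's variable names; `diag_pre_multiply`, `rows_dot_product`, and multiple indexing (`[, indiv]`) are standard Stan features, but as noted above this may not be any faster:

```stan
transformed parameters {
    vector[NvarsY] mu[N];
    matrix[NvarsY, NvarsY] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
    // Non-centred intercepts in one shot: scale row dv by Tau0[dv], shift by Beta0[dv]
    matrix[NvarsY, N_ind] beta0_ind = diag_pre_multiply(Tau0, Zbeta0_ind)
                                      + rep_matrix(Beta0, N_ind);
    matrix[NvarsX, N_ind] beta_ind[NvarsY];

    for (dv in 1:NvarsY) {
        vector[N] mu_dv;
        // Scale row iv of Zbeta_ind[dv] by Tau[dv, iv], shift by Beta[dv]
        beta_ind[dv] = diag_pre_multiply(Tau[dv]', Zbeta_ind[dv])
                       + rep_matrix(Beta[dv], N_ind);
        // Gather each row's individual-specific coefficients with indiv,
        // then take row-wise dot products with x for the linear predictor
        mu_dv = to_vector(beta0_ind[dv, indiv])
                + rows_dot_product(x, beta_ind[dv][, indiv]');
        for (i in 1:N)
            mu[i, dv] = mu_dv[i];
    }
}
```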


Thanks! Changing out the likelihood loop makes a sizeable difference in small models at least. I’ll try a big model later.

I tried some matrix algebra manipulation yesterday but it ended up slower. The thing is, I know there are redundant calculations in the code above because of the indiv[i] indexing in the dot-product statement. I’m still tinkering with it, but perhaps others are better at this kind of thing? My only experience with linear algebra is in R, so I didn’t think that’s useful here.

Hmm, how sure are you about this? I tried it and failed. The product is a function of both i and dv, so I think you need both of those loops. I don’t think you reuse anything.

Did you re-try the centered parameterization of this after working out the issues in the last thread? It isn’t a given that non-centered works better.
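For reference, a centred version of the individual intercepts would drop the Z matrix and sample the effects directly. A minimal sketch using the thread's variable names (the slopes would be handled analogously, and Zbeta0_ind would be removed):

```stan
parameters {
    matrix[NvarsY, N_ind] beta0_ind;   // sampled directly, no Zbeta0_ind
    // ... other parameters unchanged
}
model {
    for (dv in 1:NvarsY)
        beta0_ind[dv] ~ normal(Beta0[dv], Tau0[dv]);   // centred hierarchy
}
```

Whether this samples better than the non-centred form depends on how strongly the data inform the individual effects.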

Here’s a discussion of someone doing the map_rect stuff that probably has something you could use: Linear, parallell regression

Try to make the return value of your map_rect function just be a single number (something to be added to `target`). If the function returns a lot of things, that’ll hurt the map_rect efficiency.
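To illustrate the shape of that advice, here is a toy univariate sketch (not this model) where each shard returns its entire log-likelihood contribution as a single element; the packing variables `phi`, `thetas`, `x_r_shards`, and `x_i_shards` are placeholders you would build yourself:

```stan
functions {
    // Each shard returns a 1-vector: its summed log-likelihood.
    // phi holds the shared parameters (here a mean and an SD);
    // x_i[1] holds the number of observations in the shard.
    vector shard_lp(vector phi, vector theta, real[] x_r, int[] x_i) {
        int n = x_i[1];
        real lp = normal_lpdf(x_r[1:n] | phi[1], phi[2]);
        return [lp]';
    }
}
model {
    // One scalar per shard comes back; summing them increments target once
    target += sum(map_rect(shard_lp, phi, thetas, x_r_shards, x_i_shards));
}
```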


Ha. I was sure, but now you are prompting me to rethink it and I think you are right. 🤦‍♂️

Yup. It had divergence issues galore.

I was quite active on that thread trying to make this same thing work, but I had to quit because it was too hard. I came back to it in the last couple of weeks with better basics and fared a bit better, but I still could not quite crack it. I’ve more or less given up for now; it’s just too hard. I’ll have to accept slow models for now.

Thanks anyhow @bbbales2 - I do appreciate your answers!


Haaaahaha, man, -1 for my reading comprehension.