(Another) Slow Hierarchical Model in Stan

Thank you @bgoodri for the quick suggestion. I have looked into QR reparameterization, but I would appreciate it if someone could double-check my code (below), since I am unsure how to implement the reverse transformation from beta_tilde to beta in the transformed parameters block when beta is a matrix.

I saw here
https://groups.google.com/forum/#!category-topic/stan-users/Ozu7kwnyS_Y
that @betanalpha suggested code that works for a vector beta, but I am unsure how to apply it to a matrix beta. I wrote the code below as my attempt. I suspect version 2 (the vectorized form) would be faster, but I am unsure whether it is mathematically correct.

// Reverse the transform to get original beta

// Version 1
for (i in 1:nID) {
  beta[i, ] = mdivide_right_tri_low(beta_tilde[i, ], R'); // indiv coeffs by group
}

// Version 2
beta = mdivide_right_tri_low(beta_tilde, R'); // indiv coeffs by group
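Outside Stan, the intended identity can be sanity-checked numerically. The sketch below (plain NumPy; the array names mirror the Stan variables but the dimensions are made up) builds X = Q R, forms beta_tilde = beta * R' as the forward transform, and then mimics mdivide_right_tri_low(beta_tilde, R') -- i.e. right-multiplication by inv(R') -- with a triangular solve:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, nID = 20, 4, 3  # arbitrary stand-in dimensions

# Design matrix and its thin QR decomposition: X = Q @ R, R upper triangular
X = rng.normal(size=(N, K))
Q, R = np.linalg.qr(X)

# "True" per-group coefficients and the forward transform beta_tilde = beta @ R'
beta = rng.normal(size=(nID, K))
beta_tilde = beta @ R.T

# mdivide_right_tri_low(beta_tilde, R') computes beta_tilde @ inv(R').
# Since beta_recovered @ R.T = beta_tilde is equivalent to
# R @ beta_recovered.T = beta_tilde.T, a single solve against R suffices.
beta_recovered = np.linalg.solve(R, beta_tilde.T).T

print(np.allclose(beta_recovered, beta))          # reverse transform recovers beta
print(np.allclose(X @ beta.T, Q @ beta_tilde.T))  # X * beta agrees with Q * beta_tilde
```

Because the solve acts on every row of beta_tilde at once, the single matrix call in version 2 computes the same thing as the row-by-row loop in version 1.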

Full model code:

QRNonCenteredStanCode <- "

data {
// Number of observations and variables
int<lower=1> N;    // number of observations, in-sample
int<lower=1> K;    // number of variables in the model

// Identifiers
int<lower=1> nID;                    // number of unique IDs (unique state, channel, product)
int<lower=1, upper=nID> id[N];       // group ID variable for each observation

// (Individual-level) input data, QR decomposition
matrix[N, K] Q;       // inputting the in-sample data matrix
matrix[K, K] R;
matrix[K, K] R_inv;

// (Group-level) input data
matrix[nID, 1] u;  // a vestige of the manual's programming

// Outcomes
vector[N] loutcome; // outcome for each observation

}

parameters {

// Disturbances (standard normal) to create non-centered parameterization
matrix[K, nID] z;

// Cholesky decomposition of Omega (the correlation matrix)
cholesky_factor_corr[K] L_Omega;

// The scale factor
vector<lower=0,upper=pi()/2>[K] tau_unif;

// Group coefficients, gamma (unvarying for this model)
matrix[1, K] gamma;

real alpha;

// Prediction error scale (standard deviation), for final outcome, loutcome
real<lower=0> sigma;


}

transformed parameters {

// Beta, the original non-QR form
matrix[nID, K] beta;

// Beta tilde, the coefficients as modeled
matrix[nID, K] beta_tilde;

// The prior scale, tau
vector<lower=0>[K] tau; // prior scale

// tau = 2.5 * tan(tau_unif) gives each tau[k] an implied half-Cauchy(0, 2.5) prior
for (k in 1:K) {
  tau[k] = 2.5 * tan(tau_unif[k]);
}

// Create the beta_tilde matrix (vectorized non-centered parameterization)
beta_tilde = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';

// Reverse the transform to get original beta
beta = mdivide_right_tri_low(beta_tilde, R'); // indiv coeffs by group

}

model {
// These are the priors
to_vector(z) ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(2);
to_vector(gamma) ~ normal(0, 0.1);
to_vector(beta[nID,]) ~ normal(0, 0.1); // note: applies only to the last group's row of beta

// This is the sampling statement, vectorized for efficiency
target += normal_lpdf(loutcome | rows_dot_product(beta[id], Q) + alpha, sigma);

// Note that this implies that XB is the logged (additive) value
// thus the outcome MUST be the logged version of quotes/binds

// Note that sigma is the standard deviation, not the variance

}
"