Thank you @bgoodri for the quick suggestion. I have looked into the QR reparameterization, but I would appreciate it if someone could double-check my code (below), since I am unsure how to implement the reverse transformation from beta_tilde to beta in the transformed parameters block when beta is a matrix.
I saw here
https://groups.google.com/forum/#!category-topic/stan-users/Ozu7kwnyS_Y
that @betanalpha suggested code that works for a vector beta, but I am unsure how to apply it to a matrix beta. I wrote the code below as my attempt (the algebra I am relying on is sketched just after this paragraph). I suspect version 2 would be faster, but I am not sure it is mathematically correct.
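For reference, here is the algebra I am relying on (my own sketch, so please correct me if the row/column bookkeeping is off). Writing x_n and q_n for the n-th rows of X and Q, and ' for transpose, X = Q * R implies x_n = q_n * R, so the linear predictor for observation n in group i is
x_n * beta[i,]' = q_n * R * beta[i,]' = q_n * beta_tilde[i,]',   with beta_tilde[i,] = beta[i,] * R'.
Solving for the original coefficients gives
beta[i,] = beta_tilde[i,] * inverse(R'),
which is what mdivide_right_tri_low(beta_tilde[i,], R') is meant to compute, since R' is lower triangular.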
// Reverse the transform to get original beta
// Version 1
for (i in 1:nID) {
  beta[i,] = mdivide_right_tri_low(beta_tilde[i,], R'); // indiv coeffs by group
}
// Version 2
beta = mdivide_right_tri_low(beta_tilde, R'); // indiv coeffs by group
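As a sanity check on the linear algebra only (not on the Stan overloads), the two versions can be mimicked in base R; the objects below are made up purely for this check:

set.seed(1)
K <- 4; nID <- 3
R <- qr.R(qr(matrix(rnorm(50 * K), 50, K)))   # an arbitrary upper-triangular R
beta_tilde <- matrix(rnorm(nID * K), nID, K)
# Version 2: whole matrix at once, beta_tilde %*% inverse(R')
beta_v2 <- beta_tilde %*% solve(t(R))
# Version 1: row by row; solve(R, b) is b %*% inverse(R') written as a column
beta_v1 <- t(apply(beta_tilde, 1, function(b) solve(R, b)))
max(abs(beta_v1 - beta_v2))   # agrees up to machine precision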
Full model code:
QRNonCenteredStanCode <- "
data {
// Number of observations and variables
int<lower=1> N; // number of observations, in-sample
int<lower=1> K; // number of variables in the model
// Identifiers
int<lower=1> nID; // number of unique IDs (unique state, channel, product)
int<lower=1, upper=nID> id[N]; // group ID variable for each observation
// (Individual-level) input data, QR decomposition
matrix[N, K] Q; // Q factor from the QR decomposition of the in-sample design matrix
matrix[K, K] R; // upper-triangular R factor
matrix[K, K] R_inv; // inverse of R (currently unused below)
// (Group-level) input data
matrix[nID, 1] u; // a vestige of the manual's programming
// Outcomes
vector[N] loutcome; // outcome for each observation
}
parameters {
// Disturbances (standard normal) to create non-centered parameterization
matrix[K, nID] z;
// Cholesky decomposition of Omega (the correlation matrix)
cholesky_factor_corr[K] L_Omega;
// The scale factor
vector<lower=0,upper=pi()/2>[K] tau_unif;
// Group coefficients, gamma (unvarying for this model)
matrix[1, K] gamma;
real alpha;
// Prediction error scale (standard deviation), for final outcome, loutcome
real<lower=0> sigma;
}
transformed parameters {
// Beta, the original non-QR form
matrix[nID, K] beta;
// Beta tilde, the coefficients as modeled
matrix[nID, K] beta_tilde;
// The prior scale, tau
vector<lower=0>[K] tau; // prior scale
// Fill in tau from tau_unif (equivalent to a half-Cauchy(0, 2.5) prior on tau)
for (k in 1:K) {tau[k] = 2.5 * tan(tau_unif[k]);}
// Create the beta_tilde matrix (vectorized form)
beta_tilde = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';
// Reverse the transform to get original beta
beta = mdivide_right_tri_low(beta_tilde, R'); // indiv coeffs by group
}
model {
// These are the priors
to_vector(z) ~ normal(0, 1);
L_Omega ~ lkj_corr_cholesky(2);
to_vector(gamma) ~ normal(0, 0.1);
to_vector(beta[nID,]) ~ normal(0, .1);
// This is the sampling statement, vectorized for efficiency
target += normal_lpdf(loutcome | (rows_dot_product(beta[id] , Q) + alpha), sigma);
// Note that this implies that XB is the logged (additive) value
// thus the outcome MUST be the logged version of quotes/binds
// Note that sigma is the standard deviation, not the variance
}
"