No wait, I am afraid I was not clear: I am using QR on the data matrix and applying Cholesky to the coefficients of my likelihood. I am posting the entire model to be as clear as possible.
This is a sketch of the model (I am sorry I omitted the likelihood before, I should have been clearer). I apply QR to X and Cholesky to the variance-covariance matrix of \beta. But because of the QR, I need to put a prior on \theta = R^* \beta, hence my question about applying Cholesky to \theta.
y \sim \mathcal{N}(X \beta, v) \\
\beta \sim \mathcal{MN}(\mu, \Sigma_L \Sigma_L')
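To spell out the algebra behind the cholesky_decompose call in the code below (this is just the linearity of the transformation, nothing more): since \theta = R^* \beta is linear in \beta,

\mathrm{Var}(\theta) = R^* \, \mathrm{Var}(\beta) \, R^{*\prime} = R^* \Sigma_L \Sigma_L' R^{*\prime}

so the Cholesky factor I need for \theta is the Cholesky decomposition of R^* \Sigma_L \Sigma_L' R^{*\prime}, with \Sigma_L = \mathrm{diag}(\tau) L_\Omega in the code.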
Here’s the code:
data {
  int<lower=0> N;             // num individuals
  int<lower=1> K;             // num ind predictors
  int<lower=1> J;             // num groups
  int<lower=1> L;             // num group predictors
  int<lower=1, upper=J> g[N]; // group for individual
  matrix[N, K] x;             // individual predictors
  matrix[J, L] u;             // group predictors
  vector[N] y;                // outcomes
  vector[N] w;                // treatment assigned
}
transformed data {
  matrix[L, J] u_transp;
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  // thin and scale the QR decomposition; the sqrt(N - 1) factor puts the
  // columns of Q_ast on unit scale
  Q_ast = qr_thin_Q(x) * sqrt(N - 1);
  R_ast = qr_thin_R(x) / sqrt(N - 1);
  // group predictors transposed
  u_transp = u';
}
parameters {
  matrix[K, J] z;     // basis for the non-centered transformation of the betas
  vector[J] z_2;
  matrix[K, L] gamma; // school predictors' coefficients
  vector<lower=0, upper=pi()/2>[K] tau_unif;    // heteroskedasticity component
                                                // for the coefficients
  real<lower=0, upper=pi()/2> sigma_alpha_unif; // SD for the constant
  vector[L] eta;
  real effect;        // super-population average treatment effect
  real<lower=0, upper=pi()/2> sigma_t_unif;     // residual SD for the treated
  real<lower=0, upper=pi()/2> sigma_c_unif;     // residual SD for the control
  cholesky_factor_corr[K] L_Omega;              // corr matrix for the slopes
}
transformed parameters {
  vector[J] alpha; // vector of school-specific intercepts
  vector[J] c;     // intercepts' expected value
  // All SDs have half-Cauchy priors, but we sample them from a Uniform and
  // then take the tangent, as it is computationally more efficient: if X is
  // Uniform(0, pi/2), then tan(X) is a standard half-Cauchy
  vector<lower=0>[K] tau = tan(tau_unif);
  real<lower=0> sigma_alpha = tan(sigma_alpha_unif);
  real<lower=0> sigma_t = tan(sigma_t_unif);
  real<lower=0> sigma_c = tan(sigma_c_unif);
  // School-specific slopes: since theta is the transformation of a
  // multivariate normal, we do not need to give a prior to beta, but rather
  // to z, to the variance component (tau), and to the correlation component
  // of the Cholesky decomposition (L_Omega). On top of that, we utilize the
  // QR-reparametrized theta = R_ast * beta
  matrix[K, J] theta;   // matrix of school-specific coefficients
  matrix[K, K] L_Sigma; // Cholesky factor of theta's var-covar matrix
  // theta's variance-covariance matrix is R_ast * Sigma_beta * R_ast', where
  // Sigma_beta is built from its Cholesky factor diag(tau) * L_Omega
  L_Sigma = cholesky_decompose(R_ast * diag_pre_multiply(tau, L_Omega)
                               * diag_pre_multiply(tau, L_Omega)' * R_ast');
  theta = R_ast * (gamma * u_transp) + L_Sigma * z;
  // intercepts' expected value
  for (j in 1:J) {
    c[j] = eta' * u_transp[, j];
  }
  // School-specific intercepts
  alpha = c + sigma_alpha * z_2;
}
model {
  vector[N] m;
  vector[N] d;
  tau_unif ~ uniform(0, pi()/2);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();
  to_vector(gamma) ~ std_normal();
  z_2 ~ std_normal();
  effect ~ normal(0, 2);
  sigma_c_unif ~ uniform(0, pi()/2);
  sigma_t_unif ~ uniform(0, pi()/2);
  sigma_alpha_unif ~ uniform(0, pi()/2);
  // Here we model the likelihood of our outcome variable y
  // Its expected value
  for (n in 1:N) {
    m[n] = alpha[g[n]] + Q_ast[n, ] * theta[, g[n]] + effect * w[n];
  }
  // Its standard deviation (treated vs. control)
  d = sigma_t * w + sigma_c * (1 - w);
  y ~ normal(m, d);
}
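For completeness, if I ever need \beta back on the original scale, it is just \beta = (R^*)^{-1} \theta. A minimal sketch of a generated quantities block (not part of the model above; inverse(R_ast) could equally be precomputed in transformed data):

generated quantities {
  // map the QR-scale coefficients back to the original predictor scale,
  // column by column for each group: beta = inverse(R_ast) * theta
  matrix[K, J] beta_orig = inverse(R_ast) * theta;
}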
As you can see, I have individual-level variables that I gather in a matrix X, to which I apply the QR.
Then I give a Normal likelihood, "regression style". But now the coefficient of the regression is no longer \beta but the QR-transformed \theta, hence the dilemma of how to do Cholesky on \theta = R^* \beta.
I hope I was clear; otherwise, let me know. By the way, the model gives reliable results, so I think the procedure is correct. (Interesting note: I get slightly lower ESS with QR + Cholesky.)
Also, does anybody know whether one must partial out the constant before applying the QR?
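To make the question concrete, by "partialing out the constant" I mean centering the columns of x before decomposing, along these lines (just a sketch of the alternative, not what the model above does):

transformed data {
  matrix[N, K] x_c; // column-centered copy of x
  for (k in 1:K) {
    x_c[, k] = x[, k] - mean(x[, k]);
  }
  // ... then qr_thin_Q(x_c) / qr_thin_R(x_c) instead of x
}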