Thanks @spinkney! I wrote the second parameterization that way to emphasize that I use a multivariate normal distribution. What I actually use in my code is what I believe to be a non-centered parameterization of the multivariate normal prior:
transformed data {
  matrix[2, 2] VCoV = rep_matrix(-0.5, 2, 2); // [ 1.0, -0.5;
  VCoV = add_diag(VCoV, 1.5);                 //  -0.5,  1.0]
  matrix[2, 2] L = cholesky_decompose(VCoV);
}
parameters {
  vector[2] beta_raw;
}
transformed parameters {
  vector[3] beta;
  beta[1:2] = L * beta_raw;
  beta[3] = -sum(beta[1:2]); // enforce the sum-to-zero constraint
}
model {
  beta_raw ~ normal(0.0, 1.0);
}
This parameterization avoids calling the multivariate normal density function altogether.
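For reference, here is a minimal sketch of the centered version that the code above replaces (it reuses L from transformed data; beta12 is just a placeholder name for the first two components of beta). Since Cov(L * beta_raw) = L * L' = VCoV, the two should be equivalent:
parameters {
  vector[2] beta12; // placeholder: first two components of beta
}
model {
  // centered equivalent of the non-centered code above
  beta12 ~ multi_normal_cholesky(rep_vector(0.0, 2), L);
}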
Another related modeling issue that troubles me is that I would like to model the residual correlation among the read counts of cell groups. Because the standard discrete distributions in Stan do not have a correlation structure, I introduce latent sum-to-zero residuals on the centered log-ratio transformed composition as parameters, and then connect the composition to the read counts via the multinomial distribution (instead of solving the compound distribution). Intuitively, the model should be something like:
data {
  array[3] int y; // I keep only one observation here for simplicity
}
parameters {
  cholesky_factor_corr[3] L;
  vector[3] alpha;
  sum_to_zero_vector[3] beta;
  sum_to_zero_vector[3] intermediate_u; // the latent residuals
}
transformed parameters {
  vector[3] sigma = exp(alpha);
  // renamed pi -> theta, since pi clashes with the built-in function pi()
  vector[3] theta = softmax(beta + intermediate_u);
}
model {
  // Hyper-priors
  L ~ lkj_corr_cholesky(2);
  alpha ~ normal(0.0, 1.0);
  // Priors
  beta ~ normal(0.0, 1.0);
  intermediate_u ~ multi_normal_cholesky(rep_vector(0.0, 3),
                                         diag_pre_multiply(sigma, L));
  y ~ multinomial(theta);
}
Of course, such a model yields severe divergent transitions across chains on my actual example data (not shown here). I identified the cause: I model the correlation explicitly as a parameter, but the 'correlation matrix' to which L corresponds is singular (positive semi-definite rather than positive definite, with rank N-1) due to the sum-to-zero constraint on the residuals.
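Spelling that out: writing $\Sigma$ for the residual covariance and $\mathbf{1}$ for the ones vector, the constraint $\mathbf{1}^\top u = 0$ holds identically, so

$$\operatorname{Var}(\mathbf{1}^\top u) = \mathbf{1}^\top \Sigma\, \mathbf{1} = 0,$$

and since $\Sigma$ is positive semi-definite this forces $\Sigma\,\mathbf{1} = \mathbf{0}$, i.e. $\Sigma$ has rank at most $N-1$.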
A more numerically stable version of the model reduces the dimensions of alpha and L to 2 and models the sum-to-zero constraint explicitly, similar to what I posted above:
data {
  array[3] int y; // I keep only one observation here for simplicity
}
parameters {
  cholesky_factor_corr[2] L;
  vector[2] alpha;
  sum_to_zero_vector[3] beta;
  vector[2] intermediate_u_raw; // the latent residuals (non-centered)
}
transformed parameters {
  vector[2] sigma = exp(alpha);
  vector[3] intermediate_u;
  intermediate_u[1:2] = diag_pre_multiply(sigma, L) * intermediate_u_raw;
  intermediate_u[3] = -sum(intermediate_u[1:2]); // sum-to-zero constraint
  vector[3] theta = softmax(beta + intermediate_u);
}
model {
  // Hyper-priors
  L ~ lkj_corr_cholesky(2);
  alpha ~ normal(0.0, 1.0);
  // Priors
  beta ~ normal(0.0, 1.0);
  intermediate_u_raw ~ normal(0.0, 1.0);
  y ~ multinomial(theta);
}
generated quantities {
  // Implied sd of the 3rd residual: u[3] = -(u[1] + u[2]), so
  // Var(u[3]) = 1' * (M * M') * 1 with M = diag_pre_multiply(sigma, L),
  // i.e. the squared norm of M' times the ones vector.
  real sigma_3 = norm2(diag_pre_multiply(sigma, L)' * rep_vector(1.0, 2));
}
But by doing this, I seem to lose the ability to directly impose a prior on the (log-transformed) standard deviation of the 3rd latent residual, and I would like to keep it. Additionally, I observed inflated posterior variance on sigma_3, which means the ordering of the categories in the input data matters.
To summarize, I have a vector alpha whose constraint is much more complicated than the sum-to-zero one, and I would like to impose a prior (hopefully normal) on the full vector. Is there any way around this?