Hi,
Following your recommendations, I changed the uniform priors on the variance parameters to Cauchy. However, this changed neither the sampling duration nor the results. I still have uniform priors on the group-level mu's, if you think those are worth changing as well. You also suggested hierarchical priors for the subject-level covariance matrices; these are included in the model, and I added comments where they appear. As I said in a previous post, I'm mostly interested in the subject-level parameters of the covariance matrix, so I don't want to switch to a single covariance matrix for all subjects (the purpose of the model is to test the influence of external feedback on self-ratings of a speech performance); but perhaps this is why sampling takes so long (the duration I reported was for 4 chains with 10,000 iterations each)? Below is the simplest model I have, followed by my understanding of how to implement the k-fold based on https://datascienceplus.com/k-fold-cross-validation-in-stan/ :
data {
  int<lower=1> Nsubj ;     // number of participants = 50
  int<lower=1> Ntotal ;    // number of questions = 40 per subj * 50 subj = 2000
  int<lower=1> nYlevels ;  // number of responses on a rating scale (0-10 scale, so 11 responses)
  int<lower=1> s[Ntotal] ; // index of subject at a given question
  int<lower=1> k ;         // num. of predictors = 6 (all y's)
  int yPreA[Ntotal] ;  // rating #1 - before 1st speech
  int yPreFb[Ntotal] ; // rating #2 - after 1st speech
  int yJud[Ntotal] ;   // judges' feedback on 1st speech
  int yPost[Ntotal] ;  // rating #3 - after feedback
  int yPreB[Ntotal] ;  // rating #4 - upcoming 2nd speech
  int yRec[Ntotal] ;   // memory of feedback
}
parameters {
  real<lower=0> mu_tau ;
  real<lower=0> sigma_tau ;
  real<lower=0> omega_corr_pos ;
  vector<lower=0>[k] tau[Nsubj] ;            // for subj. cov. mtx.
  cholesky_factor_corr[k] Omega_pos[Nsubj] ; // for subj. cov. mtx.
  vector[Nsubj] muPreASub ;                  // mu for yPreA
  real<lower=1,upper=nYlevels> muPreAGroup ;
  real<lower=0> sigma2PreAGroup ;
  vector[Nsubj] muPreFbSub ;                 // mu for yPreFb
  real<lower=1,upper=nYlevels> muPreFbGroup ;
  real<lower=0> sigma2PreFbGroup ;
  vector[Nsubj] muJudPosSub ;                // mu for yJud
  real<lower=1,upper=nYlevels> muJudPosGroup ;
  real<lower=0> sigma2JudPosGroup ;
  vector[Nsubj] muPostSub ;                  // mu for yPost
  real<lower=1,upper=nYlevels> muPostGroup ;
  real<lower=0> sigma2PostGroup ;
  vector[Nsubj] muPreBSub ;                  // mu for yPreB
  real<lower=1,upper=nYlevels> muPreBGroup ;
  real<lower=0> sigma2PreBGroup ;
  vector[Nsubj] muJudRecSub ;                // mu for yRec
  real<lower=1,upper=nYlevels> muJudRecGroup ;
  real<lower=0> sigma2JudRecGroup ;
}
transformed parameters {
  matrix[k, k] quad_pos[Nsubj];
  for (i in 1:Nsubj) {
    quad_pos[i,,] = diag_pre_multiply(tau[i,], Omega_pos[i,,]);
  }
}
model {
  // hierarchical priors for subj. covariance matrix:
  mu_tau ~ cauchy(0, 2.5);
  sigma_tau ~ cauchy(0, 5);
  omega_corr_pos ~ cauchy(0, 5);
  // subj. covariance matrix params:
  for (i in 1:Nsubj) {
    tau[i,] ~ gamma((mu_tau/sigma_tau)^2, mu_tau/(sigma_tau^2));
    Omega_pos[i,,] ~ lkj_corr_cholesky(omega_corr_pos);
  }
  // all y's params:
  sigma2PreAGroup ~ cauchy(0, 5);
  muPreAGroup ~ uniform(1, nYlevels);
  muPreASub ~ normal(muPreAGroup, sqrt(sigma2PreAGroup));
  sigma2PreFbGroup ~ cauchy(0, 5);
  muPreFbGroup ~ uniform(1, nYlevels);
  muPreFbSub ~ normal(muPreFbGroup, sqrt(sigma2PreFbGroup));
  sigma2JudPosGroup ~ cauchy(0, 5);
  muJudPosGroup ~ uniform(1, nYlevels);
  muJudPosSub ~ normal(muJudPosGroup, sqrt(sigma2JudPosGroup));
  sigma2PostGroup ~ cauchy(0, 5);
  muPostGroup ~ uniform(1, nYlevels);
  muPostSub ~ normal(muPostGroup, sqrt(sigma2PostGroup));
  sigma2PreBGroup ~ cauchy(0, 5);
  muPreBGroup ~ uniform(1, nYlevels);
  muPreBSub ~ normal(muPreBGroup, sqrt(sigma2PreBGroup));
  sigma2JudRecGroup ~ cauchy(0, 5);
  muJudRecGroup ~ uniform(1, nYlevels);
  muJudRecSub ~ normal(muJudRecGroup, sqrt(sigma2JudRecGroup));
  for (i in 1:Ntotal) {
    [yPreA[i], yPreFb[i], yJud[i], yPost[i], yPreB[i], yRec[i]] ~
      multi_normal_cholesky([muPreASub[s[i]], muPreFbSub[s[i]], muJudPosSub[s[i]],
                             muPostSub[s[i]], muPreBSub[s[i]], muJudRecSub[s[i]]],
                            quad_pos[s[i],,]);
  }
}
generated quantities {
  vector[Ntotal] log_lik;
  for (n in 1:Ntotal) {
    log_lik[n] = multi_normal_cholesky_lpdf([yPreA[n], yPreFb[n], yJud[n], yPost[n], yPreB[n], yRec[n]] |
                   [muPreASub[s[n]], muPreFbSub[s[n]], muJudPosSub[s[n]],
                    muPostSub[s[n]], muPreBSub[s[n]], muJudRecSub[s[n]]],
                   quad_pos[s[n],,]);
  }
}
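As a side note on the tau prior above: the gamma shape and rate are derived from a mean (mu_tau) and standard deviation (sigma_tau), using shape = (mu/sigma)^2 and rate = mu/sigma^2. A quick Python sanity check, with hypothetical values that are not from any fit, confirms the conversion recovers the intended moments:

```python
# Gamma(shape, rate) has mean shape/rate and sd sqrt(shape)/rate.
# With shape = (mu/sigma)^2 and rate = mu/sigma^2, the implied
# mean and sd should come back as mu and sigma exactly.
mu_tau, sigma_tau = 2.0, 1.5  # hypothetical values, for illustration only

shape = (mu_tau / sigma_tau) ** 2
rate = mu_tau / sigma_tau ** 2

implied_mean = shape / rate       # should equal mu_tau
implied_sd = shape ** 0.5 / rate  # should equal sigma_tau
```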
And a possible implementation of the k-fold:
data {
  ....
  int<lower=0,upper=1> holdout[Ntotal];
  // index whether the observation should be held out (1) or used (0)
}
....
model {
  // params....
  // the original full-data likelihood loop is removed here and replaced
  // by this version, which skips the held-out observations:
  for (n in 1:Ntotal) {
    if (holdout[n] == 0) {
      target += multi_normal_cholesky_lpdf([yPreA[n], yPreFb[n], yJud[n], yPost[n], yPreB[n], yRec[n]] |
                  [muPreASub[s[n]], muPreFbSub[s[n]], muJudPosSub[s[n]],
                   muPostSub[s[n]], muPreBSub[s[n]], muJudRecSub[s[n]]],
                  quad_pos[s[n],,]);
    }
  }
}
generated quantities {
  vector[Ntotal] log_lik;
  for (n in 1:Ntotal) {
    log_lik[n] = multi_normal_cholesky_lpdf([yPreA[n], yPreFb[n], yJud[n], yPost[n], yPreB[n], yRec[n]] |
                   [muPreASub[s[n]], muPreFbSub[s[n]], muJudPosSub[s[n]],
                    muPostSub[s[n]], muPreBSub[s[n]], muJudRecSub[s[n]]],
                   quad_pos[s[n],,]);
  }
}
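For the holdout vectors themselves, my understanding is that they are built outside Stan: each of the Ntotal observations is assigned to exactly one of K folds, and the model is then fit K times, once per 0/1 vector. A rough Python sketch (the helper name is mine, just for illustration):

```python
import random

def make_holdout_folds(n_total, n_folds, seed=1):
    # Hypothetical helper: assign each observation to one fold,
    # then build one 0/1 holdout vector per fold (1 = held out).
    fold_of = [i % n_folds for i in range(n_total)]
    random.Random(seed).shuffle(fold_of)
    return [[1 if fold_of[i] == f else 0 for i in range(n_total)]
            for f in range(n_folds)]

folds = make_holdout_folds(2000, 10)  # Ntotal = 2000, K = 10
```

Each vector in `folds` would be passed as the `holdout` data entry for one run, so every observation is held out exactly once across the K runs.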
Does the k-fold seem right? And any more thoughts on the model?
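For completeness, my plan for combining the fold results: after fitting each fold, keep only the log_lik entries where holdout == 1, so every observation contributes exactly once. A rough Python sketch, assuming log_lik is extracted as a draws-by-observations list per fold (that shape is my assumption, not a fixed extraction format):

```python
import math

def kfold_elpd(log_lik_per_fold, holdout_per_fold):
    # log_lik_per_fold[f][d][n]: held-out log-likelihood of observation n
    # at posterior draw d of fold f (assumed extraction layout).
    elpd = 0.0
    for log_lik, holdout in zip(log_lik_per_fold, holdout_per_fold):
        n_draws = len(log_lik)
        for n, held_out in enumerate(holdout):
            if held_out == 1:
                draws = [log_lik[d][n] for d in range(n_draws)]
                # log-sum-exp over draws for numerical stability,
                # i.e. log of the average pointwise likelihood
                m = max(draws)
                elpd += m + math.log(sum(math.exp(x - m) for x in draws) / n_draws)
    return elpd
```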
Thanks for your advice,
Ofir
EDIT: escaped code and added syntax highlighting.