Hi forum,
I am struggling to estimate the parameters of a Bayesian hierarchical model in Stan.
My data:
- I scanned 20 subjects with fMRI. Each subject had 6 conditions, with 50-60 trials each (not all trials were valid, so the number of trials per condition per subject is not fixed).
- From the whole-brain scan, I selected a region of interest (ROI) of 50-70 voxels per subject. The selection is independent of the current fMRI measurement. The ROIs are defined in each subject's own functional space, so the coordinates of the ROI voxels differ across subjects.
- Using SPM linear modeling, I extracted beta estimates that represent the BOLD signal for each single trial, for each voxel, condition, and subject. Altogether there are about 500,000 such beta estimates, which serve as my data.
What I try to estimate:
I attempt to estimate the contrasts between some of the 6 conditions.
Structure of the model:
- Each subject has, for each condition, a normal distribution with mean muCS[i,j] and SD sigCS[i,j].
- The means and SDs for each subject and condition are drawn from higher-level normal and gamma distributions per condition: muC[j], sigMuC[j].
- Since the voxels of an ROI are close to each other, a Gaussian-process dependency is assumed between the voxels, with rho[i], alpha[i], and sigma[i] per subject. Important note: the dimensions of the GP covariance matrix K differ per subject, since the number of voxels per subject varies.
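In symbols, the covariance I assume between voxels u and v of subject i is the exponentiated-quadratic kernel (this is what Stan's cov_exp_quad computes) plus a noise term on the diagonal:

```
K[u,v] = alpha[i]^2 * exp( -||x_u - x_v||^2 / (2 * rho[i]^2) ) + (u == v) * sigma[i]^2
```

where x_u and x_v are the 3-dimensional coordinates of the two voxels.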
Running the model:
The model runs without pre-sampling errors, but every post-warmup iteration is divergent!
Increasing adapt_delta to 0.99 did not help.
The (untrusted) results of the model are consistent with my frequentist results, in which I averaged the beta estimates over trials, conditions, voxels, and subjects. So, in principle, I would expect the model to produce sensible posterior distributions if I could avoid the divergences during sampling.
For debugging purposes, I ran a simplified model that ignores the dependency among the voxels (which is wrong, of course): this model sampled successfully and produced reasonable posterior results for the condition contrasts.
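For concreteness, the simplified model replaced the multivariate normal likelihood with independent normals per voxel (a sketch of the inner loop only; here sigCS[i,j] plays the role of the per-voxel noise SD):

```stan
// simplified (debug) likelihood: voxels treated as independent,
// so no K matrix and no cholesky_decompose are needed
for (b in 1:Nvox_subj[i])
  y[b] ~ normal(muCS[i,j], sigCS[i,j]);
```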
My conclusion so far is that something in the model causes the divergences at every iteration, and it is probably related to the Gaussian-process implementation.
I wonder whether there is a way to non-center the model, which is the Stan experts' standard recommendation for divergent sampling; I could not figure out how to do that while keeping the Gaussian-process dependency in place. Any ideas?
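For reference, this is my understanding of the standard non-centered form for the hierarchical means (a sketch only; muCS_raw is a new parameter I would have to add, and I am unsure what, if anything, to do about the GP parameters themselves):

```stan
parameters {
  matrix[Nsubj,Ncond] muCS_raw; // standard-normal innovations replacing muCS
  // ... other parameters as before
}
transformed parameters {
  matrix[Nsubj,Ncond] muCS;
  for (i in 1:Nsubj)
    for (j in 1:Ncond)
      muCS[i,j] = muC[j] + sigMuC[j] * muCS_raw[i,j]; // implies muCS[i,j] ~ normal(muC[j], sigMuC[j])
}
model {
  to_vector(muCS_raw) ~ std_normal(); // replaces muCS[i,j] ~ normal(muC[j], sigMuC[j])
  // ... likelihood unchanged
}
```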
I would very much appreciate any useful advice!
Below is the model, followed by a couple of diagMCMC figures that demonstrate the divergent sampling.
Thanks a lot!
The stan model:
modelString = " // model in which the dependency between voxels is considered
data {
  int<lower=1> Ntot ;            // total number of accumulated trials
  int<lower=1> Ncond ;           // number of conditions: 6
  int<lower=1> Nsubj ;           // number of subjects: 20
  int<lower=1> MaxVox ;          // largest number of voxels per subject (71)
  int<lower=1> Nvox_subj[Nsubj]; // number of voxels per subject
  matrix<lower=0>[Nsubj*3,MaxVox+2] x; // 3-dimensional voxel coordinates, 3 rows per subject (2 leading index columns)
  matrix[Ntot,MaxVox+3] z;       // single-trial beta estimates of the BOLD signal: trials in rows, 3 index columns followed by the voxel values in columns
  int Ntrials[Nsubj*Ncond];      // number of trials per subject per condition
  int AccTrials[Nsubj*Ncond+1];  // accumulated sums of Ntrials
  real sdy ;                     // SD of all measurements
  real<lower=0> agamma[2,1];     // estimated parameters for the gamma priors, based on sdy
}
parameters {
  vector<lower=0.01>[Nsubj] rho;
  vector<lower=0.01>[Nsubj] alpha;
  vector<lower=sdy/100,upper=sdy*20>[Nsubj] sigma;        // bounds match the uniform prior below
  matrix[Nsubj,Ncond] muCS;
  matrix<lower=sdy/100,upper=sdy*20>[Nsubj,Ncond] sigCS;  // bounds match the uniform prior below
  vector[Ncond] muC;
  vector<lower=0.01>[Ncond] sigMuC;
}
model {
  for (i in 1:Nsubj) {
    matrix[Nvox_subj[i], Nvox_subj[i]] K;
    matrix[Nvox_subj[i], Nvox_subj[i]] L_K;
    vector[3] xsub[Nvox_subj[i]];
    for (u in 1:Nvox_subj[i]) {
      for (v in 1:3) {
        xsub[u,v] = x[(i-1)*3+v, u+2]; // 3-dimensional coordinate vector of voxel u of the current subject
      }
    }
    K = cov_exp_quad(xsub, alpha[i], rho[i]);
    for (b in 1:Nvox_subj[i]) {
      K[b,b] = K[b,b] + square(sigma[i]); // add the noise term to the diagonal
    }
    L_K = cholesky_decompose(K);
    rho[i]   ~ inv_gamma(11,9);     // allows a wide range of rho values
    alpha[i] ~ lognormal(1.8,0.22); // allows a wide range of alpha values; a normal prior triggered initialization errors (log of negative values)
    sigma[i] ~ uniform(sdy/100,sdy*20);
    for (j in 1:Ncond) {
      muCS[i,j]  ~ normal(muC[j],sigMuC[j]);
      sigCS[i,j] ~ uniform(sdy/100,sdy*20);
      for (w in 1:Ntrials[j+(i-1)*Ncond]) {
        vector[Nvox_subj[i]] y = z[w+AccTrials[j+(i-1)*Ncond], 4:(Nvox_subj[i]+3)]';
        vector[Nvox_subj[i]] mu_vector = rep_vector(muCS[i,j], Nvox_subj[i]);
        y ~ multi_normal_cholesky(mu_vector, L_K);
      }
    }
  } // the Gaussian process is evaluated for each row of beta estimates, i.e. one trial of one subject across its voxels
  for (j in 1:Ncond) {
    muC[j]    ~ normal(0,10*sdy);
    sigMuC[j] ~ gamma(agamma[1],agamma[2]);
  }
}"
dataList=list(Ntot=Ntot,Ncond=Ncond,Nsubj=Nsubj,MaxVox=MaxVox,z=z,x=x,Nvox_subj=Nvox_subj,Ntrials=Ntrials,AccTrials=AccTrials,sdy=sdy,agamma=agamma)
stanDso = stan_model( model_code=modelString )
stanFit = sampling( object=stanDso , data=dataList , chains=3 , iter=2000 , warmup=1000 , refresh = 20, thin=1 , control = list(adapt_delta = 0.99,max_treedepth = 15))
A few diagMCMC results: