All samples are divergent in a model that employs Gaussian process modeling

Hi forum,
I am struggling to estimate parameters using a Bayesian hierarchical model in Stan.
My data:

  1. I scanned 20 subjects with fMRI. Each subject had 6 conditions, and each condition had 50-60 trials [not all trials were valid, so the number of trials is not fixed per condition per subject].
  2. Out of the whole-brain scan, I selected an ROI (region of interest) consisting of 50-70 voxels per subject. The selection is independent of the current fMRI measurement. The ROIs are defined in each subject's own functional space, so the coordinates of the ROI voxels differ across subjects.
  3. Using SPM linear modeling, I extracted beta estimates that represent the BOLD signal for each single trial, for each voxel, for each condition, and for each subject. Altogether there are about 500,000 such beta estimates, which serve as my data.

What I am trying to estimate:
I want to estimate the contrasts between some of the 6 conditions.
Structure of the model:

  1. Each subject has a normal posterior distribution for each condition, with mean muCS[i,j] and SD sigCS[i,j].
  2. The means and SDs for each condition and each subject are drawn from higher-level normal and gamma distributions per condition: (muC[j], sigMuC[j]).
  3. Since the voxels of an ROI are close to each other, a Gaussian-process dependency is assumed between the voxels, with rho[i], alpha[i], and sigma[i] parameters per subject. Important note: the dimensions of the GP covariance matrix K differ between subjects, since the number of voxels per subject varies.
Running the model:
The model runs without pre-sampling errors, but every single sample is divergent!
I increased adapt_delta to 0.99, to no avail.
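For reference, the per-subject covariance described in point 3 above is the exponentiated-quadratic kernel that Stan's cov_exp_quad builds, plus independent noise on the diagonal:

```latex
K^{(i)}_{uv} = \alpha_i^2 \exp\!\left( -\frac{\lVert x_u - x_v \rVert^2}{2 \rho_i^2} \right) + \sigma_i^2 \, \delta_{uv}
```

where x_u and x_v are voxel coordinates, alpha_i is the marginal SD, rho_i is the length scale, and sigma_i is the voxel-level noise SD added on the diagonal.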

The (untrusted) results of the model are consistent with my frequentist results, in which I averaged the beta estimates over trials, conditions, voxels, and subjects. So, in general, I would expect the model to generate sensible posterior distributions if I could avoid the divergences during sampling.

For debugging purposes, I ran a reduced model in which I neglected the dependency among the voxels (which is wrong, of course): this model sampled successfully and generated reasonable posterior results for the condition contrasts.
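(For clarity, the reduced model replaced the multivariate-normal likelihood with independent normals per voxel; the likelihood line of the full model below became something along these lines, with sigma[i] standing in as a plain noise SD:)

```stan
// sketch of the reduced (independence) likelihood, per trial row w:
for (q in 1:Nvox_subj[i]) {
  z[w + AccTrials[j + (i-1)*Ncond], q + 3] ~ normal(muCS[i,j], sigma[i]);
}
```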

My conclusion so far is that there is a problem with the model that causes the divergence of every sample, and it is possibly related to the Gaussian-process implementation.
I wonder whether there is a way to non-center the model, which is the Stan experts' recommendation in case of divergent transitions: I could not figure out how to do that while keeping the Gaussian-process dependency in place. Any ideas?

I would very much appreciate any useful advice!

Below is the model, to which I add a couple of diagMCMC figures that demonstrate the problematic divergent sampling.

Thanks a lot!

The Stan model:

StanModel = " //model in which dependency between voxels is considered

data {

int<lower=1> Ntot ; //total number of accumulated trials

int<lower=1> Ncond ; //number of conditions: 6

int<lower=1> Nsubj ; // number of subjects: 20

int<lower=1> MaxVox ; //largest number of voxels per subject (71)

int<lower=1> Nvox_subj[Nsubj]; //number of voxels per subject

matrix<lower=0>[Nsubj*3,MaxVox+2] x; // 3-dimensional coordinates of the voxels, 3 rows per subject (voxel coordinates start at column 3)

matrix [Ntot,MaxVox+3] z; //the beta estimates of the BOLD signal: one row per trial, one column per voxel (voxel data start at column 4)

int Ntrials[Nsubj*Ncond]; //The total number of trials per each subject per condition

int AccTrials[Nsubj*Ncond+1]; //The accumulated number of Ntrials

real sdy ; //The SD of all measurements

real<lower=0> agamma[2]; // shape and rate for the gamma priors, estimated from sdy

}

parameters {

vector<lower=0.01>[Nsubj] rho;

vector<lower=0.01>[Nsubj] alpha;

vector<lower=0.01>[Nsubj] sigma;

matrix[Nsubj,Ncond] muCS;

matrix<lower=0.01>[Nsubj,Ncond] sigCS;

vector[Ncond] muC;

vector<lower=0.01>[Ncond] sigMuC;

}

model {

for (i in 1:Nsubj){

matrix[Nvox_subj[i], Nvox_subj[i]] K;

matrix[Nvox_subj[i], Nvox_subj[i]] L_K;

vector[3] xsub[Nvox_subj[i]];

for (u in 1:Nvox_subj[i]){

for (v in 1:3){

xsub[u,v]=x[(i-1)*3+v,u+2]; // coordinates 3Xdimensional vector of the current subject

}}

K= cov_exp_quad(xsub, alpha[i], rho[i]);

for (b in 1:Nvox_subj[i]){

K[b, b] = K[b, b] + square(sigma[i]);}//adding diagonal elements

L_K = cholesky_decompose(K);

rho[i] ~ inv_gamma(11,9); // parameters to allow wide range of rho distributions

alpha[i] ~ lognormal(1.8,0.22); // parameters to allow wide range of alpha distributions; Attempting to use normal distribution triggered initialization error due to negative logarithm functions in sampling

sigma[i] ~ uniform(sdy/100,sdy*20); // parameters to allow wide range of sigma distributions

for (j in 1:Ncond){

muCS[i,j] ~ normal(muC[j],sigMuC[j]);

sigCS[i,j] ~ uniform(sdy/100,sdy*20);

for (w in 1:Ntrials[j+(i-1)*Ncond]){

vector[Nvox_subj[i]] y = z[w+AccTrials[j+(i-1)*Ncond], 4:(Nvox_subj[i]+3)]';

vector[Nvox_subj[i]] mu_vector;

mu_vector = rep_vector(muCS[i,j], Nvox_subj[i]);

y ~ multi_normal_cholesky(mu_vector, L_K);}

}} //The gaussian process is estimated per each row of beta estimators that corresponds to a single trial of one subject along its respective voxels

for (j in 1:Ncond){

muC[j] ~ normal(0,10*sdy);

sigMuC[j] ~ gamma(agamma[1],agamma[2]);}

}"

dataList=list(Ntot=Ntot,Ncond=Ncond,Nsubj=Nsubj,MaxVox=MaxVox,z=z,x=x,Nvox_subj=Nvox_subj,Ntrials=Ntrials,AccTrials=AccTrials,sdy=sdy,agamma=agamma)

stanDso = stan_model( model_code=StanModel )

stanFit = sampling( object=stanDso , data=dataList , chains=3 , iter=2000 , warmup=1000 , refresh = 20, thin=1 , control = list(adapt_delta = 0.99,max_treedepth = 15))

A few diagMCMC results:
[diagMCMC diagnostic figures showing the divergent transitions]

Here’s how to noncenter muCS.

parameters {
  ...
  matrix[Nsubj,Ncond] muCS_raw; // replaces the muCS parameter
}
model {
  for (i in 1:Nsubj){
    ...
    for (j in 1:Ncond) {
      // non-centered: shift and scale a standard-normal draw
      real muCS = muC[j] + sigMuC[j]*muCS_raw[i,j];
      muCS_raw[i,j] ~ std_normal();
      ...
    }
  }
}
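The reparameterization works because shifting and scaling a standard normal recovers the original centered prior:

```latex
\mu^{CS}_{ij} = \mu^{C}_{j} + \sigma^{\mu C}_{j}\,\eta_{ij}, \qquad \eta_{ij} \sim \mathcal{N}(0,1) \quad\Longrightarrow\quad \mu^{CS}_{ij} \sim \mathcal{N}\!\left(\mu^{C}_{j},\, \sigma^{\mu C}_{j}\right)
```

The sampler then explores the independent muCS_raw scale, which removes the funnel-shaped coupling between muCS and sigMuC that typically triggers divergences.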

A big problem is that the bounds for sigCS are inconsistent with the distribution.
If you’re using a uniform distribution the declaration should be

matrix<lower=sdy/100,upper=sdy*20>[Nsubj,Ncond] sigCS;

That is, the bounds in the declaration and the sampling statement must match.

Also, I see that sigCS is not actually used anywhere in your model. Best to omit it entirely; its posterior is just its uniform prior anyway. Or did you mean to write something like this?

muCS[i,j] ~ normal(muC[j], sigCS[i,j]);
sigCS[i,j] ~ gamma(5, 5/sigMuC[j]);

That may cause more divergences. In that case, computing it as

real sigCS = sigCS_raw[i,j]*sigMuC[j];
sigCS_raw[i,j] ~ gamma(5, 5);

might help, but I don't know.
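(The rescaling relies on the scaling property of the gamma distribution in Stan's shape-rate parameterization:)

```latex
X \sim \mathrm{Gamma}(a, b) \;\Longrightarrow\; cX \sim \mathrm{Gamma}\!\left(a, \tfrac{b}{c}\right) \quad \text{for } c > 0,
```

so sigCS_raw[i,j] ~ gamma(5, 5) scaled by sigMuC[j] gives exactly the gamma(5, 5/sigMuC[j]) prior written above.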

Thanks a lot!
That was very helpful: I modified the model per your advice and corrected the range declaration of the sigma variable. Actually, I was not using sigCS (I forgot to delete it), but there was another variable, sigma, whose declared bounds did not match the uniform distribution's thresholds. [This is my first Stan model…]

I ran a reduced-data model with just one chain and a low number of samples in order to test the modifications, and there were no divergent samples any more :). The sampler converged nicely, even though the ESS was too low, but I assume that is due to the small number of samples.
Next, I will gradually run the program over the whole data set with 3 chains and a sufficient number of samples. I will report back if other tough hurdles appear.

Again: many thanks for the useful insights!