Bayesian Hierarchical Two-Stage Process Model with Nested Conditions and Ragged Data Structure (Solved)

Hey all,

The aim of this thread is to consolidate all of my previous threads about my code and to provide the finished, tested code so that others can benefit from all of the help I received. The program I’m going to share is a Bayesian hierarchical version of a two-stage process model of metacognition. The model works with data from psychophysical experiments, and from experiments that generate similar data, in which a decision is made and a confidence report is given.

The model had to be built to accommodate different metacognitive contexts (termed conditions) and task difficulties (termed contrast levels) in such a way that conditions could cut across contrast levels and vice versa. The data set is very ragged and had to be extensively padded: not all conditions contain all contrast levels, subjects with low confidence variability are excluded from the data set, data sets differ in size depending on the combination of condition and contrast level, and some subjects’ data contain lapse trials. On top of that, the model initially ran into significant performance issues, with an expected run time of months for the full data set! These were the problems facing the model, each addressed across my previous threads.

Note that this code was written for cmdstan-2.30.1, which is the version of Stan available on the cluster our lab has access to. The data take the form of a response matrix with one-hot trial-wise response rows of four entries (two choices by two confidence levels), where all-zero rows of the form [0 0 0 0] serve as padding, plus the trial-matched orientations (or other independent variable).
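
To make the padding concrete, here is a minimal sketch (not from my actual scripts) of how the data might be assembled in Python for cmdstan. The field names match the data block below, but the sizes, the example response, and the evenly spaced choice of sampx are hypothetical toy values:

import numpy as np

#hypothetical sizes: 2 conditions, 3 contrast levels, 5 subjects, up to 100 trials
ncond, ncont, ns, nt = 2, 3, 5, 100

#resps: one row per trial, one-hot over the four choice-by-confidence categories;
#rows left as [0 0 0 0] are padding that the likelihood skips
resps = np.zeros((ncond, ncont, ns, nt, 4))

#oris: trial-matched orientations, zero-padded the same way
oris = np.zeros((ncond, ncont, nt, ns))

#e.g. subject 1's first trial in condition 1, contrast 1 (toy values)
resps[0, 0, 0, 0] = [0, 1, 0, 0]
oris[0, 0, 0, 0] = -2.0

#sample the x-axis at sampn evenly spaced cumulative densities (one possible choice)
sampn = 30
sampx = (np.arange(sampn) + 0.5) / sampn

stan_data = {
    "ncond": ncond, "ncont": ncont, "ns": ns, "nt": nt,
    "resps": resps.tolist(), "oris": oris.tolist(),
    "sampn": sampn, "sampx": sampx.tolist(),
}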

Below is the fully parallelized, working, tested code:

functions {

 //Custom quantile function for lognormal distribution
 matrix log_normal_qf(vector x, vector m, vector s) {

  //takes a vector sampling the x-axis in terms of cumulative density, plus vectors of mu and sigma parameters
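  //this is the lognormal quantile function exp(m + s * sqrt(2) * inv_erf(2x - 1)), since inv_erfc(2 - 2x) = inv_erf(2x - 1)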
  return exp(rep_matrix(s' * sqrt(2), size(x)) .* rep_matrix(inv_erfc(-2 * x + 2), size(m)) + rep_matrix(m', size(x)));

 }

 //Likelihood model for CASANDRE
 real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

  //declare and initialize log likelihood variable
  real llhC;
  llhC = 0;

  //loop over participants to calculate log likelihood
  for (n in 1:size(resps)) {

   //declare the indicator array marking trials with real (non-padded) data
   array[rows(resps[n])] int m;

   //build the logical vector of non-zero trials
   for (tt in 1:size(m)) {
      m[tt] = sum(resps[n][tt]) == 1;
   }

   //declare the likelihood variable
   matrix[rows(sm),4] lhC;

   //check if subject has any non-zero data before running
   if (sum(resps[n]) != 0) {

    //declare incrementing and index variables
    int t;
    array[sum(m)] int ind;

    //initialize incrementing variable
    t = 1;

    //declare and calculate mu parameters for normal pdfs
    matrix[rows(sm),rows(sds)] avgs;
    avgs = rep_matrix(sm[:, n], rows(sds)) .* rep_matrix(sds[:, n]', rows(sm));
  
    //loop over trials
    for (tr in 1:rows(sm)){

     //check if trial has any non-zero data before running
     if (sum(resps[n][tr]) != 0) {

      //declare the sample matrix (one row per decision/confidence criterion)
      matrix[3,rows(sds)] raws;

      //sample the cumulative density of normals along the transformed x-axis at each criterion
      for (rws in 1:rows(sds)) {
       raws[1,rws] = normal_cdf(-nconf[n] | avgs[tr,rws], sds[rws,n]);
       raws[2,rws] = normal_cdf(0 | avgs[tr,rws], sds[rws,n]);
       raws[3,rws] = normal_cdf(pconf[n] | avgs[tr,rws], sds[rws,n]);
      }

      //declare the cumulative likelihood variable
      vector[3] ratiodist;

      //take the mean of the sampled densities
      ratiodist[1] = mean(raws[1]);
      ratiodist[2] = mean(raws[2]);
      ratiodist[3] = mean(raws[3]);

      //implement a lapse rate parameter to absorb unmodeled noise, and calculate likelihoods of each choice possibility in the trial
      lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
      lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
      lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
      lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));

      //add this trial to the index
      ind[t] = tr;

      //increment the index
      t = t + 1;

     }

    }

    //multiply the choice matrix by the log of the likelihood matrix
    llhC += sum(columns_dot_product(resps[n][ind], log(lhC[ind])));

   }
  
  }

  //return the log likelihood for all data given the sampled parameters
  return llhC;
  
 }

 //Partial sum function
 real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {

 //dynamically slice data into the log likelihood function
 return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], sm[:,start:end], nconf[start:end], pconf[start:end]);

 }
  
}
data {

 //declare the number of conditions, number of contrast levels, number of subjects, and the number of trials in the largest unpadded data set
 int ncond;
 int ncont;
 int ns;
 int nt;

 //declare the response matrix (one subject per array)
 array[ncond,ncont,ns] matrix[nt,4] resps;

 //declare the orientation matrix (every subject in each array)
 array[ncond,ncont] matrix[nt,ns] oris;

 //declare the number of x-axis samples and the vector sampling the x-axis in terms of cumulative density
 int sampn;
 vector[sampn] sampx;

}
parameters {

 //Parameters
 vector<lower=0,upper=1>[ns] guess; //lapse rate
 matrix<lower=0>[ncont,ns] sens; //stimulus sensitivities
 matrix[ncont,ns] crit; //decision criteria
 matrix<lower=0>[ncond,ns] meta; //meta-uncertainties
 matrix<lower=0>[ncond,ns] nconf; //negative confidence criteria
 matrix<lower=0>[ncond,ns] pconf; //positive confidence criteria

 //Hyperparameters
 vector[ncont] ssm; //mu hyperparameters of each contrast level's stimulus sensitivity lognormal prior
 vector<lower=0>[ncont] sss; //sigma hyperparameters of each contrast level's stimulus sensitivity lognormal prior
 vector[ncont] scm; //mu hyperparameters of each contrast level's decision criterion normal prior
 vector<lower=0>[ncont] scs; //sigma hyperparameters of each contrast level's decision criterion normal prior
 vector[ncond] mum; //mu hyperparameters of each condition's meta-uncertainty lognormal prior
 vector<lower=0>[ncond] mus; //sigma hyperparameters of each condition's meta-uncertainty lognormal prior
 vector[ncond] nccm; //mu hyperparameters of each condition's negative confidence criterion lognormal prior
 vector<lower=0>[ncond] nccs; //sigma hyperparameters of each condition's negative confidence criterion lognormal prior
 vector[ncond] pccm; //mu hyperparameters of each condition's positive confidence criterion lognormal prior
 vector<lower=0>[ncond] pccs; //sigma hyperparameters of each condition's positive confidence criterion lognormal prior

}
model {

 //Hyperpriors
 ssm  ~ normal(0,1);
 sss  ~ lognormal(0,1);
 scm  ~ normal(0,1);
 scs  ~ lognormal(0,1);
 mum  ~ normal(0,1);
 mus  ~ lognormal(0,1);
 nccm ~ normal(0,1);
 nccs ~ lognormal(0,1);
 pccm ~ normal(0,1);
 pccs ~ lognormal(0,1);

 //Priors
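 //beta(1, 197/3) puts the prior mean lapse rate at 1/(1 + 197/3) = 0.015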
 guess  ~ beta(1,197.0/3.0);

 for (cond in 1:ncond) {
  meta[cond]  ~ lognormal(mum[cond],mus[cond]);
  nconf[cond] ~ lognormal(nccm[cond],nccs[cond]);
  pconf[cond] ~ lognormal(pccm[cond],pccs[cond]);
 }

 for (cont in 1:ncont) {
  sens[cont] ~ lognormal(ssm[cont],sss[cont]);
  crit[cont] ~ normal(scm[cont],scs[cont]);
 }

 //Likelihood model (with local variable calculations)
 for (cond in 1:ncond) {

  //declare the transformed x-axis matrices
  array[ncond] matrix[sampn,ns] xtrans;
  array[ncond] matrix[sampn,ns] sds;

  //calculate the transformed x-axis matrices
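  //these mu and sigma give a lognormal with mean 1 and coefficient of variation meta: exp(mu + sigma^2/2) = 1 and exp(sigma^2) - 1 = meta^2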
  xtrans[cond] = log_normal_qf(sampx,-0.5*log1p(meta[cond]'.^2),sqrt(log1p(meta[cond]'.^2)));
  sds[cond] = 1./xtrans[cond];

  for (cont in 1:ncont) {

   //check if condition and contrast level combination has non-zero data
   if (sum(abs(oris[cond][cont])) != 0) {

    //declare matrices for calculating each contrast level's transformed data
    array[ncont] matrix[nt,ns] om;
    array[ncont] row_vector[ns] cm;
    array[ncont] matrix[nt,ns] sm;

    //orientations and decision criteria in terms of standard deviations
    om[cont] = oris[cond][cont].*rep_matrix(sens[cont],nt);
    cm[cont] = crit[cont].*sens[cont];
    
    //transform the data
    sm[cont] = om[cont]-rep_matrix(cm[cont],nt);

    //hand the data and parameters to the reduce sum wrap function for dynamic within-chain parallelization
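     //the grainsize of 1 (the argument after the sliced data) lets reduce_sum choose slice sizes automatically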
    target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds[cond], sm[cont], nconf[cond]', pconf[cond]');

   }

  }

 }

}

The code is working great now, with good parameter recovery, and a run time of just five-and-a-half days for our largest data set with 1,000 warmup plus 10,000 sampling iterations (the mESS target taken from this paper; infographic here) on our cluster, utilizing approximately 80 CPUs to parallelize four simultaneously sampled chains.
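
For anyone who wants to run something similar, here is a minimal sketch of one way to launch it via cmdstanpy; the file name, data dict, and thread counts are placeholders rather than our actual cluster configuration, but the key point is that STAN_THREADS must be enabled for reduce_sum to parallelize within chains:

from cmdstanpy import CmdStanModel

#compile with threading enabled so reduce_sum can parallelize within chains
model = CmdStanModel(stan_file="casandre.stan",
                     cpp_options={"STAN_THREADS": True})

fit = model.sample(
    data=stan_data,        #the data dict sketched earlier in the post
    chains=4,
    parallel_chains=4,
    threads_per_chain=20,  #4 chains x 20 threads is roughly 80 CPUs
    iter_warmup=1000,
    iter_sampling=10000,
)

print(fit.summary())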

I want to give special thanks to jsocolar, stevebronder, spinkney, wds15, and Bob_Carpenter for all of their help getting this up and running! This forum and the documentation have been invaluable resources, particularly the Stan User’s Guide and Stan Functions Reference. I deeply appreciate all of the help, and I hope this effectively passes on what I’ve learned throughout this three-month process!

Sincerely,
Corey
