Hey all,
The aim of this thread is to consolidate all of my previous threads about my code and to provide the finished, tested code so that others can benefit from all of the help I received. The program I'm going to share is a Bayesian hierarchical version of a two-stage process model of metacognition (CASANDRE). The model works with data from psychophysical experiments, and from experiments that generate similar data, where a decision is made and a confidence report is given.
The model had to be built to accommodate different metacognitive contexts (termed conditions) and task difficulties (termed contrast levels) in such a way that conditions could cut across contrast levels and vice versa. The data set is very ragged and had to be extensively padded, because not all conditions contain all contrast levels, low confidence variability subjects are excluded from the data set, data sets differ in size depending on the combination of condition and contrast, and some subjects' data contain lapse trials. Further, the model initially ran into significant performance issues: the expected run time for the full data set was months! These were the problems facing the model, addressed across my previous threads.
Note that this code was written for cmdstan-2.30.1, which is the version of Stan available on the cluster our lab has access to. Data takes the form of a response matrix whose trial-wise rows are one-hot vectors of length four (two choices × two confidence levels), with all-zero rows serving as padding, plus the trial-matched orientations (or another independent variable); a sketch of this layout follows.
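To make that layout concrete, here is a minimal Python/numpy sketch (my own illustration, not part of the model; the variable names are placeholders) of how one subject's padded response matrix could be assembled. The column ordering is inferred from the likelihood below: [negative/high confidence, negative/low confidence, positive/low confidence, positive/high confidence].

import numpy as np

nt = 100  # rows in the largest unpadded data set
trials = [(-1, True), (1, False), (1, True)]  # toy (choice, high_confidence) pairs

resp = np.zeros((nt, 4))  # all-zero rows act as padding for missing trials
for i, (choice, high) in enumerate(trials):
    if choice < 0:
        resp[i, 0 if high else 1] = 1  # negative choice: columns 1-2
    else:
        resp[i, 3 if high else 2] = 1  # positive choice: columns 3-4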
Below is the fully parallelized, working, tested code:
functions {
//Custom quantile function for lognormal distribution
matrix log_normal_qf(vector x, vector m, vector s) {
//takes a vector sampling the x-axis in terms of cumulative density, plus vectors of mu and sigma parameters
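//uses the lognormal quantile identity Q(p) = exp(mu + sigma * sqrt(2) * inv_erfc(2 - 2p)), applied columnwise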
return exp(rep_matrix(s' * sqrt(2), size(x)) .* rep_matrix(inv_erfc(-2 * x + 2), size(m)) + rep_matrix(m', size(x)));
}
//Likelihood model for CASANDRE
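//resps: per-subject response matrices (all-zero rows are padding)
//guess: lapse rates; sds: sampled uncertainty scales (one column per subject)
//sm: transformed stimuli in s.d. units (one column per subject)
//nconf, pconf: negative and positive confidence criteria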
real casandre_log(array[] matrix resps, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {
//declare and initialize log likelihood variable
real llhC;
llhC = 0;
//loop over participants to calculate log likelihood
for (n in 1:size(resps)) {
//declare the trial indicator array
array[rows(resps[n])] int m;
//build the logical vector of non-zero trials
for (tt in 1:size(m)) {
m[tt] = sum(resps[n][tt]) == 1;
}
//declare the likelihood variable
matrix[rows(sm),4] lhC;
//check if subject has any non-zero data before running
if (sum(resps[n]) != 0) {
//declare and initialize the incrementing and index variables
int t = 1;
array[sum(m)] int ind;
//declare and calculate mu parameters for normal pdfs
matrix[rows(sm),rows(sds)] avgs;
avgs = rep_matrix(sm[,n],rows(sds)).*rep_matrix(sds[,n]',rows(sm));
//loop over trials
for (tr in 1:rows(sm)){
//check if trial has any non-zero data before running
if (sum(resps[n][tr]) != 0) {
//declare sample vector
//declare the matrix of sampled cumulative densities
matrix[3,rows(sds)] raws;
//sample the cumulative density of normals along the transformed x-axis for each confidence bin
for (rws in 1:rows(sds)) {
raws[1,rws] = normal_cdf(-nconf[n] | avgs[tr,rws], sds[rws,n]);
raws[2,rws] = normal_cdf(0 | avgs[tr,rws], sds[rws,n]);
raws[3,rws] = normal_cdf(pconf[n] | avgs[tr,rws], sds[rws,n]);
}
//declare the cumulative likelihood variable
vector[3] ratiodist;
//take the mean of the sampled densities; this equal-weight average over the quantile-spaced samples numerically marginalizes the choice probabilities over meta-uncertainty
ratiodist[1] = mean(raws[1]);
ratiodist[2] = mean(raws[2]);
ratiodist[3] = mean(raws[3]);
//implement a lapse rate parameter to absorb unmodeled noise, and calculate likelihoods of each choice possibility in the trial
lhC[tr,1] = ((guess[n]/4) + ((1-guess[n])*ratiodist[1]));
lhC[tr,2] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[2]-ratiodist[1])));
lhC[tr,3] = ((guess[n]/4) + ((1-guess[n])*(ratiodist[3]-ratiodist[2])));
lhC[tr,4] = ((guess[n]/4) + ((1-guess[n])*(1-ratiodist[3])));
//add this trial to the index
ind[t] = tr;
//increment the index
t = t + 1;
}
}
//multiply the choice matrix by the log of the likelihood matrix
llhC += sum(columns_dot_product(resps[n][ind], log(lhC[ind])));
}
}
//return the log likelihood for all data given the sampled parameters
return llhC;
}
//Partial sum function
real partial_sum_casandre_log(array[] matrix slice_n_resps, int start, int end, vector guess, matrix sds, matrix sm, vector nconf, vector pconf) {
//dynamically slice data into the log likelihood function
return casandre_log(slice_n_resps, guess[start:end], sds[:,start:end], sm[:,start:end], nconf[start:end], pconf[start:end]);
}
}
data {
//declare the number of conditions, number of contrast levels, number of subjects, and the number of trials in the largest unpadded data set
int ncond;
int ncont;
int ns;
int nt;
//declare the response matrices (one matrix per subject in each condition/contrast cell)
array[ncond,ncont,ns] matrix[nt,4] resps;
//declare the orientation matrices (every subject's column in each condition/contrast cell)
array[ncond,ncont] matrix[nt,ns] oris;
//declare the number of x-axis samples and the vector sampling the x-axis in terms of cumulative density
int sampn;
vector[sampn] sampx;
}
parameters {
//Parameters
vector<lower=0,upper=1>[ns] guess; //lapse rate
matrix<lower=0>[ncont,ns] sens; //stimulus sensitivities
matrix[ncont,ns] crit; //decision criteria
matrix<lower=0>[ncond,ns] meta; //meta-uncertainties
matrix<lower=0>[ncond,ns] nconf; //negative confidence criteria
matrix<lower=0>[ncond,ns] pconf; //positive confidence criteria
//Hyperparameters
vector[ncont] ssm; //mu hyperparameters of each contrast level's stimulus sensitivity lognormal prior
vector<lower=0>[ncont] sss; //sigma hyperparameters of each contrast level's stimulus sensitivity lognormal prior
vector[ncont] scm; //mu hyperparameters of each contrast level's decision criterion normal prior
vector<lower=0>[ncont] scs; //sigma hyperparameters of each contrast level's decision criterion normal prior
vector[ncond] mum; //mu hyperparameters of each condition's meta-uncertainty lognormal prior
vector<lower=0>[ncond] mus; //sigma hyperparameters of each condition's meta-uncertainty lognormal prior
vector[ncond] nccm; //mu hyperparameters of each condition's negative confidence criterion lognormal prior
vector<lower=0>[ncond] nccs; //sigma hyperparameters of each condition's negative confidence criterion lognormal prior
vector[ncond] pccm; //mu hyperparameters of each condition's positive confidence criterion lognormal prior
vector<lower=0>[ncond] pccs; //sigma hyperparameters of each condition's positive confidence criterion lognormal prior
}
model {
//Hyperpriors
ssm ~ normal(0,1);
sss ~ lognormal(0,1);
scm ~ normal(0,1);
scs ~ lognormal(0,1);
mum ~ normal(0,1);
mus ~ lognormal(0,1);
nccm ~ normal(0,1);
nccs ~ lognormal(0,1);
pccm ~ normal(0,1);
pccs ~ lognormal(0,1);
//Priors
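//the beta(1,197/3) prior on the lapse rate has mean 1/(1 + 197/3) = 0.015, keeping lapses rare a priori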
guess ~ beta(1,197.0/3.0);
for (cond in 1:ncond) {
meta[cond] ~ lognormal(mum[cond],mus[cond]);
nconf[cond] ~ lognormal(nccm[cond],nccs[cond]);
pconf[cond] ~ lognormal(pccm[cond],pccs[cond]);
}
for (cont in 1:ncont) {
sens[cont] ~ lognormal(ssm[cont],sss[cont]);
crit[cont] ~ normal(scm[cont],scs[cont]);
}
//Likelihood model (with local variable calculations)
for (cond in 1:ncond) {
//declare and calculate the transformed x-axis matrix and the sampled uncertainty scales
matrix[sampn,ns] xtrans = log_normal_qf(sampx, -0.5*log1p(meta[cond]'.^2), sqrt(log1p(meta[cond]'.^2)));
matrix[sampn,ns] sds = 1 ./ xtrans;
for (cont in 1:ncont) {
//check if condition and contrast level combination has non-zero data
if (sum(abs(oris[cond][cont])) != 0) {
//express orientations and decision criteria in terms of standard deviations
matrix[nt,ns] om = oris[cond][cont].*rep_matrix(sens[cont],nt);
row_vector[ns] cm = crit[cont].*sens[cont];
//transform the data
matrix[nt,ns] sm = om - rep_matrix(cm,nt);
//hand the data and parameters to the reduce_sum wrapper function for dynamic within-chain parallelization
target += reduce_sum(partial_sum_casandre_log, resps[cond][cont], 1, guess, sds, sm, nconf[cond]', pconf[cond]');
}
}
}
}
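For anyone who wants to sanity-check the custom quantile function outside of Stan, here is a small numpy/scipy sketch of mine (log_normal_qf below is a Python mirror of the Stan helper, not part of the model code) that compares it against scipy's reference lognormal quantiles:

import numpy as np
from scipy.special import erfcinv
from scipy.stats import lognorm

def log_normal_qf(x, m, s):
    # mirror of the Stan helper: lognormal(m, s) quantiles at the
    # cumulative densities in x, one column per (m, s) pair
    x = np.asarray(x)[:, None]
    m = np.asarray(m)[None, :]
    s = np.asarray(s)[None, :]
    return np.exp(s * np.sqrt(2) * erfcinv(-2 * x + 2) + m)

# compare against scipy's lognormal percent-point function
x = np.linspace(0.005, 0.995, 30)
m, s = np.array([0.0, 0.3]), np.array([0.5, 1.2])
ref = np.column_stack([lognorm.ppf(x, s=s[j], scale=np.exp(m[j])) for j in range(len(m))])
assert np.allclose(log_normal_qf(x, m, s), ref)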
The code is working great now, with good parameter recovery and a run time of just five and a half days for our largest data set with 1000 warmup + 10000 sampling iterations (multivariate ESS target taken from this paper, infographic here) on our cluster, utilizing approximately 80 CPU cores to parallelize four simultaneously sampling chains.
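For anyone setting up a similar run, here is a minimal cmdstanpy sketch of the compile-and-sample configuration; the file names, data path, and thread counts are placeholders to adjust for your own setup:

from cmdstanpy import CmdStanModel

# compile with threading enabled so reduce_sum can parallelize within chains
model = CmdStanModel(stan_file='casandre.stan',       # hypothetical file name
                     cpp_options={'STAN_THREADS': True})

fit = model.sample(data='casandre_data.json',          # hypothetical data file
                   chains=4, parallel_chains=4,
                   threads_per_chain=20,               # 4 x 20 = ~80 cores
                   iter_warmup=1000, iter_sampling=10000)

Compiling with STAN_THREADS is what lets reduce_sum spread each chain's likelihood evaluation across threads.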
I want to give special thanks to jsocolar, stevebronder, spinkney, wds15, and Bob_Carpenter for all of their help getting this up and running! This forum and the documentation have been invaluable resources, particularly the Stan User's Guide and Stan Functions Reference. I deeply appreciate all of the help, and I hope this effectively passes on what I've learned throughout this three-month process!
Sincerely,
Corey