# Newly Specified Bayesian Hierarchical Model with High Runtime

Hey all,

This should be my last thread on this particular Stan code, but I’m very new to Stan and BHM and could use some advice. The code is finally debugged to the point where it is returning parameter values on a test run with the data from two participants. There is a hitch, however, and I want to make sure it’s not the result of terribly inefficient (or incorrect) coding. I’m going to put the Stan version, interface, and code below, and then get into the issue.

Stan version: 2.31.0
Interface: MatlabStan 2.15.1.0

``````functions {

//custom function for lognormal inverse CDF
row_vector logncdfinv(row_vector x, real m, real s) {

int sz;												    //declaring size variable
sz = size(x);											//calculating size variable
row_vector[sz] y;										//declaring return variable
y = exp( s .* sqrt(2) .* inv_erfc( -2 .* x + 2 ) + m );						//calculating result

return y;											    //returning result

}

//likelihood model for CASANDRE
real casandre_log(matrix z, real j, real k, real l, real m, real n, vector o, int p) {

vector[p] sm;											//declaring the rescaled sensitivity vector
real sc;											    //declaring the rescaled confidence criterion variable
sm = k.*o;											    //calculating the rescaled stimulus sensitivity vector (stimulus sensitivity * orientation vector)
sc = k*n;											    //calculating the rescaled confidence criterion variable (stimulus sensitivity * confidence criterion)
row_vector[30] xtrans;								    //declaring the row vector for the transformed x-axis
xtrans = logncdfinv( linspaced_row_vector(30,.5/30,1-.5/30),
log( 1 / sqrt( m^2 + 1 ) ),
sqrt( log( m^2 + 1 ) ) );							    //calculating the lognormal inverse cdf for 30 equally spaced x-axis intervals

matrix[p,4] llhC;										//declaring the matrix of choice likelihoods

for (tr in 1:p){										//for loop through every trial for this subject

row_vector[3] confs;						        //declaring the confidence level vector
confs = [-n, 0, n];									//defining the confidence level vector
vector[30] avgs;									//declaring the mu vector
avgs = (1./xtrans').*(sm[tr]-sc);					//calculating the mu vector
vector[30] sds;										//declaring the sigma vector
sds = 1./xtrans';									//calculating the sigma vector

matrix[30,3] raws;									//declaring the normal cdf matrix of the transformed data

for (cls in 1:3) {									                    //for loop through each confidence criterion (i.e. output column)
for (rws in 1:30) {								                    //for loop through each sampled normal distribution (i.e. output row)
raws[rws,cls] = normal_cdf(confs[cls],avgs[rws],sds[rws]);		//calculating the normal cdf matrix of the transformed data
}
}

vector[30] raw1;									//declaring vectors for the rows of the normal cdf matrix
vector[30] raw2;
vector[30] raw3;
raw1 = raws[:,1];									//extracting vectors for the rows of the normal cdf matrix
raw2 = raws[:,2];
raw3 = raws[:,3];
real avg1;										    //declaring the variables for the row averages
real avg2;
real avg3;
avg1 = mean(raw1);									//calculating the row averages
avg2 = mean(raw2);
avg3 = mean(raw3);
vector[3] ratiodist;								//declaring the vector for the means of the lognormally scaled normal distributions

ratiodist[1] = avg1;								//collecting the means into the vector
ratiodist[2] = avg2;
ratiodist[3] = avg3;

llhC[tr,1] = ((j/4) + ((1-j)*ratiodist[1]));						//likelihood of high confidence clockwise judgment
llhC[tr,2] = ((j/4) + ((1-j)*(ratiodist[2]-ratiodist[1])));			//likelihood of low confidence clockwise judgment
llhC[tr,3] = ((j/4) + ((1-j)*(ratiodist[3]-ratiodist[2])));			//likelihood of low confidence counter-clockwise judgment
llhC[tr,4] = ((j/4) + ((1-j)*(1-ratiodist[3])));					//likelihood of high confidence counter-clockwise judgment

}

real ll;											    //declare log likelihood sum for these choices, given these orientations, and these parameter choices
ll = sum(z.*log(llhC));									//calculate the log likelihood sum for these choices, given these orientations, and these parameter choices
return ll;											    //return the log likelihood value

}

}
``````
``````data {

int ns;													        //number of subjects
int nt; 												        //number of trials
matrix[4,ns*nt] respslong;										//responses for all subjects as one long matrix
row_vector[ns*nt] orislong;										//orientations for all subjects as one long vector

}
``````
``````transformed data {

matrix[ns*nt,4] respstall;										//declare variable for tall matrix of responses
vector[ns*nt] oristall;											//declare variable for tall vector of orientations
int nth;												        //declare index for breaking up subjects into separate matrices

respstall = respslong';											//make tall matrix of responses
oristall = orislong';											//make tall vector of orientations

array[ns] matrix[nt,4] resps;									//declare final data array for responses
array[ns] vector[nt] oris;										//declare final data array for orientations

nth = 1;												        //begin index

for (sub in 1:ns) {											    //for loop to build final data arrays

resps[sub] = respstall[nth:(nth+nt-1),1:4];					//build each matrix of responses for final data array

oris[sub] = oristall[nth:(nth+nt-1)];						//build each vector of orientations for final data array

nth = nth + nt;											    //increment the indices

}

}
``````
``````parameters {

//Parameters
vector<lower=0,upper=1>[ns] guess;								//guess rate
vector<lower=0>[ns] sens;										//stimulus sensitivity
vector[ns] crit;											    //decision criterion
vector<lower=0.1>[ns] meta;										//meta-uncertainty
vector<lower=0>[ns] conf;										//confidence criterion

//Hyperparameters
real<lower=0> ga;
real<lower=0> gb;
real snm;
real<lower=0> sns;
real cm;
real<lower=0> cs;
real mm;
real<lower=0> ms;
real ccm;
real<lower=0> ccs;

}
``````
``````model {

//Hyperpriors
ga ~	lognormal(0,1);											//alpha hyperparameter for beta distribution of guess rate
gb ~	lognormal(0,1);											//beta hyperparameter for beta distribution of guess rate
snm ~	normal(0,1);											//mu hyperparameter for lognormal distribution of stimulus sensitivity
sns ~	lognormal(0,1);											//sigma hyperparameter for lognormal distribution of stimulus sensitivity
cm ~	normal(0,1);											//mu hyperparameter for normal distribution of decision criterion
cs ~	lognormal(0,1);											//sigma hyperparameter for normal distribution of decision criterion
mm ~	normal(0,1);											//mu hyperparameter for lognormal distribution of meta-uncertainty
ms ~	lognormal(0,1);											//sigma hyperparameter for lognormal distribution of meta-uncertainty
ccm ~	normal(0,1);											//mu hyperparameter for lognormal distribution of confidence criterion
ccs ~	lognormal(0,1);											//sigma hyperparameter for lognormal distribution of confidence criterion

//Loop through the participants
for (i in 1:ns) {

//Priors
guess[i] ~	beta(ga,gb);										//guess rate is a parameter bounded between 0 and 1
sens[i] ~	lognormal(snm,sns);									//stimulus sensitivity is a strictly positive parameter
crit[i] ~	normal(cm,cs);										//decision criterion is an unconstrained parameter
meta[i] ~	lognormal(mm,ms);									//meta-uncertainty is a strictly positive parameter
conf[i] ~	lognormal(ccm,ccs);									//confidence criterion is a strictly positive parameter

//Likelihood Model
resps[i] ~ casandre(guess[i],sens[i],crit[i],meta[i],conf[i],oris[i],nt);				//likelihood model for this participant

}

}
``````

As you can see, the model has ten hyperparameters, and five times as many parameters/priors as the number of participants. It takes a response matrix ([number of options] x [number of trials * number of subjects]), and a stimulus vector ([number of trials * number of subjects]) as an input and slices them up into an array to give the correct response matrix and trial orientations for each subject for use in the likelihood model.
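In plain-Python terms (toy sizes, just to illustrate the slicing, not actual model code), the reshaping amounts to:

```python
# Toy illustration of the transformed data reshaping: the 4 x (ns*nt)
# response matrix is transposed, then cut into one (nt x 4) block per
# subject. Sizes and values here are made up for the example.
ns, nt = 2, 3                                        # 2 subjects, 3 trials each
respslong = [[10 * s + t for s in range(ns) for t in range(nt)]
             for _ in range(4)]                      # 4 rows, ns*nt columns
respstall = [list(col) for col in zip(*respslong)]   # transpose to (ns*nt) x 4
resps = [respstall[s * nt:(s + 1) * nt] for s in range(ns)]  # per-subject blocks
```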

The problem that I’m running into is time to compute. When the program starts up, it informs me that calculating the gradient takes approximately three seconds and that, at ten leapfrog steps per iteration, it will take approximately 30,000 seconds to run 1000 iterations. At 1000 warmups and 1000 sampling iterations, on the computer we’re running it on, that’s about seventeen hours.

What I want to know is if there are ways I could have better written this Bayesian hierarchical model such that the computing time is not so high. We have the option of moving to a more powerful setup, but in the interim, knowing that we will eventually want to multiply these parameters by the number of conditions in the experiment to run the true full model, if I am doing something terribly inefficient or outright wrong, I want to fix it now.

I appreciate any help or advice!

Sorry it’s taking a little while for you to get a response to a well-written question. This likelihood looks very complicated and pretty exotic, unlike anything I’ve seen before anyway. That makes it hard for me to spot where the potential speedups are, and hard to tell what is essentially reshaping of parameters and data versus what is essentially likelihood computation. If you could provide a description of what’s going on here, and what the likelihood is for a single trial, it might help me (or somebody else) find a way into this problem.

Note that it is a virtual certainty that fitting this model will require more than 10 leapfrog steps per iteration, possibly well over ten times that, so you are staring at some pretty obnoxious runtimes here!

See below for some suggested changes. In general, you want to pull any operations that can be done outside of a loop out of the loops. I also think you could get rid of the outer loop over casandre by rewriting it as vector and matrix operations. You could also try `--O1` in the stanc compiler, though I don’t think it will help much here unless you had more vector and matrix operations in the model block.

``````functions {
//custom function for lognormal inverse CDF
row_vector logncdfinv(row_vector x, real m, real s) {
int sz; //declaring size variable
sz = size(x); //calculating size variable
row_vector[sz] y; //declaring return variable
y = exp(s .* sqrt(2) .* inv_erfc(-2 .* x + 2) + m); //calculating result

return y; //returning result
}

//likelihood model for CASANDRE
/* Steve: In general give these better names.
* I thought j, k, and l were loop variables later in the code
* m and n are usually used as sizes
* and o and p just don't really make sense. Long names are better
*/
real casandre_lpdf(matrix z, real j, real k, real l, real m, real n,
vector o, int p) {
//calculating the rescaled stimulus sensitivity vector (stimulus sensitivity * orientation vector)
/* Steve: Do this outside of loop */
vector[p] sm = k .* o; //declaring the rescaled sensitivity vector
//calculating the rescaled confidence criterion variable (stimulus sensitivity * confidence criterion)
real sc = k * n; //declaring the rescaled confidence criterion variable
//declaring the row vector for the transformed x-axis
/* Steve: Pull everything you can outside of the loop */
row_vector[30] xtrans = logncdfinv(linspaced_row_vector(30, .5 / 30,
1 - .5 / 30),
log(1 / sqrt(m ^ 2 + 1)),
sqrt(log(m ^ 2 + 1))); //calculating the lognormal inverse cdf for 30 equally spaced x-axis intervals

matrix[p, 4] llhC; //declaring the matrix of choice likelihoods
//for loop through every trial for this subject
/* Steve: I think rep matrix here will work */
matrix[30, p] avgs = (1. / xtrans') .* (rep_matrix(sm, 30) - sc); //declaring the mu vector
//calculating the sigma vector
vector[30] sds = 1. / xtrans'; //declaring the sigma vector
for (tr in 1 : p) {
matrix[30, 3] raws; //declaring the normal cdf matrix of the transformed data
//for loop through each confidence criterion (i.e. output column)
for (rws in 1 : 30) {
//for loop through each sampled normal distribution (i.e. output row)
raws[rws, 0] = normal_cdf(-n | avgs[rws], sds[rws]); //calculating the normal cdf matrix of the transformed data
}
for (rws in 1 : 30) {
//for loop through each sampled normal distribution (i.e. output row)
raws[rws, 1] = normal_cdf(0 | avgs[rws], sds[rws]); //calculating the normal cdf matrix of the transformed data
}
for (rws in 1 : 30) {
//for loop through each sampled normal distribution (i.e. output row)
raws[rws, 2] = normal_cdf(n | avgs[rws], sds[rws]); //calculating the normal cdf matrix of the transformed data
}

vector[3] ratiodist; //declaring the vector for the means of the lognormally scaled normal distributions
ratiodist[1] = mean(raws[ : , 1]); //collecting the means into the vector
ratiodist[2] = mean(raws[ : , 2]);
ratiodist[3] = mean(raws[ : , 3]);
llhC[tr, 1] = ((j / 4) + ((1 - j) * ratiodist[1])); //likelihood of high confidence clockwise judgment
llhC[tr, 2] = ((j / 4) + ((1 - j) * (ratiodist[2] - ratiodist[1]))); //likelihood of low confidence clockwise judgment
llhC[tr, 3] = ((j / 4) + ((1 - j) * (ratiodist[3] - ratiodist[2]))); //likelihood of low confidence counter-clockwise judgment
llhC[tr, 4] = ((j / 4) + ((1 - j) * (1 - ratiodist[3]))); //likelihood of high confidence counter-clockwise judgment
}

/* Steve: Propagate log(llhs) to the expressions above to make
* division / multiplication into subtraction and addition
* the log should propagate all the way to the normal_lpdf functions
*/
return sum(z .* log(llhC)); //calculate the log likelihood sum for these choices, given these orientations, and these parameter choices
}
}
data {
int ns; //number of subjects
int nt; //number of trials
matrix[4, ns * nt] respslong; //responses for all subjects as one long matrix
row_vector[ns * nt] orislong; //orientations for all subjects as one long vector
}
transformed data {
matrix[ns * nt, 4] respstall; //declare variable for tall matrix of responses
vector[ns * nt] oristall; //declare variable for tall vector of orientations
int nth; //declare index for breaking up subjects into separate matrices

respstall = respslong'; //make tall matrix of responses
oristall = orislong'; //make tall vector of orientations
/** Steve: Why not just bring in the data like this? */
array[ns] matrix[nt, 4] resps; //declare final data array for responses
/* make this matrix[nt, ns] */
array[ns] vector[nt] oris; //declare final data array for orientations

nth = 1; //begin index

for (sub in 1 : ns) {
//for loop to build final data arrays

resps[sub] = respstall[nth : (nth + nt - 1), 1 : 4]; //build each matrix of responses for final data array

oris[sub] = oristall[nth : (nth + nt - 1)]; //build each vector of orientations for final data array

nth = nth + nt; //increment the indices
}
}
parameters {
//Parameters
vector<lower=0, upper=1>[ns] guess; //guess rate
vector<lower=0>[ns] sens; //stimulus sensitivity
vector[ns] crit; //decision criterion
vector<lower=0.1>[ns] meta; //meta-uncertainty
vector<lower=0>[ns] conf; //confidence criterion

//Hyperparameters
real<lower=0> ga;
real<lower=0> gb;
real snm;
real<lower=0> sns;
real cm;
real<lower=0> cs;
real mm;
real<lower=0> ms;
real ccm;
real<lower=0> ccs;
}
model {
//Hyperpriors
ga ~ lognormal(0, 1); //alpha hyperparameter for beta distribution of guess rate
gb ~ lognormal(0, 1); //beta hyperparameter for beta distribution of guess rate
snm ~ normal(0, 1); //mu hyperparameter for lognormal distribution of stimulus sensitivity
sns ~ lognormal(0, 1); //sigma hyperparameter for lognormal distribution of stimulus sensitivity
cm ~ normal(0, 1); //mu hyperparameter for normal distribution of decision criterion
cs ~ lognormal(0, 1); //sigma hyperparameter for normal distribution of decision criterion
mm ~ normal(0, 1); //mu hyperparameter for lognormal distribution of meta-uncertainty
ms ~ lognormal(0, 1); //sigma hyperparameter for lognormal distribution of meta-uncertainty
ccm ~ normal(0, 1); //mu hyperparameter for lognormal distribution of confidence criterion
ccs ~ lognormal(0, 1); //sigma hyperparameter for lognormal distribution of confidence criterion

guess ~ beta(ga, gb); //guess rate is a parameter bounded between 0 and 1
sens ~ lognormal(snm, sns); //stimulus sensitivity is a strictly positive parameter
crit ~ normal(cm, cs); //decision criterion is an unconstrained parameter
meta ~ lognormal(mm, ms); //meta-uncertainty is a strictly positive parameter
conf ~ lognormal(ccm, ccs); //confidence criterion is a strictly positive parameter
//Loop through the participants
for (i in 1 : ns) {
//Likelihood Model
/* Switch to go over columns with oris[, i] instead of rows */
resps[i] ~ casandre(guess[i], sens[i], crit[i], meta[i], conf[i],
oris[i], nt); //likelihood model for this participant
}
}

``````

Hey, thanks for the response! I’m working on the code now. One question: why pull the priors out of the for loop? Each participant has their own set of parameters. Is it still BHM if the priors are not looped over for each participant?

Hey! Thanks for the response. Per stevebronder’s suggestion, I’m going to rewrite the code a bit for clarity, which should also make it easier to explain what’s going on.

A couple of other points

Create the linspaced_row_vector in the transformed data block and pass it in. Use the `data` qualifier to tell Stan this argument doesn’t need autodiff, by declaring it in the function signature like `data row_vector linspaced_vector_input`. Also, use `log1p(m^2)` instead of `log(m^2 + 1)`.

``````    row_vector[30] xtrans = logncdfinv(linspaced_row_vector(30, .5 / 30,
1 - .5 / 30),
log(1 / sqrt(m ^ 2 + 1)),
sqrt(log(m ^ 2 + 1))); //calculating the lognormal inverse cdf for 30 equally spaced x-axis interval
``````
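To see why `log1p` matters, here’s a quick Python illustration (not Stan) of the precision loss in the naive form when `m` is small:

```python
import math

# For small m, 1 + m^2 rounds to 1.0 in double precision, so
# log(m^2 + 1) underflows to exactly 0, while log1p(m^2) keeps
# the leading term m^2.
m = 1e-8
naive = math.log(m**2 + 1)   # m^2 = 1e-16 is lost when added to 1.0
stable = math.log1p(m**2)    # evaluates log(1 + x) accurately near x = 0
```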

There seems to be an error here. Stan indexing begins at 1, so column 0 doesn’t exist.

``````  raws[rws, 0] = normal_cdf(-n | avgs[rws], sds[rws]); //calculating the normal cdf matrix
``````

The final return `sum(z .* log(llhC))` can be written as `trace(z * log(llhC)')`.
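If you want to sanity-check that identity outside Stan, here’s a throwaway Python sketch (toy numbers, not your data). Note the trace form computes a full matrix product and keeps only the diagonal, which is worth remembering when profiling:

```python
import math

# Check that sum(z .* log(llhC)) equals trace(z * log(llhC)') on a
# toy 2x3 example (values made up).
z    = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
llhC = [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]]
logL = [[math.log(v) for v in row] for row in llhC]

elementwise = sum(z[i][j] * logL[i][j] for i in range(2) for j in range(3))

# z * log(llhC)' is 2x2; the trace sums its diagonal. The off-diagonal
# entries are computed and then thrown away.
prod = [[sum(z[i][k] * logL[j][k] for k in range(3)) for j in range(2)]
        for i in range(2)]
trace = prod[0][0] + prod[1][1]
```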


The priors are all the same for each participant since you have a `real` there, so you can just use the vectorized version and it will give you the same result.


Great! Thanks!

As for pulling the rescaling of the x-axis out of the for loop: would that be accomplished by writing out its formulation in the transformed parameters block and then passing in the transformed parameter vector? That is:

``````transformed parameters {

matrix[ns,sampn] xtrans;

xtrans = logncdfinv( sampx,
-0.5*log1p(meta.^2),
sqrt(log1p(meta.^2)) );
}
``````
``````model{
resps[i] ~ casandre(guess[i], sens[i], crit[i], xtrans[i,], conf[i], oris[i], nt);
}
``````

Thank you! I’ve begun implementing these changes.

I just wanted to clarify the use of log1p, would those statements correctly be rewritten as:

``````xtrans = logncdfinv(sampx,
-0.5*log1p(meta.^2),
sqrt(log1p(meta.^2)));
``````
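As a sanity check on my end, I verified in Python that this parameterization gives a unit-mean lognormal whose standard deviation is the meta-uncertainty (assuming that’s the intent of the original `log(1/sqrt(m^2+1))` form):

```python
import math

# With mu = -0.5*log1p(m^2) and sigma = sqrt(log1p(m^2)), a lognormal
# has mean exp(mu + sigma^2/2) = 1 and variance
# (exp(sigma^2) - 1) * exp(2*mu + sigma^2) = m^2.
m = 0.7                                # toy meta-uncertainty value
L = math.log1p(m**2)
mu, sigma = -0.5 * L, math.sqrt(L)
mean = math.exp(mu + sigma**2 / 2)     # lognormal mean
sd = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))
```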

Yep, that looks good!


Alright, so I’ve overhauled the code, and this is what it looks like now:

``````functions {

//custom function for lognormal inverse CDF
matrix logncdfinv(vector x, vector m, vector s) {

int szx;													//declaring sample size variable
int szm;													//declaring the mu/sigma size variable
szx = size(x);												//calculating size variable
szm = size(m);												//calculating the mu/sigma size variable
matrix[szx,szm] y;										    //declaring return variable
y = exp( rep_matrix(s',szx) * sqrt(2)
.* rep_matrix(inv_erfc( -2 .* x + 2 ),szm)
+ rep_matrix(m',szx));		                            //calculating result

return y;													//returning result

}

//likelihood model for CASANDRE
real casandre_log(matrix resps, real guess, vector sm, real sc, vector xtrans, real conf, int nt, int sampn) {

matrix[nt,4] llhC;										    //declaring the matrix of choice likelihoods

for (tr in 1:nt){										    //for loop through every trial for this subject

row_vector[3] confs;									//declaring the confidence level vector
confs = [-conf, 0, conf];								//defining the confidence level vector
vector[sampn] avgs;									    //declaring the mu vector
avgs = (1./xtrans).*(sm[tr]-sc);				        //calculating the mu vector
vector[sampn] sds;									    //declaring the sigma vector
sds = 1./xtrans;									    //calculating the sigma vector

matrix[sampn,3] raws;									//declaring the normal cdf matrix of the transformed data

for (cls in 1:3) {									    //for loop through each confidence criterion (i.e. output column)
for (rws in 1:sampn) {								//for loop through each sampled normal distribution (i.e. output row)
raws[rws,cls] = normal_cdf(confs[cls],avgs[rws],sds[rws]);		//calculating the normal cdf matrix of the transformed data
}
}

vector[sampn] raw1;									    //declaring vectors for the rows of the normal cdf matrix
vector[sampn] raw2;
vector[sampn] raw3;
raw1 = raws[:,1];									    //extracting vectors for the rows of the normal cdf matrix
raw2 = raws[:,2];
raw3 = raws[:,3];
real avg1;										        //declaring the variables for the row averages
real avg2;
real avg3;
avg1 = mean(raw1);									    //calculating the row averages
avg2 = mean(raw2);
avg3 = mean(raw3);
vector[3] ratiodist;									//declaring the vector for the means of the lognormally scaled normal distributions

ratiodist[1] = avg1;									//collecting the means into the vector
ratiodist[2] = avg2;
ratiodist[3] = avg3;

llhC[tr,1] = ((guess/4) + ((1-guess)*ratiodist[1]));					//likelihood of high confidence clockwise judgment
llhC[tr,2] = ((guess/4) + ((1-guess)*(ratiodist[2]-ratiodist[1])));		//likelihood of low confidence clockwise judgment
llhC[tr,3] = ((guess/4) + ((1-guess)*(ratiodist[3]-ratiodist[2])));		//likelihood of low confidence counter-clockwise judgment
llhC[tr,4] = ((guess/4) + ((1-guess)*(1-ratiodist[3])));				//likelihood of high confidence counter-clockwise judgment

}

real ll;											//declare log likelihood sum for these choices, given these orientations, and these parameter choices
ll = sum(resps.*log(llhC));							//calculate the log likelihood sum for these choices, given these orientations, and these parameter choices
return ll;											//return the log likelihood value

}

}
``````
``````data {

int ns;													        //number of subjects
int nt; 												        //number of trials
int sampn;
matrix[4,ns*nt] respslong;										//responses for all subjects as one long matrix
row_vector[ns*nt] orislong;										//orientations for all subjects as one long vector
vector[sampn] sampx;

}
``````
``````transformed data {

matrix[ns*nt,4] respstall;										//declare variable for tall matrix of responses
vector[ns*nt] oristall;										    //declare variable for tall vector of orientations
int nth;												        //declare index for breaking up subjects into separate matrices

respstall = respslong';											//make tall matrix of responses
oristall = orislong';											//make tall vector of orientations

array[ns] matrix[nt,4] resps;									//declare final data array for responses
matrix[nt,ns] oris;											    //declare final data array for orientations

nth = 1;												        //begin index

for (sub in 1:ns) {											    //for loop to build final data arrays

resps[sub] = respstall[nth:(nth+nt-1),1:4];					//build each matrix of responses for final data array

oris[,sub] = oristall[nth:(nth+nt-1)];						//build each vector of orientations for final data array

nth = nth + nt;											    //increment the indices

}

}
``````
``````parameters {

//Parameters
vector<lower=0,upper=1>[ns] guess;								//guess rate
vector<lower=0>[ns] sens;										//stimulus sensitivity
vector[ns] crit;											    //decision criterion
vector<lower=0.1>[ns] meta;										//meta-uncertainty
vector<lower=0>[ns] conf;										//confidence criterion

//Hyperparameters
real snm;
real<lower=0> sns;
real cm;
real<lower=0> cs;
real mm;
real<lower=0> ms;
real ccm;
real<lower=0> ccs;

}
``````
``````transformed parameters {

matrix[nt,ns] sm;											    //declaring the rescaled sensitivity vector
vector[ns] sc;												    //declaring the rescaled confidence criterion variable
sm = oris.*rep_matrix(sens',nt);								//calculating the rescaled stimulus sensitivity vector (stimulus sensitivity * orientation vector)
sc = crit.*sens;											    //calculating the rescaled confidence criterion variable (stimulus sensitivity * confidence criterion)

matrix[sampn,ns] xtrans;										//declaring the transformed x axis matrix
xtrans = logncdfinv(sampx,
-0.5*log1p(meta.^2),
sqrt(log1p(meta.^2)));					                //calculating the transformed x axis for each subject

}
``````
``````model {

//Hyperpriors
snm 	~	normal(0,1);										//mu hyperparameter for lognormal distribution of stimulus sensitivity
sns 	~	lognormal(0,1);										//sigma hyperparameter for lognormal distribution of stimulus sensitivity
cm 	~	normal(0,1);										    //mu hyperparameter for normal distribution of decision criterion
cs 	~	lognormal(0,1);										    //sigma hyperparameter for normal distribution of decision criterion
mm 	~	normal(0,1);										    //mu hyperparameter for lognormal distribution of meta-uncertainty
ms 	~	lognormal(0,1);										    //sigma hyperparameter for lognormal distribution of meta-uncertainty
ccm 	~	normal(0,1);										//mu hyperparameter for lognormal distribution of confidence criterion
ccs 	~	lognormal(0,1);										//sigma hyperparameter for lognormal distribution of confidence criterion

//Priors
guess 	~	beta(0.1,1);										//guess rate is a parameter bounded between 0 and 1
sens	~	lognormal(snm,sns);									//stimulus sensitivity is a strictly positive parameter
crit	~	normal(cm,cs);										//decision criterion is an unconstrained parameter
meta  	~	lognormal(mm,ms);									//meta-uncertainty is a strictly positive parameter
conf  	~	lognormal(ccm,ccs);									//confidence criterion is a strictly positive parameter

//Loop through the participants
for (i in 1:ns) {

//Likelihood Model
resps[i] ~ casandre(guess[i],sm[,i],sc[i],xtrans[,i],conf[i],nt,sampn);		//likelihood model for this participant

}

}
``````

I implemented all of the advised changes except swapping `sum(z .* log(llhC))` for `trace(z*log(llhC)')`: across five test runs on a single subject, the trace version took on average three times longer to calculate the gradient, needed twice as many leapfrogs per run, and took twice as long overall. After discussing it with my supervisor, we also decided that a hyperprior over the guess rate didn’t make good theoretical sense, so its prior is now fully specified with static parameters.

I am, however, seemingly still running into intractable run times. I’ve tried running the program on ten percent of the data per subject (n = 149), which had a gradient evaluation time of 0.276424 seconds (about a tenth of that needed for the full data set). Running each chain on its own core, even with the reduced data set, it still took three and a half hours to get through just 100 warmups on all four chains. This is a problem because we actually need about 10,000 iterations to get the desired number of effective samples (~8,500). Even with the reduced data set, at the observed rate of one iteration per 158 seconds on average, that’s 17 straight days of fitting! We are trying to port the program over to a cluster, but even that will probably require rewriting the code for multithreading to really take advantage of it.

Is there anything more I can do to speed things up? Any more glaring inefficiencies in the code that could be addressed? Could reparameterization help? Is Stan simply not the appropriate sampler for our Bayesian hierarchical model? Or is this the right sampler, but the wrong hardware (2.3GHz 8-Core Intel Core i9, 32 GB 2667MHz DDR4, AMD Radeon Pro 5500M 8 GB)?

Just an update: I attempted to run the full data set for all 149 participants and the average time to complete an iteration on the machine with the aforementioned specifications is 1400s. For a run with 1000 warmups, and 10000 iterations, that’s a run time of 178 days. Even if we move on to using a much more powerful machine, I’m concerned that this intractable run time is simply insurmountable. Does anyone have any other suggestions about how to significantly decrease the run time for the model? Or, to reiterate my question from earlier, is Stan simply not the right sampler for this model?

178 days is pretty long, so you might need to find some additional speedups beyond what I mention here.

First, be aware that the early iterations don’t give good estimates of how long later iterations will take, because you might be dealing with high treedepth early that collapses down to a lower treedepth later (potentially even pretty quickly… in relative terms). If this is going on, you might expect a speedup on the order of fourfold (if the adapted sampler yields treedepth eight) to sixteenfold (if the adapted sampler yields treedepth six). (But note on the other hand that if you are seeing a lot of divergences early, the treedepth after adaptation might actually end up higher than the average treedepth that you are seeing early).
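To put numbers on that (a Python back-of-the-envelope using the roughly three seconds per gradient you quoted; NUTS evaluates up to 2^treedepth leapfrog steps per iteration):

```python
# Rough per-iteration cost at different treedepths, using the ~3 s
# gradient time from the original post. NUTS takes up to 2^treedepth
# leapfrog steps (gradient evaluations) per iteration.
grad_seconds = 3.0
cost = {depth: grad_seconds * 2**depth for depth in (6, 8, 10)}

# Speedup if adaptation collapses treedepth 10 down to 8 or 6:
speedup_vs_depth10 = {d: cost[10] / cost[d] for d in (6, 8)}
```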

Second, note that your early experiments are probably too noisy to support firm conclusions. For example, I’m pretty sure that when you say

there is no way that this change should affect the number of leapfrog steps on average. But the number of leapfrogs will be highly sensitive to the initialization (or possibly even to minor numerical variations in the trajectories from fixed inits with fixed seeds), which I suspect is what you’re seeing. That’s not to say that the rewritten way is definitely as fast or faster, but I’d be wary of drawing firm conclusions. Your conclusions about the time per gradient evaluation are likely to be less sensitive to this sort of issue.

Third, note that if you have access to a machine with many cores, you can very likely take advantage of within-chain parallelization to achieve quite a large speedup before you start to run hard into Amdahl’s Law. I’m not saying that you can get this down to a day with 178 cores per chain, but it wouldn’t be outrageous to think that you might get it down to a few weeks or a month with sufficient resources.
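For a rough sense of where Amdahl’s Law starts to bite, a toy calculation (the 95% parallel fraction here is an assumption for illustration, not a measurement of your model):

```python
# Amdahl's law: with fraction p of the work parallelizable,
# n cores give a speedup of 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even if 95% of each gradient evaluation parallelized perfectly,
# 64 cores per chain would cap out around 15x, nowhere near 64x.
cap_64 = amdahl_speedup(0.95, 64)
```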

Thanks for the reply! We do have access to a machine with many cores, and that’s exactly where we’re trying to go next with this. In order to take advantage of such a setup, would I have to implement one (or more) of the coding changes described in the parallelization section of the Stan manual?

``````	//custom function for lognormal inverse CDF
matrix logncdfinv(vector x, vector m, vector s) {

int szx;													//declaring sample size variable
int szm;													//declaring the mu/sigma size variable
szx = size(x);												//calculating size variable
szm = size(m);												//calculating the mu/sigma size variable
matrix[szx,szm] y;										    //declaring return variable
y = exp( rep_matrix(s',szx) * sqrt(2)
.* rep_matrix(inv_erfc( -2 .* x + 2 ),szm)
+ rep_matrix(m',szx));		                            //calculating result

return y;													//returning result

}
``````

Why do you need y to be a matrix? It’s all elementwise operations, so I believe you can just do:

``````	//custom function for lognormal inverse CDF
matrix logncdfinv(vector x, vector m, vector s) {

int szx;													//declaring sample size variable
int szm;													//declaring the mu/sigma size variable
szx = size(x);												//calculating size variable
szm = size(m);												//calculating the mu/sigma size variable
matrix[szx,szm] y;										    //declaring return variable
y = rep_matrix(exp(s' * sqrt(2)
.* inv_erfc( -2 .* x + 2 ) + m'), szm);		                            //calculating result

return y;													//returning result

}
``````

Now, I see that you loop over the columns of `xtrans` later anyway and all these are the same. So just return a `vector` from `logncdfinv`.

``````	//custom function for lognormal inverse CDF
vector logncdfinv(vector x, vector m, vector s) {

  int szx;                                              //declaring sample size variable
  int szm;                                              //declaring the mu/sigma size variable
  szx = size(x);                                        //calculating size variable
  szm = size(m);                                        //calculating the mu/sigma size variable
  vector[szm] y;                                        //declaring return variable
  y = exp( s * sqrt(2)
           .* inv_erfc(-2 .* x + 2) + m );              //calculating result

  return y;                                             //returning result

}
``````
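Incidentally, `s .* sqrt(2) .* inv_erfc(-2*p + 2)` is just the standard normal quantile scaled by `s` (since `inv_erfc(2 - 2p) = inv_erf(2p - 1)`), so `logncdfinv(p, m, s)` reduces to `exp(m + s * inv_Phi(p))` and you could lean on Stan's built-in `inv_Phi`. A quick numerical sanity check of that identity in Python (illustrative only; `logncdfinv` here is a standalone re-implementation, not your Stan function):

```python
import math
from statistics import NormalDist

def logncdfinv(p, m, s):
    # lognormal quantile: exp(m + s * Phi^{-1}(p)); algebraically equal to
    # exp(s*sqrt(2)*inv_erfc(-2p + 2) + m) because inv_erfc(2-2p) = inv_erf(2p-1)
    return math.exp(m + s * NormalDist().inv_cdf(p))

# the median of a lognormal(m, s) is exp(m), independent of s
print(logncdfinv(0.5, 2.0, 0.5))
```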

I have a suspicion that the entire `casandre_log` can be rewritten. For example, anything involving `xtrans` can be pulled out of the loops, since we know it’s the same vector for each `i` and you only need to compute it once for the entire program. Right now you’re doing it for each `ns` and then again for each `nt` within the function!

If the values in that `normal_cdf` call repeat, you can cache them so you’re not computing them again.
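The caching idea, sketched in Python (illustrative only; in Stan itself you would hoist the repeated values into `transformed data` or compute them once before the trial loop, rather than memoize):

```python
from functools import lru_cache
from statistics import NormalDist

@lru_cache(maxsize=None)
def cached_normal_cdf(x, mu, sigma):
    # each distinct (x, mu, sigma) triple is computed only once;
    # repeat calls become dictionary lookups instead of CDF evaluations
    return NormalDist(mu, sigma).cdf(x)
```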

On that final elementwise sum: it’s good that you profiled it, and good to know that the trace formulation has the slower gradient. I believe you can also try `sum(columns_dot_product(resps, log(llhC)))`.
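For what it’s worth, `sum(columns_dot_product(resps, log(llhC)))` and the elementwise `sum(resps .* log(llhC))` are algebraically the same number; a tiny Python check of that claim with plain lists and made-up values (illustrative only):

```python
import math

resps = [[0, 1, 0, 0], [1, 0, 0, 0]]            # one-hot choices, 2 trials x 4 options
llh   = [[0.1, 0.6, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1]]

# elementwise: sum of resps .* log(llh) over the whole matrix
elementwise = sum(r * math.log(l) for rr, ll in zip(resps, llh) for r, l in zip(rr, ll))

# per-column dot products, then summed (columns_dot_product-style)
by_columns = sum(sum(resps[i][j] * math.log(llh[i][j]) for i in range(2)) for j in range(4))

print(elementwise, by_columns)
```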


Thanks for your help! In this case, the matrix returned from logncdfinv is meant to have one mu and sigma parameter per column and a different, equally spaced x-axis value (to be transformed) per row, so every element is unique. The result is a matrix with one column per participant, each column containing that participant’s particular transformed x-axis; these columns of the xtrans matrix are then passed into the likelihood function.

See my post above for suggested changes to your lpdf function. I think, at the end of the day, without reparametrizing the model you will need to use `reduce_sum`. Though if you have 64 or 128 cores you should see a pretty big speedup. Not linear, but it should make the model take maybe a week or three instead of a few months.
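The reason `reduce_sum` applies here is that the total log likelihood is a plain sum over trials (or subjects), so it can be computed over independent slices and then combined. A minimal Python sketch of that slicing idea (illustrative; `reduce_sum` in Stan additionally runs the slices in parallel and handles the autodiff):

```python
def sliced_sum(terms, grainsize):
    # split the per-trial log-likelihood terms into slices, sum each slice
    # independently, then combine -- the same decomposition reduce_sum exploits
    slices = [terms[i:i + grainsize] for i in range(0, len(terms), grainsize)]
    return sum(sum(s) for s in slices)

terms = [-1.2, -0.7, -2.1, -0.4, -1.9]   # made-up per-trial log likelihoods
print(sliced_sum(terms, 2), sum(terms))  # identical totals
```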

If you are able to see what draws tend to need a very high tree depth for your current runs, that can help you understand where the model is having trouble so you can find places to reparameterize.

Also looking at your priors again, `beta(0.1, 1)` seems like a really tight prior. Can you try loosening up your priors to make them more semi-informative?
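To quantify “tight”: `Beta(a, 1)` has the closed-form CDF `x^a`, so `beta(0.1, 1)` puts roughly 63% of its prior mass below 0.01. A one-line check (Python, illustrative):

```python
def beta_a1_cdf(x, a):
    # Beta(a, 1) density is a * x**(a - 1) on (0, 1), so its CDF is x**a
    return x ** a

# about 0.63 of the beta(0.1, 1) prior mass sits below 0.01
print(beta_a1_cdf(0.01, 0.1))
```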

What is `casandre` btw? I’ve never heard of a distribution by that name and am not sure what to google

I’m working on implementing more of the changes you suggested! In the meantime, CASANDRE comes from this paper:

https://www.nature.com/articles/s41562-022-01464-x

CASANDRE is a two step process model of confidence that is used to evaluate metacognitive ability and metacognitive bias. It assumes that decisions about stimuli are made by comparing a noisy estimate of the relevant stimulus feature to a decision criterion, and that decisions about confidence are made by comparing the signal to noise ratio of the relevant stimulus feature to a confidence criterion. The signal strength is modeled as the distance between the noisy estimate of the relevant stimulus feature and the decision criterion, while the noise is modeled as the uncertainty present in the estimation. Importantly, the model assumes that the uncertainty present in the estimation cannot be directly known, but is, itself, an estimation, the accuracy of which depends on metacognitive ability and bias.

Metacognitive ability is mathematically modeled in CASANDRE as the standard deviation of this estimated uncertainty, termed meta-uncertainty, where larger values suggest weaker metacognitive ability, while metacognitive bias is modeled as the confidence criterion against which this signal-to-noise ratio is compared to determine whether confidence is low or high.

What you see specified in our code is the likelihood of the choice data, assuming that likelihood is accurately modeled by CASANDRE. This takes into account the guess rate, stimulus sensitivity, decision criterion, meta-uncertainty, confidence criterion, and stimulus strength. In this case the stimulus strength depends only on the magnitude of the deflection from vertical of an oriented Gabor patch whose direction of rotation the participant is asked to report. They can respond either clockwise or counterclockwise, and can report either high or low confidence in their decision. These options are represented as the four columns of the choice matrix, where a ‘1’ in a column marks the decision the participant actually made and ‘0’s in the remaining columns mark the choices not made. This choice matrix is multiplied by the parameter- and stimulus-dependent likelihood matrix calculated by the CASANDRE model to give the likelihood of the parameters given the participant’s choices.
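That last step can be illustrated with a tiny Python sketch (made-up numbers): because each row of the choice matrix is one-hot, multiplying it elementwise with the model’s likelihood row and summing simply picks out the likelihood of the option the participant actually chose on that trial:

```python
choices = [[0, 1, 0, 0],                  # trial 1: a '1' marks the chosen option
           [1, 0, 0, 0]]                  # trial 2
llh     = [[0.10, 0.55, 0.25, 0.10],      # model likelihood of each option, per trial
           [0.50, 0.20, 0.20, 0.10]]

# one-hot row times likelihood row, summed, extracts the chosen option's likelihood
per_trial = [sum(c * l for c, l in zip(crow, lrow)) for crow, lrow in zip(choices, llh)]
print(per_trial)  # prints [0.55, 0.5]
```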