Using "Reduce Sum" to Parallelize Code in Multi-level Psychometric Model

There are ~1000 participants and fourteen conditions for which I am estimating thresholds via psychometric functions. Intensity is the RW for the tasks, which lengthens when participants answer incorrectly and shortens when they answer correctly, via an adaptive paradigm.

I have access to a supercomputer, but no job can run longer than 48 hours; hence I’m trying to get the model running with multiple threads, as one person on this forum suggested.

I’ve looked through the documentation and think I’m pretty close to having everything ready to go, but I’m wondering whether, in the space marked HERE, I should use merely psi (as in the “model” section of the code) or the entire psi computation (as listed in “transformed parameters”).

I might also be way off. Hopefully not.

Thanks much!

functions {
  real partial_sum_lpmf(int[] slice_n_subjects,
                        int start, int end,
                        int[] n_levels,
                        vector intensity,
                        vector lapse,
                        real mum,
                        real muw,
                        vector fA,
                        vector sA,
                        vector sB,
                        vector fsA) {
    return binomial_logit_lupmf(slice_n_subjects |
                                n_levels[start:end],
                               **HERE** );
  }
}

data{  // Everything that must be input by the user
  int<lower=4> n_total;                         // Total number of trials to analyze
  int<lower=2> n_subjects;                      // Number of unique observers
  int<lower=2> n_levels;                        // Number of levels of Factor
  real intensity[n_total];                      // Intensity of the stimulus on each trial
  int<lower=0> subject[n_total];                // Observer on each trial
  int<lower=0> level[n_total];                  // Level of Factor on each trial
  int<lower=0,upper=1> correct[n_total];        // Whether the response was correct (1) on each trial
  real<lower=0,upper=1> chance_performance;     // Chance performance for experiment (e.g., 1/n for n-AFC)
  int<lower=1> grainsize;
}
transformed data{
  int df_levels_sA;          // Number of degrees of freedom in the interaction
  int n_levels_sA;          // Number of levels in the interaction
  
  df_levels_sA = (n_levels - 1) * (n_subjects - 1);
  n_levels_sA = n_levels * n_subjects;
}
parameters{
  vector<lower=0,upper=1>[n_subjects] lapse;        // Observer's lapse rate
  real mum;
  real muw;
  vector[n_levels-1] fA;
  vector[n_subjects-1] sA;
  vector[n_subjects-1] sB;
  //matrix[n_subjects-1,n_levels-1] fsA;
  vector[(n_subjects - 1) * (n_levels-1)] fsA;
}

transformed parameters {
  real m[n_subjects,n_levels];
  real w[n_subjects,n_levels];
  real lapse_alpha;
  real lapse_beta;
  vector [n_total] psi;
  {
    real mean_beta;              // Mean of beta prior for individual lapse rate
    real betaEta;                       // Precision parameter of beta prior for individual lapse rate
    real threshold[n_total];
    real width[n_total];
    vector[n_levels] factor_alpha = append_row(fA, -sum(fA));
    vector[n_subjects] subject_alpha = append_row(sA, -sum(sA));
    vector[n_subjects] subject_beta = append_row(sB, -sum(sB));
    matrix[n_subjects,n_levels] interaction_alpha;

    interaction_alpha = rep_matrix(0, n_subjects, n_levels);
    for (sj in 1:(n_subjects - 1)) {
      for (l in 1:(n_levels - 1)) {
        interaction_alpha[sj, l] = fsA[(sj - 1) * (n_levels - 1) + l];
      }
      interaction_alpha[sj, n_levels] = -1 * sum(interaction_alpha[sj,]);
    }
    for (l in 1:n_levels) {
      interaction_alpha[n_subjects, l] = -1 * sum(interaction_alpha[,l]);
    }

    for (sj in 1:n_subjects) {
      for (l in 1:n_levels) {
        m[sj, l] = mum + factor_alpha[l] + subject_alpha[sj] + interaction_alpha[sj, l];
        w[sj, l] = exp(muw + subject_beta[sj]);
      }
    }

    for (tr in 1:n_total) {
      threshold[tr] = m[subject[tr], level[tr]];
      width[tr] = w[subject[tr], level[tr]];
      psi[tr] = chance_performance
                + (1 - lapse[subject[tr]] - chance_performance)
                * inv_logit((intensity[tr] - threshold[tr]) / width[tr]);
      if (is_nan(psi[tr])) {
        print("lapse = ", lapse[subject[tr]]);
        print("width = ", width[tr]);
        print("threshold = ", threshold[tr]);
      }
    }

    mean_beta  = 0.01;
    betaEta = 100;
    lapse_alpha = mean_beta * betaEta;
    lapse_beta = (1-mean_beta) * betaEta ;
  }
}

model {
  //mean_beta ~ beta(1,60);
  //betaEta ~ gamma(1,0.01);
  fsA ~ normal(0, inv(sqrt(1 - inv(n_levels_sA)))); // Produces a standard normal on interaction_alpha
  fA ~ normal(0, inv(sqrt(1 - inv(n_levels))));
  sA ~ normal(0, inv(sqrt(1 - inv(n_subjects)))); // Produces a standard normal on subject_alpha
  sB ~ normal(0, inv(sqrt(1 - inv(n_subjects)))); // Produces a standard normal on subject_beta

  muw ~ gamma(2, .5);
  lapse ~ beta(lapse_alpha, lapse_beta);
  correct ~ bernoulli(psi);

  target += reduce_sum(partial_sum_lupmf, n_subjects, grainsize,
                       n_levels, intensity, lapse, mum, muw, fA, sA, sB, fsA);
}

I have a bit of background with psychophysical models, but I’m rather lost looking at your code; is there a paper implementing something akin to what you want that I could check out to get a better understanding?

Thanks. Yes, this paper here: https://www.sciencedirect.com/science/article/pii/S0042698918300567?via%3Dihub. The context is slightly different, but the models are similar.

Oh! It’s just a standard cumulative normal model? I’m pretty sure your model is massively over-complicated then.

Check out this tutorial: https://vuorre.netlify.app/post/2017/10/09/bayesian-estimation-of-signal-detection-theory-models-part-1/

Then check out this demo to achieve fastest sampling: https://gist.github.com/mike-lawrence/39412616302925a1941ad92777daa036

Sorry, I had my head in the wrong place in sending you those earlier links. SDT isn’t quite the right domain for this, and while some of the code in my gist will be useful, you need something a bit different.

So you have a binomial outcome that is a 2-component mixture: one component is chance performance (given N options) and the other is modelled as a cumulative normal as a function of intensity, where the mean of the normal is what you call the threshold and the SD of the normal is what you call the width. You also have another predictor, the level variable, by which presumably you want to allow threshold and width to vary. And of course multiple participants. You don’t seem to have any correlation structure in there, presumably because you were building up to that but got stuck when this didn’t sample well. There’s a bunch of stuff in there I don’t follow, but if the above accurately describes things, I actually coded something very efficient for nearly this last spring. Gimme a bit and I’ll post a demo…
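To check that we mean the same thing, here is that mixture written out; it just restates the psi line from your transformed parameters block (your code currently uses an inverse logit where a cumulative normal would use $\Phi$, but the structure is the same):

$$
\psi_{tr} = \gamma + \bigl(1 - \lambda_{s[tr]} - \gamma\bigr)\, F\!\left(\frac{x_{tr} - m_{s[tr],\,l[tr]}}{w_{s[tr],\,l[tr]}}\right)
$$

where $\gamma$ is chance_performance, $\lambda_{s[tr]}$ the lapse rate of the subject on trial $tr$, $x_{tr}$ the intensity, $m$ the threshold, $w$ the width indexed by subject and level, and $F$ the sigmoid (inverse logit in your code, $\Phi$ for a cumulative normal).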

Quick Q: are these 14 conditions actually structured somehow (e.g., the way three two-level variables plus all their interactions would yield 8 conditions), or would it be reasonable to treat them as random effects?

(editing my previous answer bc discourse won’t let me post 3 replies in a row!)

The more I look at this, the more I realize you probably had things pretty well optimized in the first place :P But here’s a script that generates data akin to what you have and fits a Stan model. I use terminology from a slightly different psychophysics context, temporal order judgements, but the mapping to yours is pss == threshold, jnd == width, and soa == intensity. This version treats the subjects as “random effects” but any other variables as fixed effects. That’s going to yield a large correlation matrix to fit for your case of 14 conditions, so this may even be slower than your code, but it at least follows Barr’s “keep it maximal” prescription.
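In case it helps you read the attached Stan file: the by-subject piece uses the standard non-centered multivariate parameterization. The sketch below is schematic only, with my own names (n_coef, z, L_corr, sd_subj are not necessarily what the attached lapsing_toj.stan calls them); n_coef counts all the per-subject coefficients (intercepts plus condition contrasts for both threshold and width), and that dimension is where the large correlation matrix for 14 conditions comes from.

data {
  int<lower=1> n_subjects;
  int<lower=1> n_coef;  // per-subject coefficients: intercepts + condition contrasts
}
parameters {
  matrix[n_coef, n_subjects] z;         // standardized subject deviations
  cholesky_factor_corr[n_coef] L_corr;  // correlations among those coefficients
  vector<lower=0>[n_coef] sd_subj;      // scale of each by-subject deviation
}
transformed parameters {
  // one column per subject: correlated deviations on every coefficient
  matrix[n_coef, n_subjects] subj_dev = diag_pre_multiply(sd_subj, L_corr) * z;
}
model {
  to_vector(z) ~ std_normal();
  L_corr ~ lkj_corr_cholesky(1);
  sd_subj ~ std_normal();  // placeholder prior; the attached script's priors may differ
}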

Because I’m curious, I’m going to code up something a little more similar to what you’ve already done: a model with that 14-level variable as a random effect, plus a third random effect reflecting the interaction between subject and that variable. No promises that it will be any faster, but it’ll at least have a few of the correlation terms you’re “missing”.

And I really should actually address your original question: what to put inside the call to reduce_sum. I’d start with just the final likelihood call and check that it works (possibly with a smaller fake data set like the one generated by the R script here), then work on getting as much of the rest of the computation in there as well; a minimal sketch of that first step is below. As I complete the multiple-random-effects version I might have more input on this later; in the single-fixed-effect version here I use a trick to reduce redundancy in some of the earlier computations that would prohibit straightforward inclusion of those computations in the reduce_sum call, but I’m not sure those tricks will be applicable to the multiple-random-effects model.
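For concreteness, here is a minimal sketch of that first step, under the assumption that psi stays in transformed parameters exactly as you have it now. The partial-sum function slices the per-trial correct array and evaluates only the final likelihood, so it replaces both your current partial_sum_lpmf and the correct ~ bernoulli(psi) line (keeping both would count the likelihood twice):

functions {
  // Slice over trials; psi is computed once in transformed parameters and
  // passed through to every thread as a shared argument.
  real partial_sum_lpmf(int[] slice_correct,
                        int start, int end,
                        vector psi) {
    return bernoulli_lupmf(slice_correct | psi[start:end]);
  }
}

and then in the model block:

model {
  // ... priors exactly as before ...
  target += reduce_sum(partial_sum_lupmf, correct, grainsize, psi);
}

Once that reproduces your current fit, the next step is to move the per-trial psi computation itself inside the partial-sum function so each thread builds only its own slice.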
helper_functions.r (4.9 KB)
lapsing_toj.r (9.1 KB)
lapsing_toj.stan (3.7 KB)

Thanks for your response… sorry, somehow I missed this in my inbox.

To answer your first question: yes, there is one task with four conditions and five tasks with two conditions each (congruent/incongruent).

I’ll look through everything you’ve given me and report back soon!