Computational Model of Choice Behavior using BRMS

I am building a model of choice behavior based on published choice behavior (2 options) equations in the context of task data (120 trials, 4 runs of 30 trials). The goal is to estimate values for random effects used to define participant choice for each trial. The equation below defines the response space:

SV ~ [p - beta(A/2)] * V^alpha

p = objective probability (from task)
A = Ambiguity (from task)
V = Amount (from task)
alpha = random effect (individual specific parameter)
beta = random effect (individual specific parameter)

Choice ~ 1/(1 + exp^(lambda(SVa-SVb))

lambda = slope of logistic function (individual specific parameter)
SVa = SV for A = 0
SVb = SV for A > 0

I am relatively new to Bayesian/BRMS modeling, so I am still getting comfortable with the specification/coding of multilevel models, so all feedback is helpful. This was my first attempt to model the above problem and the estimation took about 4 hrs and I am not sure I translated it appropiately. The model didn’t mix well and Rhat values suggest I am misspecifying the model.

final_model <- brm(
    bf(Choice ~ 1 + (1 / (1 + exp(lambda * (((P- beta (A=0/ 2))* V^alpha) - (P - beta(A>0/ 2))* V^alpha),
     alpha ~ 1 + (1 | Subject),     
     beta ~ 1 + (1 | Subject),     
     lambda ~ 1 + (1 | Subject),     
     nl = TRUE),  
  data = DataFile,
  family = binomial(),
prior = c(
    set_prior("student_t(3, 1, 0.5)", nlpar = "alpha"), # t-distribution for alpha
    set_prior("student_t(3, 0, 0.5)", nlpar = "beta"),  # t-distribution for beta
    set_prior("normal(0, 5)", nlpar = "lambda"),        # Normal distribution for lambda
    set_prior("normal(0, 5)", class = "sd", group = "Subject", coef = "Intercept", nlpar = "alpha"),
    set_prior("normal(0, 1)", class = "sd", group = "Subject", coef = "Intercept", nlpar = "beta"),
    set_prior("normal(0, 1)", class = "sd", group = "Subject", coef = "Intercept", nlpar = "lambda")
  chains = 4,
  iter = 8000,
  control = list(adapt_delta = 0.99)

I am particularly interestested in how to estimate the trial by trial values for each of the random effects and realize that trial isn’t in my formula yet, but I plan to build this as I better understand how to estimate the hyperparameters in this type of model.

Couple thoughts that you’ll usually see expressed here:

  • Start with a simpler model. One case/subject. Intercepts only on the parameters – no varying effects. If you can get it working there, then build up.
  • Use the sample_prior = “only” argument to see how well your priors work together in capturing the likely outcomes space.
  • If you know your parameters (i.e., alpha, beta, lambda) can only be positive, wrap them in exp() in the formula. Similarly, if they have to be between 0 and 1, you can use inv_logit(). That allows the parameter sampling to be unconstrained and then transformed later.
  • If your model doesn’t work well with 1-2K iterations, try to fix the model before increasing iterations to 8K or increasing adapt_delta.

I can think of two potential reason for the model converge problem.
First, sigmoid/softmax often confront the numerical instability problem. Therefore, instead of using your custom sigmoid function, inv_logit function embodied in the brms/stan might be a better option.

Second, for expected utility model, alpha parameter should be larger than 0, however, in your current model, there is no boundary for alpha. exp(alpha) can make sure alpha indeed larger than 0.

Oh wow, I didn’t know you could do that type of model with brms? In the past, for human reinforcement learning and choice models, I have used Stan directly, you can find many examples online, I’ll just post one I have below. This is for one person, I’ve found that if my main purpose was to use Bayesian modelling to avoid unreasonable parameter estimates, using non-hierarchical with a prior worked better than hierarchical (faster, fewer problems converging). But it might be that it’s because of my special situation where I have >500 participants to fit, so hierarchical models take very long and then if there was a problem, you have to rerun everything.

  int ntr;          // number of total data points
  real L_prob[ntr];  // left reward prob
  real L_rew_mag[ntr]; // left rew mag
  real L_loss_mag[ntr]; // left loss mag
  real R_prob[ntr];  // right reward prob
  real R_rew_mag[ntr]; // right rew mag
  real R_loss_mag[ntr]; // right loss mag
  int choice[ntr];  // choice on each trial (1 for left, 0 for right)
  // between trial terms
  real loss_scale[ntr];

  real<lower=0> invTemp;
  real lossW;
  real b_loss_scale;

  real utils[ntr,2];
  real invTxUtil[ntr];

  invTemp     ~ normal(5,7);
  // center biases on zero as this would mean 'no bias', center loss weight on -1 as this would mean 'as much as reward'
  lossW                 ~ normal(-1,3);
  b_loss_scale          ~ normal(0,2);

  // Actual model
  for (itr in 1:ntr){
    // Compute utilities
    // Left
    utils[itr,1] = L_prob[itr]*L_rew_mag[itr] +
            (lossW + b_loss_scale*loss_scale[itr])*
    // Right
    utils[itr,2] = R_prob[itr]*R_rew_mag[itr] +
      (lossW + b_loss_scale*loss_scale[itr])*

    // Combine with inverse temp
    invTxUtil[itr] = invTemp* (utils[itr,1]-utils[itr,2]);
  // Relate utility to choice
  choice ~ bernoulli_logit(invTxUtil);

1 Like

@mingqian.guo & @franzsf Thank you both for your feedback. Using exp(alpha) reduced compute time by 2/3 and mixing was much better.

I went back and examined N=1 for the sample and I have two subjects where the model perform well. A much simpler model appears to function well with these two subjects and my initial thought is to just exclude them from the analysis, but it raised a question I had about whether anyone knows of work being done on empirical methods for excluding cases or for partitioning them into other models?

One potential way is to use a mixture model. An example is p(SV_a) = (1-omega) \times expected utility model prediction + (omega) \times simpler model. prediction. Parameter omega can be regarded as the probability using the simpler model. Peters & D’Esposito, 2020 used this approach to model the orbitofrontal cortex lesion patients data.