The Problem: I am preparing to run an experiment in which we attempt to identify which of several models of expected value best predicts which informational cues individual subjects will choose. We anticipate that different subjects will value cues differently, so we want a measure of the model that best explains each individual subject’s cue preferences. To do this, for each subject we aim to fit 4 models, each predicting that participant’s cue preferences from one of 3 different measures of expected value. Then, for each subject, we intend to run model comparison to determine the measure of expected value that best explains that subject’s choices.
The issue is that some subjects will presumably never pick a given cue, which leads brms to throw the following error (even though we set nl = TRUE):
Error: The parameter ‘mu2’ is not a valid distributional or non-linear parameter. Did you forget to set ‘nl = TRUE’?
I’m wondering if there is some way around this. Given that our model estimates how much a measure of expected value predicts cue preferences across the experiment, I don’t see an inherent issue with a participant never choosing a particular cue.
The data: Note that even though there are 5 cues total in this experiment, players are only presented with a subset of 3 cues to choose from on any given trial. Every block of 15 trials, the subset changes, as do other conditions that affect the expected value of each cue. There are 7 blocks (conditions) in total. Attached is simulated data that reproduces this issue; in this data the participant never chooses the 2nd cue throughout the entire experiment. Q1-5 are variables that encode the expected value of each cue under one of our models (centered around Q3). Q1-5_Is_Absent are variables that encode whether or not a cue is absent on any given block. The 3rd cue is a reference cue that is never absent.
Simulated_Data.csv (6.6 KB)
The Model: To estimate how much the measure of expected value predicts participants’ cue choices, we use the following model:
```r
d <- read.csv("Simulated_Data.csv")

Q_Model <- brm(
  data = d,
  family = categorical(link = logit, refcat = 3),
  bf(Cue_Choice ~ 1,
     nlf(mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu2 ~ a2 + bEV * Q2 + -10000 * Q2_Is_Absent),
     nlf(mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a2 + a4 + a5 ~ 1),
  nl = TRUE,
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123)
```
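One workaround we have considered (an untested sketch, not a verified fix): convert Cue_Choice to a factor and drop the never-chosen level, then omit the nlf() for that cue. Our understanding is that brms names the categorical linear predictors after the remaining response levels, so with level 2 dropped only mu1, mu4, and mu5 should need formulas:

```r
library(brms)

d <- read.csv("Simulated_Data.csv")

# Drop the unused level (cue 2 is never chosen in this data), so the
# categorical family only expects linear predictors for observed categories.
d$Cue_Choice <- droplevels(factor(d$Cue_Choice))

Q_Model_sub <- brm(
  data = d,
  family = categorical(link = logit, refcat = "3"),
  bf(Cue_Choice ~ 1,
     # No nlf() for mu2: that category no longer exists in the response.
     nlf(mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a4 + a5 ~ 1),
  nl = TRUE,
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),  # a2 prior dropped too
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123)
```

One caveat with this approach is that the model then differs across subjects (some subjects lack an a2, others an a4, etc.), and predicted probabilities renormalize over only the remaining cues, which may complicate comparing parameter estimates across subjects.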
a1, a2, a4, and a5 are intercepts that encode inherent bias toward or against each non-reference cue; bEV is the coefficient on our measure of expected value, and is shared across cues rather than varying by cue. Finally, the -10000 * Q1-5_Is_Absent term is meant to zero out the probability of choosing a given cue on trials where it is absent, by adding an absurdly negative value to that cue’s linear predictor so that its probability always evaluates to 0 after the softmax (the inverse of the categorical logit link).
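To illustrate the masking trick numerically (a toy sketch with made-up linear-predictor values, not output from the fitted model): subtracting 10000 makes exp() underflow to 0 in double precision, so the softmax probability of the absent cue is exactly 0 and the remaining probabilities renormalize over the present cues.

```r
# Toy linear predictors; mu3 is the reference category fixed at 0,
# and mu2 has the -10000 mask applied (cue 2 absent on this trial).
eta <- c(mu1 = 0.5, mu2 = 1.2 - 10000, mu3 = 0, mu4 = -0.3, mu5 = 0.8)

# Softmax: exp(-9998.8) underflows to 0, so p["mu2"] is exactly 0.
p <- exp(eta) / sum(exp(eta))
round(p, 3)
```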