# Error in categorical models where one category is not observed in the data

The Problem: I am preparing to run an experiment in which we attempt to identify which of several models of expected value best predicts which of several informational cues individual subjects will choose. We anticipate that different subjects will value cues differently, so we want a measure of the model that best explains each individual subject's cue preferences. To do this, for each subject we aim to fit 4 models, each predicting the participant's cue preferences from one of 3 different measures of expected value. Then, for each subject, we intend to run model comparison to determine the measure of expected value that best explains that subject's choices.

The issue with this is that some subjects will presumably never pick a given cue, which leads brms to output the following error (even though we set nl = TRUE):

```
Error: The parameter 'mu2' is not a valid distributional or non-linear parameter. Did you forget to set 'nl = TRUE'?
```

I'm wondering if there is some way around this. Given that our model estimates how much a measure of expected value predicts cue preferences across the experiment, I'm not sure there is an inherent problem with a participant never choosing a given cue.

The data: Note that even though there are 5 cues total in this experiment, players will only be presented with a subset of 3 cues to choose from on any given trial. Every block of 15 trials, the subset will change, as will other conditions that affect the expected value of each cue. There are 7 blocks (conditions) in total. Attached is simulated data that reproduces this issue. In this data the participant never chooses the 2nd cue throughout the entire experiment. Q1-Q5 are variables that encode the expected value of each cue under one of our models (centered on Q3). Q1_Is_Absent through Q5_Is_Absent are variables that encode whether or not a cue is absent in any given block. The 3rd cue is a reference cue that is never absent.

Simulated_Data.csv (6.6 KB)

The Model: To estimate how much the measure of expected value predicts the participant's cue choices, we use the following model:

```r
Q_Model <- brm(
  data = d,
  family = categorical(link = logit, refcat = 3),
  bf(Cue_Choice ~ 1,
     nlf(mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu2 ~ a2 + bEV * Q2 + -10000 * Q2_Is_Absent),
     nlf(mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a2 + a4 + a5 ~ 1),
  nl = TRUE,
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123
)
```

a1, a2, a4, and a5 are intercepts that code for an inherent bias towards or against each of the cues, and bEV is the coefficient on our measure of expected value, which does not vary by cue. Finally, -10000 * Q*_Is_Absent is a term meant to zero out the probability of choosing a given cue on trials where it is absent, by adding a value so negative that the cue's choice probability is effectively 0 after the inverse-link (softmax) transformation.
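For intuition, here is a minimal numerical sketch (in Python, with made-up intercepts and Q values, not taken from the actual data) of why adding -10000 to a linear predictor effectively zeroes out that cue's probability under the softmax implied by the categorical logit family:

```python
import numpy as np

# Hypothetical linear predictors for all 5 cues (mu3 = 0 under refcat = 3),
# on a trial where cue 2 is absent. Intercepts and Q values are invented.
a = np.array([0.2, -0.1, 0.0, 0.3, -0.4])          # cue intercepts (a3 fixed at 0)
bEV = 0.8
Q = np.array([1.0, 0.5, 0.0, -0.5, 0.2])           # expected-value measure per cue
is_absent = np.array([0, 1, 0, 0, 0])              # only cue 2 is absent

mu = a + bEV * Q - 10000 * is_absent               # the masking term in action
p = np.exp(mu - mu.max())                          # numerically stable softmax
p /= p.sum()
# p[1] underflows to (numerically) 0, so cue 2 can never be "chosen"
```

The same logic applies inside brms: the -10000 offset pushes the absent cue's softmax probability below machine precision, so the likelihood behaves as if that cue were never an option.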

I think you need to set nl = TRUE in your call to bf(), not to brm(). Try:

```r
Q_Model <- brm(
  data = d,
  family = categorical(link = logit, refcat = 3),
  bf(Cue_Choice ~ 1,
     nlf(mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu2 ~ a2 + bEV * Q2 + -10000 * Q2_Is_Absent),
     nlf(mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a2 + a4 + a5 ~ 1,
     nl = TRUE),
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123
)
```

Although I'm not sure you need to use nlf() inside bf() at all, so perhaps:

```r
bf(Cue_Choice ~ 1,
   mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent,
   mu2 ~ a2 + bEV * Q2 + -10000 * Q2_Is_Absent,
   mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent,
   mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent,
   bEV ~ 1,
   a1 + a2 + a4 + a5 ~ 1,
   nl = TRUE)
```

Thanks for the suggestion! However, running the first set of code:

```r
Q_Model <- brm(
  data = d,
  family = categorical(link = logit, refcat = 3),
  bf(Cue_Choice ~ 1,
     nlf(mu1 ~ a1 + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu2 ~ a2 + bEV * Q2 + -10000 * Q2_Is_Absent),
     nlf(mu4 ~ a4 + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a2 + a4 + a5 ~ 1,
     nl = TRUE),
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123
)
```

generates the same error:

```
Error: The parameter 'mu2' is not a valid distributional or non-linear parameter. Did you forget to set 'nl = TRUE'?
```

The second set of code (dropping the nlfs) generates the following error:

```
Error in terms.formula(formula, ...) :
  invalid model formula in ExtractVars
```

A potential solution I have found is to include an impossible observation in the data. In short, at the beginning of the data, I add an artificial trial where the unobserved cue, cue 2 in this case, is chosen in a condition where it is also absent (an impossibility). You can see this data here:
Simulated_Data_Fix.csv (6.7 KB)

Then I run this modified model that zeros out all cue-specific intercepts when a cue is absent:

```r
Q_Model <- brm(
  data = d,
  family = categorical(link = logit, refcat = 3),
  bf(Cue_Choice ~ 1,
     nlf(mu1 ~ a1 * (1 - Q1_Is_Absent) + bEV * Q1 + -10000 * Q1_Is_Absent),
     nlf(mu2 ~ a2 * (1 - Q2_Is_Absent) + bEV * Q2 + -10000 * Q2_Is_Absent),
     nlf(mu4 ~ a4 * (1 - Q4_Is_Absent) + bEV * Q4 + -10000 * Q4_Is_Absent),
     nlf(mu5 ~ a5 * (1 - Q5_Is_Absent) + bEV * Q5 + -10000 * Q5_Is_Absent),
     bEV ~ 1,
     a1 + a2 + a4 + a5 ~ 1,
     nl = TRUE),
  prior = c(prior(normal(0, 1), class = b, nlpar = a1),
            prior(normal(0, 1), class = b, nlpar = a2),
            prior(normal(0, 1), class = b, nlpar = a4),
            prior(normal(0, 1), class = b, nlpar = a5),
            prior(normal(0, 5), class = b, nlpar = bEV)),
  iter = 2000, warmup = 1000, cores = 3, chains = 3, seed = 123
)
```

Because the cue-specific intercepts are zeroed out when a cue is absent, in theory the added impossible trial should not influence the parameter estimates. Because all cues are now observed, the model appears to run fine. However, this method feels pretty janky, and I'm wondering if there is a reason I should be suspicious of it. Recovered parameters have relatively high variance, but that may just be a reflection of the complexity of the model.
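To see why the artificial trial shouldn't inform the never-chosen cue's intercept, here is a small Python sketch (with invented values for bEV and Q2) showing that a2 drops out of the linear predictor entirely whenever the cue is marked absent, as it is on the impossible trial:

```python
def mu2(a2, bEV, Q2, absent):
    # Modified predictor from the model above: the cue-specific intercept
    # is multiplied by (1 - absent), so it vanishes on absent-cue trials.
    return a2 * (1 - absent) + bEV * Q2 - 10000 * absent

# On the artificial "impossible" trial cue 2 is absent (absent = 1),
# so the predictor is identical for any value of a2:
vals = [mu2(a2, bEV=0.5, Q2=0.3, absent=1) for a2 in (-2.0, 0.0, 2.0)]
# all three values are equal, so the trial carries no information about a2
```

Note that bEV still enters the predictor through bEV * Q2 on that trial, but the -10000 term keeps the implied choice probability at (numerically) zero regardless, so its influence on the posterior should be negligible.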