Ordinal Mixture Model w/ Set Outcome for One Component

I have an uncharacteristically simple question.
I am trying to fit a mixture model using ordinal data. The first component of the mixture is a standard ordinal regression model. The second component of the mixture asserts that participants always respond with the highest rating all of the time (6 in my case). Is there a simple manner in which to implement this in BRMS? I was hoping I could do something like:

// y is an ordinal value [1-6] and iv is categorical
brm(bf(y ~ 1,
mu1 ~ iv + (iv | subject),
mu2 = 6,
theta1 ~ iv + (iv | subject)),
dat,
family = mixture(cumulative(link=“probit”), cumulative(link=“identity”)),
chains = 4)

The idea here is that mu2 = 6 is stating that for that component all of the probability density is on y = 6 and none is on 1-5. But – of course – I know this does not work (cumulative cannot have an identity link and I likewise do not know whether you can assign immutable values as done above).

you have to define a custom family for the 2nd component which you can then pass to the mixture.

Thanks, @paul.buerkner!
That was the first thing I tried, using code similar to the below:

// define a custom family
rec_dist ← custom_family(
“rec_dist”,
type = “int”
)

// define the corresponding Stan density function
stan_funs ← "
real rec_dist_lpmf(int y, real mu) {
if(y == 6)
{
return(log(.9995));
} else {
return(log(.0001));
}
}
"

stanvars = stanvar(scode = stan_funs, block = “functions”)

mix ← mixture(rec_dist, cumulative(link=“probit”))

This should return a probability of ~1 when resp = 6 and ~0 when resp < 6, although if you have advice on how better to implement that I am also all ears!

The problem is that it throws the following error:

Error: Cannot mix ordinal and non-ordinal families.

Is there a way to make BRMS believe this to be an ordinal family or to override the error?

Add specials = "ordinal" to the custom_family call.

Brilliant!
You are a living legend, @paul.buerkner.
I will give this a try and report back only if there is an issue.

1 Like

I have a comment and a few follow-up questions for you @paul.buerkner. The model fits fine now, and even produces reasonable results. I have posted a question pertaining to my specification of the lpmf to the general board, but I think it is fine too.

1.) For anyone else that might try something like this, to get the code to work, I had to make a slight modification, adding a variable titled “threshold” to my custom distribution as follows:

rec_dist ← custom_family(
“rec_dist”,
type = “int”,
special = ‘ordinal’
)
rec_dist$threshold=‘equidistant’

Worked fine once I did.

2.) My full model is a little more complex than the above. The mixture is asymmetric, with the mixing proportion set to 0 for certain trials. I do not believe this can be done in the current version of BRMS, so I had to generate the Stan code and make some changes there. Is there a way to convert the resulting model back into a BRMS object? None of the outputs or parameters change, only a small portion of the likelihood calculation (where I treat certain trials as coming from Component 2 of the mixture, directly). Bonus points if there is a way to set mixing proportions to 0 for certain trials without modifying the Stan code. Best I could come up with was to force the intercept of a predictor (reflecting the trials in question) on Theta1 to an arbitrarily small value using a strong prior. Code modification seemed more principled.

3.) I would like to compare my mixture model to a non-mixture variant to gauge evidence for each. What functions, if any, must be included for bayes_factor and loo to work with my custom distribution within BRMS? Or should they work and produce reasonable results with any custom distribution so long as the lpmf is defined? The reason I ask is because bayes_factor is telling me that my mixture is something like 1,000,000 x more probable than the non-mixture (given similar priors). I want to ensure I was not meant to do something to ensure proper functionality before interpreting that!

Hi Jon, thanks for sharing this. This is a great extension to SDT models available through brms. If you have the code on github I could do a pull request for (2.). Basically you need to just put the stanfit object into brmsfit$fit, and you should be able to then use the usual brms methods on brmsfit. Something like this:

# fit_stan is the stanmodel that you have modified and sampled from
# fit_brms is the original brms model
fit_brms$fit <- fit_stan
fit_brms_new <- update(fit_brms, recompile = FALSE)

(It was a while ago that I did it so make sure that it works by checking the resulting object carefully.)

@paul.buerkner it might be useful to have some kind of an online library for the custom brmsfamilies that people have created–even if they are not officially supported. (I suppose just a github repo would do, and people could source R scripts from there.)

I added the threshold argument to custom_family directly so that one doesn’t have to change it manually after creating the function.

I agree a repo for custom families would be nice. I will make one tomorrow.

3.) For bayes_factor you “just” need reasonable priors on all your parameters.
For loo you need the log_lik function. See vignette("brms_customfamilies") for an example.

Thanks @paul.buerkner!
Sorry – I missed that part of the vignette. I should be in the clear from here on.

@matti, I don’t know how to use GitHub, but I intend to email you back with my code in the near future. It seems to work just fine, producing reasonable output. However, in testing it against an optimization script I developed to fit the typical (non-hierarchical) model, the recollection parameters tend to be lower for this new model. The next step is to simulate some data and run a parameter recovery series to verify all is above board. As a side note, I have also implemented DPSDT models using this procedure in a 2AFC task, which is drastically simpler and requires no fudging of the Stan code at all. I will send those, too, when I get around to it.

I have now made a github repository in which users can share their custom families:

1 Like

Great, I’d be very interested to see how you’ve implemented the 2afc dpsdt model. Thanks

I have run into a new error. I am trying to calculate loo on my mixture model, but any model including the new distribution spits out the following error:

Error in array(eta, dim = c(dim(eta), ncat - 1)) :
negative length vectors are not allowed
In addition: Warning messages:
1: In max(ncat) : no non-missing arguments to max; returning -Inf
2: In array(eta, dim = c(dim(eta), ncat - 1)) :

Traceback:
26. array(eta, dim = c(dim(eta), ncat - 1))
25. .predictor_cs(eta, X = p(cs[[“Xcs”]], i), b = cs[[“bcs”]], ncat = ncat,
r = rcs)
24. predictor_cs(eta, draws, i)
23. predictor.bdrawsl(x, i = i, fdraws = draws)
22. predictor(x, i = i, fdraws = draws)
21. get_dpar(draws, dpar = dp)
20. log_lik_internal.brmsdraws(draws, combine = combine)
19. log_lik_internal(draws, combine = combine)
18. log_lik.brmsfit(object = .x1, newdata = .x2, pointwise = .x3,
resp = .x4)
17. .fun(object = .x1, newdata = .x2, pointwise = .x3, resp = .x4) at #1
16. eval(expr, envir, …)
15. eval(expr, envir, …)
14. eval2(call, envir = args, enclos = parent.frame())
13. do_call(log_lik, ll_args)
12. .fun(criterion = .x1, pointwise = .x2, resp = .x3, k_threshold = .x4,
reloo = .x5, reloo_args = .x6, x = .x7, model_name = .x8,
use_stored = .x9) at #1
11. eval(expr, envir, …)
10. eval(expr, envir, …)
9. eval2(call, envir = args, enclos = parent.frame())
8. do_call(compute_loo, args)
7.=.fun(models = .x1, criterion = .x2, pointwise = .x3, compare = .x4,
resp = .x5, k_threshold = .x6, reloo = .x7, reloo_args = .x8) at #1
6. eval(expr, envir, …)
5. eval(expr, envir, …)
4. eval2(call, envir = args, enclos = parent.frame())
3. do_call(compute_loos, args)
2. loo.brmsfit(m1)
1.loo(m1)

I imagine this has to do with the fact that my custom distribution is masquerading as an ordinal distribution. I have appended the code used to create my custom distribution. I could create some simulated data and provide code necessary to fit the precise model too, but it does not seem it is needed here.

Best I can tell, the issue is either with eta or ncat (as used by predictor_cs). I don’t know what eta represents but I assume ncat refers to the number of ordinal categories. I don’t know how to specify that for the custom distribution.

0.1 Custom Families.R (1.0 KB)

Have you tried it with the latest github version of brms?

Just updated to BRMS 2.8.8 from GitHub and now the model won’t even compile. If I run the following code:

m1 = brm(bf(cor_conf ~ 1,
                         mu1 ~ 0,
                         mu2 ~ condition*exp + (condition | s | participant2) + (1 | w | pair),
                         theta1 ~ condition*exp + (condition | s | participant2) + (1 | w | pair)),
                      family = mix,
                      data = e1_test_dat,
                      stanvars = stan_vars_rect_dist,
                      prior = c(
                        prior(normal(0, 1.25), class = "Intercept", dpar='mu2'),
                        prior(normal(0, 1), class = "b", dpar='mu2'),
                        prior(normal(0, 1), class = "sd", dpar='mu2'),
                        prior(normal(-1, 1), class = "Intercept", dpar='theta1'),
                        prior(normal(0, 1), class = 'b', dpar='theta1'),
                        prior(normal(0, 1), class = "sd", dpar='theta1'),
                        prior(lkj(4), class = "cor")
                      ), 
                     cores=4, save_all_pars=TRUE,  iter=1000, warmup=500, init_r=.5)

I receive the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Info: left-hand side variable (name=theta1) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Info: left-hand side variable (name=theta2) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
variable "temp_Intercept" does not exist.
  error in 'model10614f2a79d6_file10614a3e8763' at line 147, column 70
  -------------------------------------------------
   145:     for (n in 1:N) {
   146:       real ps[2];
   147:       ps[1] = theta1[n] + rec_dist_lpmf(Y[n] | mu1[n], temp_Intercept);
                                                                             ^
   148:       ps[2] = theta2[n] + cumulative_probit_lpmf(Y[n] | mu2[n], temp_mu2_Intercept, disc2);
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file10614a3e8763' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Info: left-hand side variable (name=theta1) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Info: left-hand side variable (name=theta2) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
variable "temp_Intercept" does not exist.
  error in 'model106142179dfb0_file10614a3e8763' at line 147, column 70
  -------------------------------------------------
   145:     for (n in 1:N) {
   146:       real ps[2];
   147:       ps[1] = theta1[n] + rec_dist_lpmf(Y[n] | mu1[n], temp_Intercept);
                                                                             ^
   148:       ps[2] = theta2[n] + cumulative_probit_lpmf(Y[n] | mu2[n], temp_mu2_Intercept, disc2);
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file10614a3e8763' due to the above error.

If I revert to BRMS 2.8.0 on CRAN, it runs fine again.

ok I will take a look at what is going on.

It works now, but you have to amend the signature of your custom family to

real rec_dist_lpmf(int y, real mu, vector threshold)

Still, the current approach to your custom family feals a little bit awkward. I think it would be easier to directly define a custom family which includes the mixture already rather than using the mixture feature. Examples are zero-inflated families which are technically a mixture but have it built-in already.
Actually, your family is pretty much like a zero-inflated family just that it is ordinal and 6 inflated.

Thanks again.
The thought of putting it all-in-one had occurred to me, and perhaps I should do that instead. It only means I would then need to wrangle with the mixture component. I will look into that. If it works out, it would be a good candidate for your new distribution repository!

@paul.buerkner, I hate to keep pestering you over this family of model, but I wanted to see if I could get your opinion on something before I throw in the towel. I have implemented a variant of the model described in this thread and have been preparing to use it for a paper I am working on. It fits the data in question quite well and the results are reasonable. However, being a little uptight, I usually run parameter recovery simulations for any exotic analyses I might use to ensure there are no biases. Long story short, my model is producing biased estimates and I can’t seem to sort out why.

Simulations

I have appended an R script with my current code and simulations, but I quote the relevant portion below:

# This function generates a single subject with a specific d' and mixing proportion (rho)
gen_2afc_sdt_dat = function(d=1, rho=.4, crits=seq(-1,1,.5), ntrials=100, scale=1)
{
  expand.grid(trial=1:ntrials, is_right= c(0, 1)) %>%
    mutate(signal = rnorm(n(), d, scale), 
           noise = rnorm(n(), 0, scale), 
           difference = signal - noise,
           is_recollect = rbinom(n(), 1, rho),
           rating = case_when(
             is_right == 1 & is_recollect == 1 ~ 6,
             is_right == 1 & difference < crits[1] ~ 1,
             is_right == 1 & difference < crits[2] ~ 2,
             is_right == 1 & difference < crits[3] ~ 3,
             is_right == 1 & difference < crits[4] ~ 4,
             is_right == 1 & difference < crits[5] ~ 5,
             is_right == 1 ~ 6,
             is_right == 0 & is_recollect == 1 ~ 1,
             is_right == 0 & difference < crits[1] ~ 6,
             is_right == 0 & difference < crits[2] ~ 5,
             is_right == 0 & difference < crits[3] ~ 4,
             is_right == 0 & difference < crits[4] ~ 3,
             is_right == 0 & difference < crits[5] ~ 2,
             is_right == 0 ~ 1,
             TRUE ~ 999
           )) -> sim2afc
  
  return(sim2afc)
}

# This allows simulation of multilevel data wherein participants vary
gen_2afc_sdt_dat_multi = function(n, m_d=1, sd_d=.3, m_r=.4, sd_r=.1, crits=seq(-.5,.5,.25), ntrials=100, scale=1)
{
  subj_d = rnorm(n, m_d, sd_d)
  subj_r = rnorm(n, m_r, sd_r)
  
  all_dat = NULL
  for(i in 1:n)
  {
    temp = gen_2afc_sdt_dat(subj_d[i], subj_r[i], crits, ntrials, scale)
    temp$id = i
    
    all_dat = rbind(all_dat, temp)
  }
  
  return(all_dat)
}

Basically, participants are given two words on each trial and must indicate which they had studied previously using a confidence rating (1 = Very Sure Left and 6 = Very Sure Right). I assume the response is driven by a mixture of two processes: (a) a process modelled by an ordinal regression based on the difference in activation of either response candidate (the difference of which is called d’); or, (b) a process that always returns a high confidence judgment in favour of the target (which would be 1 if it is on the left and 6 if it is on the right). That is what I simulate above.

The Model

The model I fit in BRMS reflects a mixture model between (1) an ordinal regression with a scale of 2**.5 (the difference between two normal distributions with SD = 1 will be 2**.5), and, (2) a distribution that always returns a high confidence judgment in favour of the correct item (if the item is on the left, this would be 1; if the item is on the right, this would be 6). Theta, reflects the mixture proportion and d’ is calculated by arbitrarily contrasting trials where the item is on the right to trials where the item is on the left. Estimates of d’ are converted post-hoc into its original scale by dividing by 2**.5 (reversing the impact of our disc parameter). The code is here:

# Simulate some data
sim_dat = gen_2afc_sdt_dat_multi(30)

# Define the custom family
rec2_dist <- custom_family(
  "rec2_dist",
  type = "int",
  special = 'ordinal',
  vars = 'is_right[n]'
)
rec2_dist$threshold='equidistant'

# Define the corresponding Stan density function
stan_funs <- "
  real rec2_dist_lpmf(int y, real mu, int is_right) {
    if(is_right == 1) {
      if(y == 6)
      {
        return(log(.9995));
      } else {
        return(log(.0001));
      }
    } else {
      if(y == 1)
      {
        return(log(.9995));
      } else {
        return(log(.0001));
      }
    }
  }
"

# Put them together for inclusion in our model; we also need to provide a vector
# indicating which items are right (used in the above function); I can take this from
# the earlier simulated data so long as the analyzed data have similar structure
stanvars = stanvar(scode = stan_funs, block = "functions") + 
  stanvar(sim_dat$is_right, "is_right", scode = "  int is_right[N];")

# Create the mixture
mix <- mixture(rec2_dist, cumulative(link="probit"))

# Set some priors
Prior <- c(prior(normal(1, 1), class = "b", dpar='mu2'),
           prior(normal(0, 1.5), class = "Intercept", dpar='mu2'),
           prior(logistic(0, 1), class = 'Intercept', dpar='theta1'),
           prior(normal(0, 1), class = "sd", dpar='mu2'),
           prior(normal(0, 1), class = 'sd', dpar='theta1'),
           prior(lkj(4), class = "cor")
)

# Fit the model
mod_2afc <- brms(bf(rating ~ 1,
                             mu1 ~ 0,
                             mu2 ~ is_right + (is_right|id),
                             theta1 ~ 1 + (1|id),
                             disc2=(1/2**.5)),
                             family = mix,
                             data = sim_dat,
                             stanvars = stanvars,
                             prior = Prior,
			     cores=4, iter=1000, init_r=.5,
                       	     control = list(adapt_delta=.9))

Bottom line is that the model fits fine – both to simulated data and to my actual data – but produces biased estimates in the former. Under some circumstances it overestimates theta/rho (I observed this when d’ is small and criteria are tightly clustered, producing an ROC curve with most responses at 1 or 6) whereas under other circumstances it overestimates d’ (this is observed in the simulation provided wherein corrected d’ is estimated closer to 1.3 than to 1 after correction). I have been trying to sort this out for weeks but am at my wit’s end. Is there something obvious I am missing?

PS: I have not incorporated a threshold parameter or converted the distribution into a self-encapsulated form yet; I figure the first step is to develop a working model.

2AFC_DPSDT_Sims.R (9.4 KB)

1 Like

There is nothing obvious I see going wrong here, but I am not deep enough into the details of the model to see clearly. Generally, I would recommend to write down your mixture as a single custom family instead of using the mixture feature of brms. This should result in a cleaner implementation of the model.

Chiming in late here. Maybe I’m misunderstanding something, but I wonder if the log(.9995)/log(.0001) bit might be an issue. I agree with @paul.buerkner’s thinking that looking at the examples of zero-inflated models might be helpful here. For 6-inflation on an ordinal scale, couldn’t one do:

  for (n in 1:N) {
    if (y[n] == 6)
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + ordered_logistic_lpmf(y[n] | gamma, cutpoints));
    else
      target += bernoulli_lpmf(0 | theta)
                  + ordered_logistic_lpmf(y[n] | gamma, cutpoints);
  }

In which case I think the appropriate shorthand function for incrementing the target would be:

  real rec_dist_lpmf(int y, real gamma, vector cutpoints) {
    if(y == 6){
      return(log_sum_exp(bernoulli_lpmf(1 | theta),
                         bernoulli_lpmf(0 | theta)
                          + ordered_logistic_lpmf(y[n] | gamma, cutpoints)));
    } else {
      return(bernoulli_lpmf(0 | theta)
                  + ordered_logistic_lpmf(y[n] | gamma, cutpoints));
    }
  }

No?