How can I write a hurdle ordinal model custom family in brms?

I’m interested in writing a hurdle ordinal regression model as a brms custom family. I’ve successfully implemented something in Stan using the following model:

data{
  int n;
  int n_condition;
  int condition[n];
  int y[n];
  int n_y;
  int weight[n];
}
transformed data{
  int num_cuts = n_y - 2;
  int K = n_y-1;
}
parameters{
  real<lower=0, upper=1> theta[n_condition];
  ordered[num_cuts] cuts[n_condition];
}
model{
  theta ~ beta(2, 2);
  
  for(nn in 1:n){

  if(y[nn] == max(y)){
    target += weight[nn] * log1m(theta[condition[nn]]);
  }
  else{
    target += weight[nn] *  (log(theta[condition[nn]]) + ordered_logistic_lpmf(y[nn] |  0, cuts[condition[nn]])) ;  
  }
  }
  
}
generated quantities{
  
  array[n_condition, K] real proba;
  array[n_condition, K] real logit_proba;


  
  for(i in 1:n_condition){
    for(k in 1:K){
      if(k==1){
        proba[i, k] = inv_logit(cuts[i, 1]);
      }
      else if(k==K){
        proba[i, k] = 1 - inv_logit(cuts[i, K-1]);
      }
      else{
        proba[i, k] = inv_logit(cuts[i, k]) - inv_logit(cuts[i, k-1]);
      }
      
    }
  }
  
  logit_proba = logit(proba);
}

In short, I’ve coded outcomes which fail to pass the hurdle as the largest category (this is arbitrary and could be recoded so that they are 0). For these outcomes, I add \log(1-\theta) to the target density. Here, \theta is the probability of passing the hurdle).

For the reamining outcomes, I use an ordered logistic likelihood, where the cuts vary by condition (essentially a group identifier). The data are counts of outcomes, and so the contributions to the target density are weighted by the counts.

I’m attempting to write this model in brms. Following some of the documentaiton, I’ve written a custom family as…

hurdle_ordinal <- custom_family(
  'hurdle_ordinal', dpars = c('theta', 'mu', 'Intercept'),
  links = c('logit', 'identity', 'identity'),
  type = 'int'
)

stan_density<-'
real hurdle_ordinal_lmpf(int y, real theta, real mu, ordered Intercept){
    if(y==0}
    return log1m(theta);
  }
  else{
    return log(theta) + ordered_logistic_lpmf(y |  mu, Intercept) ;  
  }
}'

stanvars <- stanvar(scode = stan_density, block='functions')


f<- brm(y|weights(weight) ~ condition, 
    data = brm_model_data,
    family = hurdle_ordinal,
    stanvars = stanvars)

But when I attempt to run the model I’m returned the following error:Error: Currently 'dirichlet' is the only valid prior for simplex parameters. See help(set_prior) for more details.

Clearly, there is a problem with the priors. Here is what is returned from get_prior

f<- bf(y|weights(weight) ~ condition, 
    family = hurdle_ordinal)

get_prior(f, data=brm_model_data)

  prior     class               coef group resp dpar nlpar   lb   ub       source
               (flat)         b                                                         default
               (flat)         b    conditionAugust                                 (vectorized)
               (flat)         b  conditionFebruary                                 (vectorized)
               (flat)         b   conditionJanuary                                 (vectorized)
               (flat)         b      conditionJuly                                 (vectorized)
               (flat)         b proba                                 (vectorized)
               (flat)         b     conditionMarch                                 (vectorized)
               (flat)         b       conditionMay                                 (vectorized)
               (flat)         b   conditionOctober                                 (vectorized)
               (flat)         b conditionSeptember                                 (vectorized)
 student_t(3, 2, 2.5) Intercept                                                         default
       logistic(0, 1)     theta                                          <NA> <NA>      default

I believe the Intercept class should have a student t prior, so I think that’s the problem.

Questions

  • Have I set up the custom family for my model correctly?
  • If not, how can I correctly set it up? Where is my error?

Not too familiar with ordered logistic regression, but the doc for ordered_logistic_lpmf seems to say that the argument c in ordered_logistic_lpmf(k | eta, c) should be an ordered vector of length at least 2. Your custom family is using Intercept as this argument, but for any row in the data Intercept will be a single real, not an ordered vector of length at least 2.

This doesn’t quite answer your question, but if you haven’t seen it, this GitHub issue is on topic: Feature request: adding a hurdle cumulative family · Issue #1429 · paul-buerkner/brms · GitHub

2 Likes

To follow up on this, a hurdle_cumulative() family is now in brms. You can use it by installing the dev version of brms from github. The hurdle value must be 0 or the lowest level in an ordered factor.

See an example here, though you’ll need to load brms normally rather than from local (loading locally was done to make sure it worked before merging forks).

Let me know what is broken.

2 Likes