I’m interested in writing a hurdle ordinal regression model as a brms custom family. I’ve successfully implemented something in Stan using the following model:
data{
int n;
int n_condition;
int condition[n];
int y[n];
int n_y;
int weight[n];
}
transformed data{
int num_cuts = n_y - 2;
int K = n_y-1;
}
parameters{
real<lower=0, upper=1> theta[n_condition];
ordered[num_cuts] cuts[n_condition];
}
model{
theta ~ beta(2, 2);
for(nn in 1:n){
if(y[nn] == max(y)){
target += weight[nn] * log1m(theta[condition[nn]]);
}
else{
target += weight[nn] * (log(theta[condition[nn]]) + ordered_logistic_lpmf(y[nn] | 0, cuts[condition[nn]])) ;
}
}
}
generated quantities{
array[n_condition, K] real proba;
array[n_condition, K] real logit_proba;
for(i in 1:n_condition){
for(k in 1:K){
if(k==1){
proba[i, k] = inv_logit(cuts[i, 1]);
}
else if(k==K){
proba[i, k] = 1 - inv_logit(cuts[i, K-1]);
}
else{
proba[i, k] = inv_logit(cuts[i, k]) - inv_logit(cuts[i, k-1]);
}
}
}
logit_proba = logit(proba);
}
In short, I’ve coded outcomes which fail to pass the hurdle as the largest category (this is arbitrary and could be recoded so that they are 0). For these outcomes, I add \log(1-\theta) to the target density. Here, \theta is the probability of passing the hurdle).
For the reamining outcomes, I use an ordered logistic likelihood, where the cuts vary by condition
(essentially a group identifier). The data are counts of outcomes, and so the contributions to the target density are weighted by the counts.
I’m attempting to write this model in brms. Following some of the documentaiton, I’ve written a custom family as…
hurdle_ordinal <- custom_family(
'hurdle_ordinal', dpars = c('theta', 'mu', 'Intercept'),
links = c('logit', 'identity', 'identity'),
type = 'int'
)
stan_density<-'
real hurdle_ordinal_lmpf(int y, real theta, real mu, ordered Intercept){
if(y==0}
return log1m(theta);
}
else{
return log(theta) + ordered_logistic_lpmf(y | mu, Intercept) ;
}
}'
stanvars <- stanvar(scode = stan_density, block='functions')
f<- brm(y|weights(weight) ~ condition,
data = brm_model_data,
family = hurdle_ordinal,
stanvars = stanvars)
But when I attempt to run the model I’m returned the following error:Error: Currently 'dirichlet' is the only valid prior for simplex parameters. See help(set_prior) for more details.
Clearly, there is a problem with the priors. Here is what is returned from get_prior
f<- bf(y|weights(weight) ~ condition,
family = hurdle_ordinal)
get_prior(f, data=brm_model_data)
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b conditionAugust (vectorized)
(flat) b conditionFebruary (vectorized)
(flat) b conditionJanuary (vectorized)
(flat) b conditionJuly (vectorized)
(flat) b proba (vectorized)
(flat) b conditionMarch (vectorized)
(flat) b conditionMay (vectorized)
(flat) b conditionOctober (vectorized)
(flat) b conditionSeptember (vectorized)
student_t(3, 2, 2.5) Intercept default
logistic(0, 1) theta <NA> <NA> default
I believe the Intercept
class should have a student t prior, so I think that’s the problem.
Questions
- Have I set up the custom family for my model correctly?
- If not, how can I correctly set it up? Where is my error?