# New Model+code - Ordered beta regression for binary responses plus proportions

Hi all -

EDIT 1/30: I updated the code below to 1) be much faster and 2) calculate LOO values. The R code now directly compares the ZOIB model, rstanarm's stan_betareg, and the ordered beta regression model.

I recently implemented a new model that incorporates the over-dispersion in proportion/Beta-distributed outcomes, which often arises in experiments where people have to divide some quantity between 0 and 1 (we're always so over-confident and select 0 or 1). Compared to other models out there, such as the ZOIB, this one fully incorporates the information in the 0/1 "degenerate" responses along with the proportions, in a framework similar to ordered logistic regression.

As a result, this model produces a single interpretable coefficient estimate for the entire dataset (0s, 1s, the whole shebang), and generally has less uncertainty in its estimates than the ZOIB while still fitting all of the data. Unlike standard Beta regression, the posterior predictive distribution can reproduce a highly inflated (lots of 0s and 1s) distribution. See the R code for details.
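To make the data-generating process concrete, here is a small simulation sketch in Python rather than the R/Stan used in this thread. The parameter values (`beta`, the cutpoints `k1 < k2`, and the Beta precision `kappa`) are arbitrary choices for illustration only: a latent linear predictor is cut into three ordered categories (0, a continuous proportion, 1), and the middle category is drawn from a Beta in the mean/precision parameterization.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameter values, for illustration only
beta = 0.8            # single common slope
k1, k2 = -1.0, 1.0    # ordered cutpoints (k1 < k2)
kappa = 10.0          # Beta precision parameter

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def sim_ordered_beta(x, n=10_000):
    eta = x * beta
    p0 = 1 - inv_logit(eta - k1)                      # Pr(Y == 0)
    pmid = inv_logit(eta - k1) - inv_logit(eta - k2)  # Pr(Y in (0,1))
    p1 = inv_logit(eta - k2)                          # Pr(Y == 1)
    cat = rng.choice(3, size=n, p=[p0, pmid, p1])
    # Beta in the (mu, kappa) parameterization: shape1 = mu*kappa, shape2 = (1-mu)*kappa
    mu = inv_logit(eta)
    draws = rng.beta(mu * kappa, (1 - mu) * kappa, size=n)
    return np.where(cat == 0, 0.0, np.where(cat == 2, 1.0, draws))

y = sim_ordered_beta(x=0.5)
print((y == 0).mean(), ((y > 0) & (y < 1)).mean(), (y == 1).mean())
```

The resulting sample mixes exact 0s and 1s with continuous proportions, which is exactly the inflated shape a plain Beta regression cannot reproduce.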

Attachments: zoib.stan (2.5 KB), ordered_beta_reg.R (4.6 KB), beta_logit.stan (4.4 KB)

I’m posting here to get feedback (I know a lot of other people have these types of responses/outcomes). If you want to use the model in published work, please let me know first as it’s still a work in progress.

The Stan code is as follows and is also attached:

```stan
//
// Ordinal beta regression model for analyzing experimental outcomes
// with proportion and degenerate responses (i.e. 0 and 1)
// Models 0/1 as ordered categories above/below (0,1)
// Robert Kubinec
// New York University Abu Dhabi
data {
  int<lower=0> N_prop;   // number of proportion observations (0,1)
  int<lower=0> N_degen;  // number of 0/1 observations
  int X;                 // number of predictors
  vector[N_prop] outcome_prop;  // Y in (0,1)
  int outcome_degen[N_degen];   // Y in {0,1}
  matrix[N_prop, X] covar_prop;    // covariate X for proportion outcome
  matrix[N_degen, X] covar_degen;  // covariate X for degenerate (0,1) outcome
  int N_pred_degen;  // number of posterior predictive samples for 0/1
  int N_pred_prop;   // number of posterior predictive samples for (0,1)
  int indices_degen[N_pred_degen]; // random row indices for posterior predictive calculation of 0/1
  int indices_prop[N_pred_prop];   // random row indices for posterior predictive calculation of (0,1)
  int run_gen;  // whether to run generated quantities
}
parameters {
  vector[X] X_beta;      // common predictors
  ordered[2] cutpoints;  // cutpoints on the ordered (latent) variable (also stand in as intercepts)
  real<lower=0> kappa;   // scale parameter for beta regression
}
transformed parameters {
  // store matrix calculations
  vector[N_degen] calc_degen;
  vector[N_prop] calc_prop;

  // drop the intercepts so everything is relative to the cutpoints
  calc_degen = covar_degen * X_beta;
  calc_prop = covar_prop * X_beta;
}
model {
  // vague priors
  X_beta ~ normal(0, 5);
  kappa ~ exponential(1);
  cutpoints[2] - cutpoints[1] ~ normal(0, 3);

  // need separate counters for logit (0/1) and beta regression
  for (n in 1:N_degen) {
    if (outcome_degen[n] == 0) {
      // Pr(Y==0)
      target += log1m_inv_logit(calc_degen[n] - cutpoints[1]);
    } else {
      // Pr(Y==1)
      target += log_inv_logit(calc_degen[n] - cutpoints[2]);
    }
  }

  for (n in 1:N_prop) {
    // Pr(Y in (0,1))
    target += log(inv_logit(calc_prop[n] - cutpoints[1]) - inv_logit(calc_prop[n] - cutpoints[2]));
    // Pr(Y==x where x in (0,1))
    target += beta_proportion_lpdf(outcome_prop[n] | inv_logit(calc_prop[n]), kappa);
  }
}
generated quantities {
  vector[run_gen == 0 ? 0 : N_pred_degen + N_pred_prop] regen_degen; // which model is selected (degenerate or proportional)
  vector[run_gen == 0 ? 0 : N_pred_degen + N_pred_prop] regen_all;   // final (combined) outcome -- defined over a random subset of rows
  vector[run_gen == 0 ? 0 : N_pred_degen + N_pred_prop] ord_log;     // store log calculation for loo

  if (run_gen == 1) {
    if (N_pred_degen > 0) {
      // first do degenerate outcomes
      // note: these could be *re-generated* as beta/proportions
      for (i in 1:num_elements(indices_degen)) {
        // draw an outcome 0 / prop / 1
        regen_degen[i] = ordered_logistic_rng(covar_degen[indices_degen[i], ] * X_beta, cutpoints);

        if (outcome_degen[indices_degen[i]] == 0) {
          ord_log[i] = log1m_inv_logit(calc_degen[indices_degen[i]] - cutpoints[1]);
        } else {
          ord_log[i] = log_inv_logit(calc_degen[indices_degen[i]] - cutpoints[2]);
        }

        if (regen_degen[i] == 1) {
          regen_all[i] = 0;
        } else if (regen_degen[i] == 3) {
          regen_all[i] = 1;
        } else {
          // did not occur in original data but could re-occur probabilistically
          regen_all[i] = beta_proportion_rng(inv_logit(covar_degen[indices_degen[i], ] * X_beta), kappa);
        }
      }
    }

    if (N_pred_prop > 0) {
      // now do originally proportional outcomes
      // can be re-generated as 0s or 1s
      int skip = num_elements(indices_degen);

      for (i in 1:num_elements(indices_prop)) {
        // draw an outcome 0 / prop / 1
        regen_degen[i + skip] = ordered_logistic_rng(covar_prop[indices_prop[i], ] * X_beta, cutpoints);

        ord_log[i + skip] = log(inv_logit(calc_prop[indices_prop[i]] - cutpoints[1]) - inv_logit(calc_prop[indices_prop[i]] - cutpoints[2]))
                            + beta_proportion_lpdf(outcome_prop[indices_prop[i]] | inv_logit(calc_prop[indices_prop[i]]), kappa);

        if (regen_degen[i + skip] == 1) {
          regen_all[i + skip] = 0;
        } else if (regen_degen[i + skip] == 3) {
          regen_all[i + skip] = 1;
        } else {
          // did not occur in original data but could re-occur probabilistically
          regen_all[i + skip] = beta_proportion_rng(inv_logit(covar_prop[indices_prop[i], ] * X_beta), kappa);
        }
      }
    }
  }
}
```


Can’t comment much on the approach, but I’ve got some suggestions for the code.

There are some composed functions that you could use here:

- `log_inv_logit`
- `log1m_inv_logit`
- `log_inv_logit_diff`
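To see why the composed functions matter numerically, here is a quick sketch in Python rather than Stan; `log_inv_logit` and `log1m_inv_logit` are reimplemented with `logaddexp` as stand-ins for Stan's built-ins:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_inv_logit(x):
    # stable log(inv_logit(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

def log1m_inv_logit(x):
    # stable log(1 - inv_logit(x)) = log(inv_logit(-x))
    return -np.logaddexp(0.0, x)

x = 40.0
naive = np.log(1.0 - inv_logit(x))  # inv_logit(40) rounds to 1.0 in double precision, so this is -inf
stable = log1m_inv_logit(x)         # about -40, the correct value
print(naive, stable)
```

The naive form underflows for linear predictors far from the cutpoints, which is exactly where the 0/1 observations live; the composed forms stay finite.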

Also for `alpha1 + covar_degen[n,]*X_beta`, can you pull that out of the loop as a single matrix multiplication? Same with `alpha1 + covar_prop[n,]*X_beta`.

In the creation of regen_degen, the theta probabilities are constructed the same way ordered_logistic_rng constructs them, so you could use that function directly instead.

Hope some of that helps!


Hi @saudiwin,

I cannot look into the technical implementation right now, but the idea is attractive. A more formal treatment would be very valuable (and I would personally love to read a preprint or a paper about your model).


Sure thing, I'll send it your way once I have something written up. I am thinking of submitting it to StanCon as a reproducible document, which I think would be very helpful for this modeling problem.

I’m currently doing simulations comparing regular beta regression, OLS, my model, and 0/1-inflated beta regression on a bunch of metrics (RMSE, marginal effects, LOO, etc). Results so far are pretty interesting: 0/1 inflation does well on RMSE and on returning the correct coefficient, but is less precise than the model I’m proposing. It also underestimates marginal effects (seemingly a result of using separate processes for 0/1). OLS does really well at returning marginal effects, at least under certain conditions (e.g., uni-modality). OLS also dominates in LOO due to model parsimony. Transforming the outcome to be within (0,1) seems like a bad idea: you get both incorrect coefficients and incorrect marginal effects.

Do you have any datasets you would recommend running these models on? Something from published work perhaps?


Looks promising, @saudiwin, thanks for sharing! I can’t look at it right now, but I agree with @matti that this would easily be worth a publication since IMO it’s a much more elegant solution to “extremes inflation” than other alternatives. Even if some other method has tighter fits (OLS, etc.), the mere fact that you can generate the inflations as part of the ordered process rather than as an “independent” process is really strong.

It would be great to have a solution that handles N cutpoints, i.e., an arbitrary number of response categories, which would render it a solution to this thread.


Thanks! I finished the simulations and the results are fascinating. The big difference seems to be in kurtosis: when the kurtosis of the distribution is too high or too low, OLS has either too little or too much uncertainty, since the kurtosis of the normal distribution underlying OLS is always fixed at 3 (see the plot).

ZOIB seems to work pretty similarly to my model but has higher uncertainty (as you can see in the plot, no dots below 1 = equal variance with ordinal beta regression). I expect this penalty gets worse as the number of parameters increases, as ZOIB requires 2 additional sets of regressors.

I think I am happy enough to start writing it up. This is essentially the model:

For a set of ordered cutpoints k_1 < k_2 (i.e., K=2), the Beta distribution in the \mu, \phi parameterization, a regressor X with associated coefficient \beta, and the inverse logit function g(\cdot), the probability of the outcome Y is equal to:

If Y=0:

1 - g(X \beta' - k_1)

If Y>0 and Y<1:

[g(X \beta' - k_1) - g(X \beta' - k_2)]Beta(g(X \beta'),\phi)

If Y=1:

g(X \beta' - k_2)
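These three pieces come from cutting a single latent logistic variable at k_1 and k_2, so the category probabilities partition the outcome space. A quick numerical check (in Python rather than the thread's R/Stan, with arbitrary values for the linear predictor and cutpoints):

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# arbitrary linear predictor values eta = X * beta' and cutpoints k1 < k2
eta = np.linspace(-4, 4, 9)
k1, k2 = -0.7, 1.3

p0 = 1 - inv_logit(eta - k1)                      # Pr(Y = 0)
pmid = inv_logit(eta - k1) - inv_logit(eta - k2)  # Pr(Y in (0,1)), before multiplying by the Beta density
p1 = inv_logit(eta - k2)                          # Pr(Y = 1)

print(p0 + pmid + p1)  # the three categories always sum to one
```

Note that pmid is guaranteed non-negative precisely because the cutpoints are ordered (k_1 < k_2).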

The trick was to estimate the Beta regression jointly with the probability for the middle category. I imagine I could add more cutpoints, but I'm not sure to what end: further ordered probability distributions?

I’m hunting for “real world” data to use on this thing, so any recs you have are much appreciated.


Well, I was particularly interested in “slider scale” data sets, where participants answer questions / rate their opinions on digital analogue scales by dragging an indicator with their mouse. This is a typical response format in psychology. If that interests you, you can probably find data sets with Google Scholar by searching for “visual analog scale” or “slider scale”. (Although now that I think of it, I used simulations probably because I couldn’t find open data sets. Oh well.)

Just an update: if you want a fun dataset to try these models on, check out the Pew Forum American Trends Panel. You can get feeling thermometers for all kinds of things, including U.S. politicians, foreign countries, even “college professors” :O.


I have some data on canopy loss of infected trees over several years. It would be awesome to try your model. In my case, g is the inv_logit of a Gaussian process. If you could share your model, I will update it appropriately with a Gaussian process and we could run it on real data.

Actually, the code is available, and you can even fit it with brms. That may be a good solution for you if you want to use a GP prior, as I believe those are implemented in brms. See this link to the working paper:

https://osf.io/preprints/socarxiv/2sx6y/
