New paper using Stan/brms to estimate feeling thermometers/VAS scales

Hi all -

I posted on this a while ago but I have a working paper now of a new model used to fit bounded DVs/responses such as are found with survey sliders/VAS scales/feeling thermometers. Essentially any continuous variable with lower and upper bounds where it is likely that there will be observations on the bounds.

The reason I came up with this model is because the existing approaches, like the zero-one inflated beta regression (ZOIB), tend to be pretty top heavy. This model is more lightweight but can still incorporate observations on the bounds (i.e. 0/1). Furthermore, the interpretation of the effect of a predictor on the outcome is much more straightforward than for the ZOIB model as it represents a monotonic effect on the outcome across both 0/1s and continuous values.

Any feedback much appreciated!

Paper link: https://osf.io/preprints/socarxiv/2sx6y/

In a Github repo I have both Stan code to fit the model, an Rmarkdown reproducible version of the paper, and a vignette showing how to fit this model in brms with @paul.buerkner 's amazing custom family functionality.

Repo: https://github.com/saudiwin/ordbetareg
Vignette: https://htmlpreview.github.io/?https://github.com/saudiwin/ordbetareg/blob/master/estimate_with_brms.html

14 Likes

Just an update that I caught a bug in the brms code relating to factors and updated the vignette.

Thanks @saudiwin, this looks very useful!
In your empirical example, you have a distribution with peaks at both bounds (0 and 1), and a range of values in between the bounds. Do you have an intuition of what would happen if you only have data at one of the bounds? For example, let’s say you have VAS data ranging from 0.2 to 1, with a clear peak at the upper bound of 1 but no data at the lower bound of 0. How would the lower cutpoint be estimated?

I know you already touch on this in the paper’s Discussion, but I’m wondering if it would then make sense to fit a model with just a single cutpoint, or to fit your model with both cutpoints anyway. Thanks again!

Hi Frank -

Great question. I added a little info to an appendix about this as I think it is a pertinent question. I added the appendix to the Github here:

https://github.com/saudiwin/ordbetareg/blob/master/kubinec_ord_betareg_appendix.pdf

The cutpoints are weakly identified by the prior (i.e., the difference between them is Normally distributed), so it is really OK to fit a model with no observations at the bounds (or just one bound). It’s probably better to do so in the likely case that future iterations of your data could have observations at the bounds. You would need to have data that you were sure did not have any observations at the bounds (something where the probability of the data vanishes as the bounds approach).

The cutpoints will simply be estimated at far ends of the latent scale, and you will be able to generate them from the posterior predictive distribution with small probability. As such, it’s not a bad thing to have them as you could later combine data from future samples that have such observations at the bounds (or create realistic predictive data with observations at the bounds).

Let me know if you have any other questions!

2 Likes

Hi Robert,

I played around a bit with the model because this is going to be really helpful to a lot of the work that I am doing. TL;DR I think there is a bias in the way the model is parameterised and the priors are chose and I think I have a solution.

After some testing with simulated data and simulated priors from the data, I found that there is a downward absolute bias in all the parameters. Especially if the \phi parameter of the beta(\mu, \phi) distribution is large. I think that this parametrisation has trouble when the proportional outcome is relatively uniform in the center of the distribution because the uniformity could be because of large coefficient on X or small \phi. My guess is that if there are a lot of 0s and 1s in the outcome, those are missing observations mainly from the tails of the beta, and it’s those tails that are most informative on \phi. So there is not a lot of data to inform \phi, the prior pushes \phi down, which pushes the absolute value of the coefficients down, which is compensated by a decrease in the absolute value of the cutoff points to generate enough extreme values. I think this could also explain the slight upward bias in the marginal effects in the paper. The estimated model predicts more extreme outcomes of 0 and 1. Anyway, this is merely informed guessing.

I reimplemented your model (sorry the nomenclature is a bit different from yours) to get around the problem I think is there. I basically used the beta(\alpha, \beta) parameterisation because I could put a prior on \beta so that it doesn’t suffer from the problem that I think \phi has. It’s admittedly a slightly different model but not that different I think.

The full model is below. It assumes that X is centered and scaled. The R code for simulation and the stan model are here.beta_logit_reparam.stan (833 Bytes) test_beta_param.R (1.6 KB)

\gamma \sim normal(0, \sigma_\gamma \sqrt{N_\gamma}) \\ \beta \sim lognormal(0, 1) \\ cut_1 \sim \sigma_\gamma * normal(-2, 1) \\ cut_2 \sim \sigma_\gamma * normal(2, 1) \\ y \sim \begin{cases} binomial(1 - invlogit(X\gamma - cut_1)) &y = 0 \\ binomial(invlogit(X\gamma - cut_2)) &y = 1 \\ beta(\beta e^{X\gamma}, \beta) & 0 < y < 1 \\ \end{cases}

Most of the structure is the same. I used a slightly different prior on the coefficient \gamma so that the total sd of X \gamma is roughly \sigma_\gamma which can be set in Stan. N_\gamma is the number of explanatory variables. That helps to make the cuts also a bit more interpretable, I choose for twice the sd of X\gamma as the mean but that can adjusted obviously. After the invlogit, that leads to about 15% 0s and 15% 1s in expectation.

The beta distribution is the only structural difference. The interpretation of \beta is that it forces the beta distribution to be beta(\beta, \beta) when X\gamma = 0. I am not sure whether that is acceptable for all applications. The nice thing is that the \beta parameter is more sensitive to the shape of the distribution (e.g. tilting towards 0 or 1) and not just the concentration of the distribution. The other nice feature is that the prior biases towards 1, so if there is not a lot of information in the data X\gamma \approx 0 then the beta distribution is approximately uniform. In that sense, it’s an uninformative prior I guess.

I tried to run the simulation a couple of times and I could recover the parameters pretty well. I probably should run a proper SBC when I find the time. Let me know if that is something that would be valuable.

3 Likes

The SBC of the new model for 3 predictors looks reasonable.

beta_logit_reparam_scb.stan (2.4 KB) scb_basic.R (662 Bytes)

1 Like

thanks for your work on this, @stijn, and sorry for not responding earlier! I haven’t been working on this paper during the pandemic but I have finally gotten back to it.

As far as your parameterization goes, I think it’s intriguing, especially the use of some empirical information to scale the parameters. However, I will say that I think that moving away from the mean/scale parameterization makes the beta distribution harder to use. With the \beta/\alpha parameterization, the mean is a function of both parameters and there’s no real way to model the variance with just one set of parameters either. So I think there is something loss. You can still derive marginal effects that would be identical across parameterizations, but I think the ability to model the scale is a real loss.

That said, I agree that the mean/scale parameterization is probably going to have some computational problems given that it’s a multiplication of two probabilities using the inverse logit function. I don’t think there is noticeable bias in the parameters, but I’m re-doing the simulation currently so we’ll see. However, I agree that the priors are going to have a lot of weight in some of the situations where you mention. On balance, though, I think that the \alpha/\beta parameterization is too far removed from what most people want from the model for it to be a viable option.

On the flip side, if you can find a way of using it that separates the meaning of the parameters, i.e., prior successes versus prior failures, then it would be really useful & also fit better than the mean/scale parameterization.

1 Like

I have done some work incorporating induced Dirichlet priors for the thresholds, enabling e.g. using an ordered Beta regression model for a response strictly between 0 and 1 (for whatever reason) or a case where we only have observed ones or zeroes, but not both, together with values between 0 and 1.

The code is available here induced_dirichlet_ordbeta.R (github.com) and builds upon discussion in this thread: Dirichlet prior on ordinal regression cutpoints in brms

I did also some small changes to @saudiwin’s code for handling a modeled phi.

3 Likes

I did a small mistake by specifying cutzero and cutone as ordered, when it should be cutzero and cutzero+exp(cutone). I updated the script. I also noticed that the suggested prior for Intercept is a constant zero, which does not make sense (if we aren’t really sure the mean for the regression baseline within 0 and 1 is 0.5). I made a comparison with an example model and put together a notebook here: Ordered Beta - with and without intercept (htmlpreview.github.io)

1 Like

Thanks Staffan, I’ll have a look at the code. The function to interface with brms have to manipulate some of the options to fit a beta regression with cutpoints, so that is why the intercept was set to zero. brms syntax does change though and the original code had some limitations, so I’ll double check all this.

Weird. So I ran your new prior with the data I use in the vignette, and I don’t get any difference. ELPD is almost identical and the pp plots very similar. But then with the data you use, big difference. I am wondering if this is a data size issue as the dataset you use is much smaller.

In any case, this prior makes more sense from a principled point of view so I’m happy to incorporate it. Is there a way of citing you, perhaps the dharma stuff? I can’t add you as a code contributor as there isn’t an R package for it.

2 Likes

I think the best explanation is that in your model the intercept is (for predictive purposes) more or less identical to zero.

I think you should cite Florian Hartig directly for DHARMa. See the suggested citation from citation("DHARMa"). For the specification of the induced Dirichlet prior and the underlying code it is originally (as far as I know) derived by @betanalpha, see his case study on Ordinal Regression.

1 Like

Sounds good!

I derived the prior independently but I believe that it, or a similar version of it, was also derived and implemented in rstanarm independently but never readily documented.

2 Likes

Thanks, I’ll be sure to cite both!