Hi Robert,
I played around a bit with the model because this is going to be really helpful to a lot of the work that I am doing. TL;DR I think there is a bias in the way the model is parameterised and the priors are chose and I think I have a solution.
After some testing with simulated data and simulated priors from the data, I found that there is a downward absolute bias in all the parameters. Especially if the \phi parameter of the beta(\mu, \phi) distribution is large. I think that this parametrisation has trouble when the proportional outcome is relatively uniform in the center of the distribution because the uniformity could be because of large coefficient on X or small \phi. My guess is that if there are a lot of 0s and 1s in the outcome, those are missing observations mainly from the tails of the beta, and it’s those tails that are most informative on \phi. So there is not a lot of data to inform \phi, the prior pushes \phi down, which pushes the absolute value of the coefficients down, which is compensated by a decrease in the absolute value of the cutoff points to generate enough extreme values. I think this could also explain the slight upward bias in the marginal effects in the paper. The estimated model predicts more extreme outcomes of 0 and 1. Anyway, this is merely informed guessing.
I reimplemented your model (sorry the nomenclature is a bit different from yours) to get around the problem I think is there. I basically used the beta(\alpha, \beta) parameterisation because I could put a prior on \beta so that it doesn’t suffer from the problem that I think \phi has. It’s admittedly a slightly different model but not that different I think.
The full model is below. It assumes that X is centered and scaled. The R code for simulation and the stan model are here.beta_logit_reparam.stan (833 Bytes) test_beta_param.R (1.6 KB)
Most of the structure is the same. I used a slightly different prior on the coefficient \gamma so that the total sd of X \gamma is roughly \sigma_\gamma which can be set in Stan. N_\gamma is the number of explanatory variables. That helps to make the cuts also a bit more interpretable, I choose for twice the sd of X\gamma as the mean but that can adjusted obviously. After the invlogit, that leads to about 15% 0s and 15% 1s in expectation.
The beta distribution is the only structural difference. The interpretation of \beta is that it forces the beta distribution to be beta(\beta, \beta) when X\gamma = 0. I am not sure whether that is acceptable for all applications. The nice thing is that the \beta parameter is more sensitive to the shape of the distribution (e.g. tilting towards 0 or 1) and not just the concentration of the distribution. The other nice feature is that the prior biases towards 1, so if there is not a lot of information in the data X\gamma \approx 0 then the beta distribution is approximately uniform. In that sense, it’s an uninformative prior I guess.
I tried to run the simulation a couple of times and I could recover the parameters pretty well. I probably should run a proper SBC when I find the time. Let me know if that is something that would be valuable.