I have fitted a categorical logistic regression in rstanarm
. I’m now looking to obtain posterior estimates for odds ratios and probabilities.
Assuming the Stan model is in a variable called fit
, getting point estimates is easy:
odds <- exp(fit$coefficients)
probs <- odds / (1 + odds)
The same can’t be said for credible intervals. Typically, one could exponentiate the interval for the log odds; however, credible intervals aren’t usually invariant under monotonic transformation. If they were, one could do:
odds_cri <- exp(posterior_interval(fit, prob = 0.95))
prob_cri <- odds_cri / (1 + odds_cri)
Posterior SDs are another story altogether.
The model could be fitted in Stan itself via rstan
, but more work from the modeller will be needed to set things up for valid inference.
Is there an easy way to do this in rstanarm
? I’ve not been able to find anything in the documentation (vignettes, etc.).
I’m not sure what you mean here. If you have a credible interval (a, b) for a random variable X, and you have a strictly monotonic upward function f, then the then you know the credible interval of the same coverage for Y = f(X) is indeed (f(a), f(b)). The densities change, but the boundaries adapt to handle it.
This is perhaps easiest to see from the MCMC estimators. Let’s say I have M draws
x_1, \ldots, x_M \sim p_X(x).
We estimate the central posterior interval at \alpha \in (0, 1) coverage by sorting the elements of x_{1:M} into z_1, \ldots, z_M in ascending order and using \left(z_{\lfloor M \cdot \alpha / 2 \rfloor}, z_{M - \lfloor M \cdot \alpha / 2 \rfloor}\right) as our estimate of the bounds of the central \alpha posterior interval of X.
Now take a strictly monotone increasing f:\mathbb{R} \rightarrow \mathbb{R} (i.e., a < b implies f(a) < f(b)). Then my transformed draws are
f(x_1), \ldots, f(x_M) \sim p_Y(y)
and it turns out the sort of f(x_1), \ldots, f(x_M) is just f(z_1), \ldots, f(z_M) because f is monotonic. Hence my estimated bounds for the central \alpha interval of Y = f(X) is \left(f(z_{\lfloor M \cdot \alpha / 2 \rfloor}), f(z_{M - \lfloor M \cdot \alpha / 2 \rfloor})\right).
I’m sure it’s also possible to do with calculus by applying change of variables and then integrating between the new bounds. I’m just better at sampling than calculus :-).
Ah OK.
This is reassuring and makes things a lot easier. I had read a Cross Valiated post here: bayesian - Are HPD intervals invariant to reparameterization? - Cross Validated which piqued my attention.
I can now get all this information from the objects outputted via rstanarm::stan_glm()
. However, is there a simple way to get the posterior standard deviations for the odds and probabilities?
I could write a Stan model from scratch and declare odds and probabilities within a generated quantities block
, but that would likely be a lot of extra work. Hopefully rstanarm
can output what I’m needing.