Gamma quantile function

I wasn’t so concerned with the speed of the inverse calculation as with using autodiff to get the gradients, since the inverse gamma calculation is an approximation to begin with. I don’t think there’s any way to get around that (I can’t find a published algorithm that doesn’t invoke something awkward like Meijer’s G-function), so I’m going to take your advice. I can test the code I write against values from R’s qgamma, and the gradients against finite differences? If there are other suggestions on testing, I’d welcome them.
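
For concreteness, a minimal sketch of the finite-difference check I have in mind, with Boost’s quantile standing in for whatever icdf implementation ends up on the branch (the function names here are just placeholders):

```cpp
// Minimal sketch of the finite-difference gradient check.
// gamma_icdf is a stand-in for whatever implementation lands on the branch.
#include <boost/math/distributions/gamma.hpp>
#include <iostream>

double gamma_icdf(double alpha, double p) {
  return boost::math::quantile(boost::math::gamma_distribution<>(alpha), p);
}

// Central difference in p, to compare against an analytic gradient.
double fd_grad_p(double alpha, double p, double h = 1e-6) {
  return (gamma_icdf(alpha, p + h) - gamma_icdf(alpha, p - h)) / (2 * h);
}

int main() {
  for (double p : {0.01, 0.5, 0.99})
    std::cout << "p = " << p << ", d icdf / dp ~= " << fd_grad_p(2.5, p) << "\n";
}
```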

Regarding the derivatives, for any distribution we have icdf’(y) = 1 / pdf(icdf(y)) – this is just the chain rule. So there’s no extra work to get the derivatives, assuming we have a pdf and a good approximation of the icdf. If we were willing to do Newton searches, we could do a generic implementation of inverse CDFs with derivatives for all the univariate distributions at once, by implementing inverses for univariate functions. If you need the derivative with respect to the shape parameters, there’s a similar generic computation using the multivariate chain rule. But I do expect a lot of overflows out in the tails.
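
As a sketch of that identity for the gamma case, using Boost’s pdf and quantile as the building blocks:

```cpp
// Sketch of icdf'(y) = 1 / pdf(icdf(y)) for the gamma case, with Boost's
// quantile standing in for the "good approximation of icdf".
#include <boost/math/distributions/gamma.hpp>

double gamma_icdf_grad_y(double alpha, double y) {
  boost::math::gamma_distribution<> g(alpha);
  double x = boost::math::quantile(g, y);  // x = icdf(y)
  return 1.0 / boost::math::pdf(g, x);     // chain rule: no extra approximation
}
```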

thel (Developer), December 8:
Regarding the derivatives, for any distribution we have icdf’(y) = 1 / pdf(icdf(y)) – this is just the chain rule. So there’s no extra work to get the derivatives, assuming we have a pdf and a good approximation of the icdf. If we were willing to do Newton searches, we could do a generic implementation of inverse CDFs with derivatives for all the univariate distributions at once, by implementing inverses for univariate functions.

I think this’d be worth exploring. It’d be nice to have the inverse CDFs.

It’d be nice to expose the actual optimizer and its derivatives, I think.
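
Roughly, the generic Newton inversion could look something like the sketch below; cdf and pdf are passed in, so one implementation would cover every univariate distribution. Bracketing and other safeguards are omitted and would matter in practice, especially in the tails:

```cpp
// Rough sketch of a generic Newton inversion: solve cdf(x) = y, using
// the pdf as the derivative. Safeguards (bracketing, step damping) are
// omitted here and would matter in practice.
#include <cmath>
#include <functional>

double icdf_newton(const std::function<double(double)>& cdf,
                   const std::function<double(double)>& pdf,
                   double y, double x0,
                   double tol = 1e-12, int max_iter = 100) {
  double x = x0;
  for (int i = 0; i < max_iter; ++i) {
    double step = (cdf(x) - y) / pdf(x);  // Newton step on cdf(x) - y = 0
    x -= step;
    if (std::fabs(step) < tol * (1.0 + std::fabs(x)))
      break;
  }
  return x;
}
```

The derivative of the result with respect to y then comes for free as 1 / pdf(x), as above.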

If you need the derivative with respect to the shape parameters,

Yes, we need derivatives with respect to everything. At least with the way functions are architected and documented now, they can be used anywhere the Stan types match.

  • Bob

In this approach, the story with the derivative with respect to a parameter q is just to write

cdf(q, icdf(q, x)) = x

differentiating with respect to q, and writing partial_q for the partial derivative, we find

partial_q cdf(q, icdf(q,x)) + pdf(q, icdf(q,x)) * partial_q icdf(q,x) = 0

so

partial_q icdf(q,x) = - partial_q cdf(q, icdf(q,x)) / pdf(q, icdf(q,x))
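
A sketch of what this looks like for the gamma shape parameter. I don’t have a closed form for the partial of the CDF in the shape, so that piece is itself taken by central difference here, with Boost supplying cdf, pdf, and quantile:

```cpp
// Sketch of partial_q icdf = -partial_q cdf / pdf for the gamma shape.
// The partial of the CDF in the shape is approximated by central difference.
#include <boost/math/distributions/gamma.hpp>

double gamma_icdf_grad_shape(double alpha, double x, double h = 1e-6) {
  using boost::math::gamma_distribution;
  double z = boost::math::quantile(gamma_distribution<>(alpha), x);  // icdf(q, x)
  double dcdf_dq = (boost::math::cdf(gamma_distribution<>(alpha + h), z)
                    - boost::math::cdf(gamma_distribution<>(alpha - h), z))
                   / (2 * h);                                        // partial_q cdf
  double f = boost::math::pdf(gamma_distribution<>(alpha), z);      // pdf(q, icdf(q, x))
  return -dcdf_dq / f;
}
```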

Again I worry about overflows, but I think they’re “true” overflows in the sense that the icdf is genuinely badly behaved in the tails (its graph is the cdf’s with the horizontal and vertical axes swapped, so the cdf’s flat tails become near-vertical). But the pattern is the same as before: if I can evaluate the function, its inverse, and the derivatives of the function, I can evaluate all derivatives of the inverse with generic formulae.

The downside is of course that these formulas depend on the inverse function relationship being exact, so the “derivatives” would not actually be derivatives of the approximation, nor are they obviously the best approximation to the true derivative. I’m starting to doubt the accuracy: looking at those pdfs in the denominator, I’m quite worried about dividing by a near-zero density in the tails.

Anyway, it’s a generic enough approach, but I’m suspicious of the accuracy. It might be worth trying if we’ve got time to do a lot of testing.

-thel

This all makes sense. I like the idea of getting a Newton root solver in and having an implementation for all the inverse CDFs. I’d be interested in working on that sometime in January, but won’t really have any time for it before then. In the next week or so I’ll get Boost’s inverse gamma and its derivatives done, at least on a branch, and I’ll use this idea for the shape derivative, so maybe that can be a test case.

Maybe there are some ideas to take from this blog post, where some floating-point traps in the context of bisection are described:
http://www.shapeoperator.com/2014/02/22/bisecting-floats/
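
If I read the post right, the key trick is to bisect on the bit patterns of the floats rather than on their values; a rough C++ translation, for a bracket of non-negative doubles with f(lo) < 0 < f(hi):

```cpp
// For non-negative doubles, the IEEE-754 bit patterns read as integers are
// ordered like the values, so this midpoint always makes progress and the
// loop terminates in at most ~64 steps -- no tolerance parameter needed.
#include <cstdint>
#include <cstring>
#include <functional>

double bit_midpoint(double a, double b) {  // assumes 0 <= a <= b
  std::uint64_t ia, ib;
  std::memcpy(&ia, &a, sizeof a);
  std::memcpy(&ib, &b, sizeof b);
  std::uint64_t im = ia + (ib - ia) / 2;   // midpoint among representable doubles
  double m;
  std::memcpy(&m, &im, sizeof m);
  return m;
}

double bisect(const std::function<double(double)>& f, double lo, double hi) {
  // Assumes f(lo) < 0 < f(hi) with 0 <= lo <= hi.
  while (true) {
    double mid = bit_midpoint(lo, hi);
    if (mid == lo || mid == hi)
      return mid;  // no representable double strictly between lo and hi
    if (f(mid) < 0)
      lo = mid;
    else
      hi = mid;
  }
}
```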

Julia implementations of some of the methods are there:

This derivative-free method looks especially interesting:

That’s fun stuff! I hadn’t run into such a clear example of the discreteness of floating point numbers yet.

That is also the case for me; the example is quite illuminating.

I have not implemented the New Eighth-Order Derivative-Free Methods for Solving Nonlinear Equations yet. To be honest, I worry a little bit about numerical stability. However, if it works stably, the eighth-order convergence should be impressive.

The usual place we run into it is that 1 - epsilon evaluates to 1 for any epsilon with |epsilon| below roughly 1e-16. There’s much better resolution around 0, hence the need for complementary CDFs.
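
A two-line demonstration of that asymmetry in doubles:

```cpp
// 1 - eps rounds back to exactly 1, while the same eps is perfectly
// representable near 0 -- which is why the complementary CDF matters.
#include <iostream>

int main() {
  double eps = 1e-17;
  std::cout << ((1.0 - eps) == 1.0) << "\n";  // prints 1: eps is lost near 1
  std::cout << (eps == 0.0) << "\n";          // prints 0: eps survives near 0
}
```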


Were you ever able to incorporate Boost’s inverse gamma as you mentioned here? I have been struggling to do the same.