How to speed up calculation of the Mittag-Leffler function in Stan?


#1

I was coding the log of the Mittag-Leffler function in Stan. Everything works fine, except it's terribly slow.
Any ideas?

```stan
// log of the Mittag-Leffler function E_{a,b}(z), computed as
// log_sum_exp over the series terms k * log(z) - lgamma(a*k + b)
real mlf_log(real z, real a, real b) {
  real s = -lgamma(b);   // k = 0 term
  real si = s;
  int k = 1;
  real log_z = log(z);
  if (z < 10e-15)
    return s;
  // stop once terms fall below exp(-35), i.e. about 6.3e-16
  while (si > -35) {
    si = k * log_z - lgamma(a * k + b);
    s = log_sum_exp(s, si);
    k += 1;
  }
  return s;
}
```

https://en.wikipedia.org/wiki/Mittag-Leffler_function
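For readers following along, here is a minimal direct-scale reference implementation of the series from that Wikipedia page, written in Python for illustration (the function name and tolerance are my own choices, not part of the Stan code above):

```python
# Direct (non-log) reference for the Mittag-Leffler series
# E_{a,b}(z) = sum_{k>=0} z^k / Gamma(a*k + b).
# Sketch only: the Stan version works on the log scale to avoid overflow.
import math

def mittag_leffler(z, a, b, rel_tol=1e-15, max_terms=10_000):
    """Sum the series until terms are negligible (assumes z >= 0)."""
    total = 0.0
    for k in range(max_terms):
        term = z**k / math.gamma(a * k + b)
        total += term
        if k > 0 and term < rel_tol * total:
            break
    return total

# Sanity checks against known special cases:
# E_{1,1}(z) = exp(z) and E_{2,1}(z^2) = cosh(z)
print(mittag_leffler(1.0, 1.0, 1.0))   # close to e
print(mittag_leffler(4.0, 2.0, 1.0))   # close to cosh(2)
```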


#2

Pretty sure the slow part here is going to be the lgamma (and derivatives) since each of those involves an iterative algorithm as well. Could you calculate the lgamma once and then just update it on each iteration?


#3

Thanks for the quick reply. You’re right, the iterative lgamma is slowing down the calculation!
If a is a positive integer, the lgamma in the loop could be replaced by one initial
lgamma calculation plus cheap log updates inside the loop.
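A sketch of that idea in Python (the helper name and interface are hypothetical): since Gamma(x + 1) = x * Gamma(x), when a is a positive integer the lgamma argument advances in unit steps, so lgamma(a*(k+1) + b) = lgamma(a*k + b) + sum of log(a*k + b + j) for j = 0..a-1:

```python
# Integer-a shortcut: one lgamma call up front, then only log() updates,
# using lgamma(x + 1) = lgamma(x) + log(x) repeatedly.
import math

def lgamma_sequence(a, b, n):
    """Return [lgamma(a*k + b) for k = 0..n-1] with a single lgamma call."""
    assert a == int(a) and a >= 1, "shortcut only valid for integer a >= 1"
    a = int(a)
    out = [math.lgamma(b)]            # k = 0 value
    lg = out[0]
    for k in range(n - 1):
        for j in range(a):            # advance the argument by a unit steps
            lg += math.log(a * k + b + j)
        out.append(lg)
    return out
```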


#4

If this were coded in C++ directly, I think you could also roll over much of the work from one lgamma call to the next (at least in the derivatives).


#5

Yes, you are right. Thank you!! I forgot about the derivatives:

from https://arxiv.org/pdf/0909.0230.pdf

So it’s “easy” then, in a sophisticated way. Except I’d have to learn Stan’s C++ interface.

Added later: Not so easy after all, once the derivatives with respect to alpha and beta
are considered. No closed form exists in that case.
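For reference, differentiating the series term by term (my own derivation, not taken from the paper) shows why: each derivative is itself an infinite series involving the digamma function $\psi$, with no closed form:

```latex
% E_{\alpha,\beta}(z) = \sum_{k \ge 0} z^k / \Gamma(\alpha k + \beta)
\frac{\partial}{\partial \alpha} E_{\alpha,\beta}(z)
  = -\sum_{k=1}^{\infty} \frac{k\,\psi(\alpha k + \beta)\, z^k}{\Gamma(\alpha k + \beta)},
\qquad
\frac{\partial}{\partial \beta} E_{\alpha,\beta}(z)
  = -\sum_{k=0}^{\infty} \frac{\psi(\alpha k + \beta)\, z^k}{\Gamma(\alpha k + \beta)}.
```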


#6

Yeah, it’s a significant chunk of work. So the first question is whether you really need it to be faster!


#7

None of this is closed form, so that’s ok; it’s just a matter of getting representations that are good enough and carrying over as much work as possible from iteration to iteration.


#8

My implementation is way too slow: about one million seconds per 1000 iterations.
I also found out that the log_sum_exp calls are not the fastest.

The Mittag-Leffler function covers many cases, since its shape parameters are
continuous. I have already coded a lot of alternative functions, but I’m always keen
to improve my models and learn more.
I used it in a mixture with the Poisson distribution. It’s a very flexible type
of Hyper-Poisson distribution.

I don’t expect the Stan team to take any action on this. Maybe it’s a good starter
project for learning the Stan C++ API.


#9

Sure, if you want to do it I’m happy to give pointers.


#10

1e-16 is too much precision to ask for. You don’t get that precision on addition and multiplication. Cutting off at 1e-8 or so would probably be precise enough and a lot cheaper to compute. And you probably want the precision to be proportional, not absolute, something like stopping when s stops changing, the way you have it here.

As the algorithm’s written, you could store the si values and apply a single log_sum_exp at the end. That’s a lot less arithmetic.

Move the early-return test up so it comes right after defining s, before log_z is computed. That helps with clarity and saves the log when the condition triggers.

Rather than computing k * log_z every iteration, you can just keep a running total and add log_z once per iteration.

None of this other than fixing precision will matter much compared to the cost of lgamma.
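A sketch combining these suggestions, written in Python to mirror the Stan function (the relative tolerance and the exact stopping rule are my own choices, not prescribed above):

```python
# Sketch of the revised algorithm: early return before log(z), a running
# total instead of k * log_z, stored terms with one log-sum-exp at the end,
# and a proportional (relative) stopping rule.
import math

def mlf_log(z, a, b, rel_tol=1e-8, max_terms=10_000):
    """log E_{a,b}(z) for z >= 0, summed on the log scale."""
    terms = [-math.lgamma(b)]          # k = 0 term of the series
    if z < 10e-15:                     # early return before touching log(z)
        return terms[0]
    log_z = math.log(z)
    k_log_z = 0.0                      # running total replacing k * log_z
    best = terms[0]                    # largest log-term seen so far
    for k in range(1, max_terms):
        k_log_z += log_z
        t = k_log_z - math.lgamma(a * k + b)
        terms.append(t)
        best = max(best, t)
        # stop once the newest term is a factor rel_tol below the largest
        if t < best + math.log(rel_tol):
            break
    # single log-sum-exp over all stored terms
    return best + math.log(sum(math.exp(t - best) for t in terms))

# check against the special case E_{1,1}(z) = exp(z), so the log is just z
print(mlf_log(2.0, 1.0, 1.0))
```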


#11

Bob, many thanks for the ideas, and to Sakrejda as well.

If the required precision in Stan is only 1e-8, then I would suggest considering
a good approximation of the log-gamma function. And this leads me to another
question: wouldn’t it be beneficial to have such approximations in Stan?

Why did Ben use exp(-745) as the termination criterion in this Conway-Maxwell-Poisson
code? http://discourse.mc-stan.org/t/conway-maxwell-poisson-distribution/2370

Approximations of the gamma function:
http://www.rskey.org/CMS/index.php/the-library/11
https://arxiv.org/abs/1003.6020

Approximations of the Mittag-Leffler function (not an easy task):
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=95FB100080A764194C9BE37F0FDD4FCD?doi=10.1.1.706.6903&rep=rep1&type=pdf
https://www.icp.uni-stuttgart.de/~hilfer/publikationen/pdfa/ITSF-2006-17-637-a.pdf


#12

Anything much lower than that underflows to 0 when fed into exp().
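A quick check of that boundary (this is a property of IEEE double precision, not of Stan): the smallest positive denormal double is about 4.9e-324 ≈ exp(-744.44), so exp() underflows to zero just below -745.

```python
# exp() underflow boundary in double precision
import math

print(math.exp(-745))   # still positive (a denormal, ~5e-324)
print(math.exp(-750))   # underflows completely to 0.0
```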

That’s a possibility. When I say that level of precision, I mean end-to-end. And this is only approximate, based on playing with things through our ODE solver, where we can control precision directly. The reality will depend on the exact curvature of the model you’re dealing with. So we’ve been reluctant to recommend anything that reduces precision.

The problem is that if you start introducing intermediate functions with less precision and then stacking them, you can lose precision fast.


#13

What you can do is back off on these internal thresholds and see if you lose accuracy against an external reference. Sometimes you don’t (because some other source of error dominates whatever the internal threshold is responsible for). But be careful to check a wide range of input values, because the error can vary enormously across values.
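A sketch of that experiment in Python, using the special case E_{1,1}(z) = exp(z) as the external reference (the function name and threshold values are my own choices):

```python
# Sweep the internal truncation threshold and compare against a closed-form
# reference, E_{1,1}(z) = exp(z), over a range of inputs.
import math

def ml_sum(z, a, b, tol):
    """Direct sum of z^k / Gamma(a*k + b), truncated at relative tol."""
    total, k, term = 0.0, 0, 1.0 / math.gamma(b)
    while k == 0 or term > tol * total:
        total += term
        k += 1
        term = z**k / math.gamma(a * k + b)
    return total

for z in (0.1, 1.0, 5.0):
    for tol in (1e-8, 1e-12, 1e-16):
        rel_err = abs(ml_sum(z, 1.0, 1.0, tol) - math.exp(z)) / math.exp(z)
        print(z, tol, rel_err)   # relative error vs the exp(z) reference
```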