Sum of log-normal distributions and Quadrature Integration

I am trying to model some data as:

Y = log(sum_{i=1}^k exp(X_i))
X_i ~ N(0, sigma)

I had been putting all of the X_i into the model as parameters, which works reasonably well, but it was rather slow. To speed things up, I tried the Schwartz-Yeh approximation, which approximates a sum of lognormals with another lognormal. However, finding the parameters of the approximation requires numerical integration, and people normally use quadrature, e.g.:

http://ieeexplore.ieee.org/document/467959/
and MATLAB code
http://www.snowelm.com/~t/doc/tips/20110902.en.html#sy

However, it seems strange to be doing the quadrature inside the HMC, and I haven’t been getting the speed-up I was expecting. The number of numerical integrations scales as O(log2(k)), but with 8 quadrature points per integration it’s not clear that much computation is saved.

Reading the manuals and forum, I’ve seen HMC described as an integration algorithm, so it seems like it might be possible to somehow replace the quadrature with new latent variables, but I’m not sure how to proceed.


Try defining the X_i values as an ordered parameter rather than real. This will stop the swapping of parameters between modes, which is probably what’s killing your performance. The issue is that your original model is not identified under exchange of the parameters, i.e., swapping x_1 and x_2 gives you the same likelihood.
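
Something like this minimal sketch (K and sigma are assumed to come in as data; the likelihood is omitted):

data {
  int<lower=1> K;          // number of summands
  real<lower=0> sigma;
}
parameters {
  ordered[K] X;            // X[1] <= ... <= X[K], so components can’t swap
}
model {
  X ~ normal(0, sigma);    // same iid normal prior, now on the sorted values
  // ... likelihood built from Y = log_sum_exp(X) goes here ...
}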

That makes sense about the parameters swapping around - but with the ordered constraint is it still the same model? This application seems somewhat different from the cutpoint regression introduced in the manual.

It’s still the same problem formulation. It just means that your variables are ordered, so the model is no longer unidentified: you can’t swap X_2 and X_1 in this case.

I’ve done something similar before in a number of cases and it has worked really well.

Give it a try and let me know how it goes…

Well, it’s not technically exactly the same model, but you can think of the ordering as drawing from the normal distribution and then sorting. So yes, it’ll be the same for all intents and purposes. We use this technique to identify mixtures, and now that you’re going down this route, you might want to read @betanalpha’s case study on mixtures (under doc on our web pages).

If you want to code this

Y = log(sum_{i=1}^k exp(X_i))
X_i ~ N(0, sigma)

you want to code this in Stan as

X ~ normal(0, sigma);    // vectorized prior over the whole vector X
...
Y = log_sum_exp(X);      // stable evaluation of log(sum(exp(X)))

That’ll keep the arithmetic stable and won’t lose precision.

I couldn’t follow what the integral was you were trying to do, but I’d stay away from approximations if at all possible.

Thanks! I started to look into it, and it seemed to take a while for the parameters to converge. All of the X_i are pretty far into the right tail of the distribution, and they all seemed to get smashed together, but not at a low enough value.

The Schwartz-Yeh approximation models
Z = log(e^X + e^Y)
where Z is treated as approximately normal with mean E[Z] and variance Var[Z] = E[Z^2] - E[Z]^2.

Although it is an approximation, I’m looking at sums of 100-500 variables, and by applying the integral recursively I only add O(log2(k)) computations to the model rather than O(k) latent variables. (Since the X_i are iid, every pair being merged at a given level has the same approximating distribution, so only one integration is needed per level, giving about ceil(log2(k)) integrations in total.)
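
Concretely, for one merge of independent X ~ N(mu_X, sigma_X^2) and Y ~ N(mu_Y, sigma_Y^2), here is a sketch of just the mean term (the variance needs similar integrals plus a cross term, since X and W are correlated):

$$
Z = \log(e^X + e^Y) = X + \log(1 + e^W), \qquad W = Y - X \sim N(\mu_Y - \mu_X,\ \sigma_X^2 + \sigma_Y^2)
$$

$$
E[Z] = \mu_X + \int_{-\infty}^{\infty} \log(1 + e^w)\,\phi(w; \mu_W, \sigma_W^2)\,dw \approx \mu_X + \frac{1}{\sqrt{\pi}} \sum_{i=1}^{8} \omega_i \log\!\left(1 + e^{\mu_W + \sqrt{2}\,\sigma_W t_i}\right)
$$

where t_i and omega_i are the 8-point Gauss-Hermite nodes and weights.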

Just a quick update: I was able to sidestep the problem by using a gamma rather than a log-normal distribution, and then handling the sum of log-transformed gamma random variables with the method of Marques, Coelho & Carvalho (2014).
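
In case it helps anyone else, here’s a minimal Stan sketch of the simplest special case only (not the full construction from the paper): if each summand is modeled as gamma(alpha, beta) with a shared rate beta, the sum is exactly gamma(K * alpha, beta), so Y is the log of a single gamma variate and no per-term latent variables are needed.

data {
  int<lower=1> K;             // number of summands
  real Y;                     // observed log of the sum
}
transformed data {
  real<lower=0> S = exp(Y);   // the sum itself; the Jacobian of this
                              // transform is constant in the parameters
}
parameters {
  real<lower=0> alpha;        // shape of each gamma summand (assumed shared)
  real<lower=0> beta;         // rate shared by all summands
}
model {
  alpha ~ lognormal(0, 1);    // placeholder priors
  beta ~ lognormal(0, 1);
  S ~ gamma(K * alpha, beta); // sum of K iid gamma(alpha, beta) variates
}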

Thanks for writing back with the solution (which I marked—at least I hope it was the solution to this problem).