Adding Lambert W Transforms to Stan?

A guy I know wrote a paper titled The Lambert Way to Gaussianize Heavy-Tailed Data with the Inverse of Tukey’s h Transformation as a Special Case and has an associated R package LambertW. The TL;DR is that the paper has a parameterization to remove skewness and left/right kurtosis from random variables that reminds me a lot of non-centered parametrizations.

The package has a good example of this

# Univariate example
library(LambertW)
set.seed(20)
y1 <- rcauchy(n = 100)
plot(density(y1))
out <- Gaussianize(y1, return.tau.mat = TRUE)
plot(density(out$input) # huh!
x1 <- get_input(y1, c(out$tau.mat[, 1]))  # same as out$input
test_normality(out$input) # Gaussianized a Cauchy!

Wanted to throw it out there in case anyone though it would be a useful tool. On the backend we would need to add boosts LambertW implementation and just coding over the stuff in the R package to stan math

5 Likes

I think it is a good idea. Adding something from boost::math is pretty easy, although reimplementing the Gaussianize stuff is more ambitious.

2 Likes

Yeah the boost part is easy peasy. tbh I’m not really sure how we would want to expose something like Gaussianize. We could just add a tukey_hh_lpdf distribution etc. On the other hand the framework in the paper supports a lot of distributions and we could do something with a functor + specializations

// Could call it noncenter(...)?
target += lambertW_transform(normal_lpdf, y, X, mu, sigma, alpha, rho_l, rho_r);
1 Like

Do we want to do the specializations for the distributions like in the LambertW package that uses MLE? Or can we do the moment matching?

The issue with the former is that one must specify the original distribution. That’s not that big of a deal, we do it all the time, but then we need all the specializations in the functor format you suggest. The second way of estimating the distribution is by moment matching. In fact, TF does this with their gaussianize function. TF bases their code on the Characterizing Tukey h and hh-Distributions through L-Moments and the L-Correlation by Headrick and Pant. They numerically solve for the left and right parameters when the distribution is asymmetric. The TF code uses binary search, see -https://github.com/tensorflow/transform/blob/879f2345dcd6096104ae66027feacb099e228e66/tensorflow_transform/gaussianization.py.

What’s cool about that L-moments paper is we could write a tukey_symmetric_lpdf and tukey_skew_lpdf that Stan samples from on the normal 0,1 scale. Along with estimating the location and scale, the symmetric versions would take in h and the skew versions an h_l and h_r as parameters. What is really cool though - and something TF does not do - is that we could do a multi_tukey specification. Where each marginal density has their own skew and/or kurtosis and connected via a correlation matrix. See equation 4.1 in the paper that uses the choleksy factor of the correlation matrix values.

We wouldn’t need all the specializations but we would need to implement code similar to what TF does and then all the correlation stuff in the paper.

2 Likes

I like the moment matching idea, maybe we could setup a prototype branch in stan math to sort out what this would look like? The big thing is being able to go back and forth between the corrected and uncorrected data

Realistically, it’s going to be some time before I can try this as there’s a ton at work for the new year. I’d probably write it in Stan before trying it on stan math but that’s because my comfort with Stan >> C++.

Didn’t know about Lambert transforms. That’s cool! Reminds me of rank inverse normal transforms which are use a lot in genetics.

Is this typically used to transform data beforehand or were you thinking of using this to transform parameters to make them easier to sample?

I checked out the volatility modeling example in the paper where they transform the returns. What are the pros and cons of transforming the data then fitting with a Gaussian observational model versus just fitting raw returns with, for example, an SGT distribution?

Section 3.1 of the lambertw transform paper goes over this a bit. For going to a normal there’s not much of a difference really but with the general lambert framework you can use different underlying distributions so it opens it up to different modeling approaches