The MATLAB way: function [f, G, H] = myFct()
Inside the function one could write “return_gradient”, or simply assign in MATLAB style, where f, G, and H are bound to the returned value, gradient, and Hessian.
The GNU R way: you specify another function as the gradient function.
Maybe some naming convention could do the job: in addition to _lpmf, a _grad suffix. The compiler checks whether a _grad function exists; if it does, it uses it, otherwise it defaults to autodiff.
If there is no Jacobian, it is just f = myFct(), or, in the GNU R example, the _grad function simply does not exist, so the compiler falls back to autodiff.
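Purely as a sketch of what the naming-convention idea might look like (nothing here exists today; the _grad signature, returning the gradient with respect to all arguments, is just my guess):

functions {
  // ordinary user-defined log density
  real my_lpdf(real y, real mu, real sigma) {
    return normal_lpdf(y | mu, sigma);
  }
  // hypothetical companion: if the compiler finds a matching _grad,
  // it would use it; otherwise it falls back to autodiff
  vector my_grad(real y, real mu, real sigma) {
    real z = (y - mu) / sigma;
    return [-z / sigma, z / sigma, (square(z) - 1) / sigma]';  // d/dy, d/dmu, d/dsigma
  }
}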
Would SGT be a good candidate to be implemented in Stan-math and Stan proper? The distribution seems pretty straightforward, but are there good tricks for sampling from this?
(I could imagine having a bunch of switches so that, when the parameters take certain values, the code defaults to sampling from other, known distributions. But what about the general 5-parameter SGT?)
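For reference, the five-parameter density under discussion (the parameterisation used on the SGT wiki page and, as far as I can tell, in the code posted later in the thread) is

$$
f(x;\mu,\sigma,\lambda,p,q)=\frac{p}{2\,v\,\sigma\,q^{1/p}\,B(1/p,q)\left(\dfrac{|x-\mu+m|^{p}}{q\,(v\sigma)^{p}\,\bigl(1+\lambda\,\operatorname{sign}(x-\mu+m)\bigr)^{p}}+1\right)^{1/p+q}}
$$

with $v$ and $m$ the scale and shift constants that make $\mu$ the mean and $\sigma$ the standard deviation.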
Hey @mathDR, I’m working on implementing skewed/heavy-tailedness support via LambertW transforms (more info here). I took a quick look at this thread and the skewed generalized t (SGT) wiki; do you see any particular advantages of going to SGT over what I’m working on at the moment?
Curious to hear either way; if you have a (simple-ish) Stan program you’d like me to consider while working on this, it’d be good if you could share it.
Hey @Neel This is very interesting. I am coming at this from a different perspective: I would like to use the distribution as a “function” that models some aspect of my data.
This all stems from a Bayesian Holiday Model that I am trying to speed up. Currently, this is constructed in an ad-hoc fashion, but a Skew-Generalized-T distribution yields all of the flexibility that I require to model the holidays.
Of course, having this distribution in Stan with known gradients would make any model I am doing inference on faster (the holidays portion is usually the culprit), which is a reason I am proposing to add it to the Stan libraries.
I’ve used some stan-math functions that didn’t exist when Aki wrote the sgt code and got about a 2x speed-up on one model I tested. Let me know if this helps you:
real sgt_lpdf(vector x, real mu, real s, real l, real p, real q) {
  // Skewed generalised t, parameterised so that mu is the mean and s the standard deviation
  int N = num_elements(x);
  real lz1 = lbeta(1.0 / p, q);
  real lz2 = lbeta(2.0 / p, q - 1.0 / p);
  // scale factor chosen so that the variance is s^2 (requires q > 2/p)
  real v = q^(-1.0 / p)
           * inv_sqrt((3 * square(l) + 1) * exp(lbeta(3.0 / p, q - 2.0 / p) - lz1)
                      - 4 * square(l * exp(lz2 - lz1)));
  // shift so that mu is the mean rather than the mode
  real m = 2 * v * s * l * q^(1.0 / p) * exp(lz2 - lz1);
  // log normalising constant; the log(2) comes from the two half-densities
  real out1 = N * (log(p) - log(2) - log(v) - log(s) - lmultiply(1 / p, q) - lz1);
  real out2 = 0;
  for (n in 1:N) {
    real r = x[n] - mu + m;
    if (r < 0)
      out2 -= log1p((-r)^p / (q * (v * s * (1 - l))^p));
    else
      out2 -= log1p(r^p / (q * (v * s * (1 + l))^p));
  }
  return fma(out2, (1 / p + q), out1);
}
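With sgt_lpdf in the functions block, usage follows the standard user-defined distribution pattern. A minimal sketch (the data/parameter names are placeholders, and the bound on q just encodes that the variance only exists for q > 2/p):

functions {
  // paste sgt_lpdf from above here
}
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
  real<lower=-1, upper=1> lambda;
  real<lower=0> p;
  real<lower=2 / p> q;  // v (and hence the density) is only defined for q > 2/p
}
model {
  y ~ sgt(mu, sigma, lambda, p, q);  // vectorised call to sgt_lpdf
  // add priors appropriate to the application
}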