Skewed distribution

Is there a way to use skewed distributions in stan like the one proposed by Fernandez and Steel [1998] in “On Bayesian Modeling of Fat Tails and Skewness” of the ASA journal?

Because the HMC algorithm in Stan needs gradients, the log-density needs to be differentiable (I think it needs to have 2 continuous derivatives, but it at the very least needs 1). I’m not sure these models do.

In the event that the model is smooth enough and you have an analytic form for the log density, you can use the
target += [log-density goes here]
form (See “Section 5.2: Increment Log Density” in the manual).

I would start with the skew_normal distribution, which is already implemented in Stan.

Thanks for the info!

What I need is a skewed t distribution. I had a short try of exp_mod_normal seems to work. More investigation needed.

Maybe this helps


Thanks! I will definitely try it out.

Hi Aki,
A short question. The sgt_log function implemented is equivalent to mean.cent = TRUE and var.adj = TRUE, which means pq must > 2 right?

Yes, that matches the linked pdf. I did this quickly and my constraints in the parameters block can also be wrong. If you find errors or improve the code otherwise, please post here. Note also that you might need to change _log → _lpdf.


Thanks Aki!
I checked the sgt_log function against the pdf and it seems correct.
I did modify the function a bit so that it fits my usage. The modified function is below for share

functions {
  real sgt_log(vector x, real mu, vector s, real l, real p, real q)
  { // Skewed generalised t
    int N;
    real lz1;
    real lz2;
    real v;
    real m;
    real r;
    real out;
    N = dims(x)[1];
    lz1 = lbeta(1.0/p,q);
    lz2 = lbeta(2.0/p,q-1.0/p);
    v = q^(-1.0/p)*((3*l^2+1)*exp(lbeta(3.0/p,q-2.0/p)-lz1)-4*l^2*exp(lz2-lz1)^2)^(-0.5);

    out = 0;
    for (n in 1:N) {
      m = 2*v*s[n]*l*q^(1.0/p)*exp(lz2-lz1);
      r = x[n]-mu+m;
      if (r<0)
        out = out+log(p)-log(2*v*s[n]*q^(1.0/p)*exp(lz1)*(fabs(r)^p /(q*(v*s[n])^p*(1-l)^p)+1)^(1.0/p+q));
        out = out+log(p)-log(2*v*s[n]*q^(1.0/p)*exp(lz1)*(fabs(r)^p /(q*(v*s[n])^p*(1+l)^p)+1)^(1.0/p+q));
    return out;


The current version of stan seems still accept _log. I changed the range and the priors for the parameters in my app of course.


That’s a monster, fabs and beta. Anything against, skew_normal from Stan and mix it with chi-squared distribution? see here:

Hi Andre,
Yes, sample from it is slow. Can you show me some stan code to mix skewed_normal and chi_square distributions?

  • fabs is not actually needed, as fabs(r) is inside if (r<0), so that could be removed
  • for (n in 1:N) loop is computing a lot of repeated computations, which could be moved out of the loop (which I think should reduce the autodiff expression tree, too)
  • lbeta is called only once, so I don’t think that’s slowing down much
  • I didn’t make the above optimizations to my original code, because I wanted to keep it readable so that I could check it’s computing what is in the pdf and I had only a small data, but now if this works, it would be easy to optimize it

Skew t is not as flexible as this generalized skew t, but if skew t is enough then scale mixture of skew normals could be useful option (scale mixtures are not always the best option, and furthermore they are only asymptotically equivalent…)


Cool! I will try to optimize a bit.


scode <- "
data {
  real<lower=1> nu;
  real alpha;
transformed data {
  real sqrt_nu = sqrt(nu);
parameters {
  real<lower=0> V;
  real Z;
transformed parameters {
  real T = Z * sqrt(V) * sqrt_nu;
model {
  V ~ inv_chi_square(nu); 
  Z ~ skew_normal(0, 1, alpha);

foo_data <- list(nu = 4, alpha = 1)

foo <- stan(model_code = scode, data = foo_data, chains = 1, iter = 4000, control=list(adapt_delta=0.8))
T <- extract(foo)$T

It just popped in to my mind that Skew normal has erf and chi-squared has gamma so the computation time should be similar (and did post before that fabs is not needed).


As to the _log, we still try to maintain backward compatibility and just raise deprecation warnings. We’ll probably eliminate it in Stan 3, which is on its way (though not soon!).

Since T isn’t used in the model, it’d be more efficient to define it as a generated quantity. That way, you don’t evaluate it’s partial derivatives.

Without fabs, sgt is good then. How often I wished, in cases like this, to have an easy way to specify the gradients directly.
BTW, there is an ST5 function around in the old google forum.

How would you specify gradients? We’ve never figured out how to do it syntactically given that we want the full Jacobian, but the output is of one shape and there can be any number of arguments of other shapes. If we could flatten everyting out into an R^M -> R^N function we could potentially return a matrix where the first column is the value and the remaining values makes up the Jacobian.

Something like the following could be implemented now to provide a function foo with its Jacobian.

matrix foo_jacobian(vector theta) {
  matrix J = ... jacobian ...
  vector v = ... value ...
  return append_col(v, J);

matrix temp = foo_jacobian(theta);
vector y = temp[, 1];  // value
matrix foo_J = temp[ , 2: ];  // Jacobian

You might then want to deal with data variables where there isn’t a Jacobian, which is another round of complications. Maybe the Jacobian could be only w.r.t. the first argument and there could be other arguments. These might be required to be data or autodiffed. You can see the problems we’re going to run into with this.