Skewed distribution

Is there a way to use skewed distributions in Stan, like the one proposed by Fernández and Steel [1998] in “On Bayesian Modeling of Fat Tails and Skewness” (Journal of the American Statistical Association)?
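
For reference, the construction in that paper skews a symmetric density f with a parameter \gamma > 0 (quoting from memory, so worth checking against the paper):

p(x \mid \gamma) = \frac{2}{\gamma + 1/\gamma}\left[ f(x/\gamma)\,\mathbf{1}\{x \ge 0\} + f(\gamma x)\,\mathbf{1}\{x < 0\} \right]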

Because the HMC algorithm in Stan needs gradients, the log density needs to be differentiable (I think it needs two continuous derivatives, but it at the very least needs one). I’m not sure these models qualify.

In the event that the model is smooth enough and you have an analytic form for the log density, you can use the
target += [log-density goes here]
form (see “Section 5.2: Increment Log Density” in the manual).
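
For example, a minimal sketch of the form, assuming data y[N] and parameters mu and sigma, with a normal log density standing in as a placeholder for the one you actually want:

model {
  // increment the log density directly with your analytic form;
  // the normal log density (up to a constant) below is only a placeholder
  for (n in 1:N)
    target += -log(sigma) - 0.5 * square((y[n] - mu) / sigma);
}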

I would start with the skew_normal distribution, which is already implemented in Stan.
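
A minimal sketch of the sampling statement (the parameter names xi, omega, and alpha are mine):

model {
  // location xi, scale omega, skewness alpha
  y ~ skew_normal(xi, omega, alpha);
}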

Thanks for the info!

What I need is a skewed t distribution. I gave exp_mod_normal a quick try and it seems to work. More investigation needed.
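
For reference, what I tried was just the built-in distribution (a minimal sketch; mu, sigma, and lambda are my names for the location, scale, and rate arguments):

model {
  y ~ exp_mod_normal(mu, sigma, lambda);
}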

Maybe this helps

Aki

Thanks! I will definitely try it out.

Hi Aki,
A short question. The sgt_log function you implemented is equivalent to mean.cent = TRUE and var.adj = TRUE, which means p*q must be > 2, right?

Yes, that matches the linked pdf. I did this quickly, and my constraints in the parameters block may also be wrong. If you find errors or otherwise improve the code, please post here. Note also that you might need to change _log → _lpdf.

Aki

Thanks Aki!
I checked the sgt_log function against the pdf and it seems correct.
I modified the function a bit to fit my usage. The modified function is shared below:

functions {
  real sgt_log(vector x, real mu, vector s, real l, real p, real q) {
    // Skewed generalised t: location mu, scales s, skewness l, tail shapes p and q
    int N;
    real lz1;
    real lz2;
    real v;
    real m;
    real r;
    real out;
    N = dims(x)[1];
    lz1 = lbeta(1.0/p, q);
    lz2 = lbeta(2.0/p, q - 1.0/p);
    // variance-adjustment factor (var.adj = TRUE)
    v = q^(-1.0/p)*((3*l^2+1)*exp(lbeta(3.0/p, q-2.0/p)-lz1)-4*l^2*exp(lz2-lz1)^2)^(-0.5);

    out = 0;
    for (n in 1:N) {
      // m recenters x so that mu is the mean (mean.cent = TRUE)
      m = 2*v*s[n]*l*q^(1.0/p)*exp(lz2-lz1);
      r = x[n] - mu + m;
      if (r < 0)
        out = out + log(p) - log(2*v*s[n]*q^(1.0/p)*exp(lz1)*(fabs(r)^p/(q*(v*s[n])^p*(1-l)^p)+1)^(1.0/p+q));
      else
        out = out + log(p) - log(2*v*s[n]*q^(1.0/p)*exp(lz1)*(fabs(r)^p/(q*(v*s[n])^p*(1+l)^p)+1)^(1.0/p+q));
    }
    return out;
  }
}

The current version of Stan still seems to accept _log. Of course, I changed the ranges and priors for the parameters in my application.

jicun

That’s a monster: fabs and beta. Anything against using skew_normal from Stan and mixing it with a chi-squared distribution? See here: https://en.wikipedia.org/wiki/Student%27s_t-distribution#Characterization

Hi Andre,
Yes, sampling from it is slow. Can you show me some Stan code that mixes the skew_normal and chi_square distributions?
jicun

  • fabs is not actually needed: the sign of r is already known inside each branch of if (r < 0), so fabs(r) could be replaced with -r and r, respectively
  • the for (n in 1:N) loop repeats a lot of computation that could be moved outside the loop (which I think would also shrink the autodiff expression tree); see the sketch after this list
  • lbeta is called only once, so I don’t think that’s slowing things down much
  • I didn’t make these optimizations in my original code because I wanted to keep it readable enough to check against the pdf, and I had only a small data set; but now that this works, it would be easy to optimize
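
For illustration, here is one way the loop-invariant terms could be hoisted (an untested sketch based on jicun’s function above; the name sgt_log_opt is mine, and the algebra should be checked against the pdf):

real sgt_log_opt(vector x, real mu, vector s, real l, real p, real q) {
  // Skewed generalised t with loop-invariant terms precomputed
  int N = dims(x)[1];
  real lz1 = lbeta(1.0/p, q);
  real lz2 = lbeta(2.0/p, q - 1.0/p);
  real v = q^(-1.0/p)
           * ((3*l^2 + 1)*exp(lbeta(3.0/p, q - 2.0/p) - lz1)
              - 4*l^2*exp(lz2 - lz1)^2)^(-0.5);
  real mcoef = 2*v*l*q^(1.0/p)*exp(lz2 - lz1);          // m = mcoef * s[n]
  real lconst = log(p) - log(2) - (1.0/p)*log(q) - lz1; // per-observation constant
  real out = N * lconst;
  for (n in 1:N) {
    real r = x[n] - mu + mcoef*s[n];
    // the sign of r is known in each branch, so fabs is unnecessary
    real z = r < 0 ? -r/(v*s[n]*(1 - l)) : r/(v*s[n]*(1 + l));
    out += -log(v*s[n]) - (1.0/p + q)*log1p(z^p/q);
  }
  return out;
}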

Skew t is not as flexible as this generalized skew t, but if skew t is enough, then a scale mixture of skew normals could be a useful option (scale mixtures are not always the best option, and furthermore they are only asymptotically equivalent…).

Aki

Cool! I will try to optimize a bit.

jicun

library(rstan)
scode <- "
data {
  real<lower=1> nu;
  real alpha;
}
transformed data {
  real sqrt_nu = sqrt(nu);
}
parameters {
  real<lower=0> V;
  real Z;
}
transformed parameters {
  // scale mixture: T = Z * sqrt(nu * V) gives the skewed t variate
  real T = Z * sqrt(V) * sqrt_nu;
}
model {
  V ~ inv_chi_square(nu);
  Z ~ skew_normal(0, 1, alpha);
}
"

foo_data <- list(nu = 4, alpha = 1)

foo <- stan(model_code = scode, data = foo_data, chains = 1, iter = 4000, control=list(adapt_delta=0.8))
T <- extract(foo)$T

It just popped into my mind that skew normal has an erf and chi-squared has a gamma function, so the computation time should be similar (and I posted before that fabs is not needed).

Aki

As to the _log, we still try to maintain backward compatibility and just raise deprecation warnings. We’ll probably eliminate it in Stan 3, which is on its way (though not soon!).

Since T isn’t used in the model, it’d be more efficient to define it as a generated quantity. That way, you don’t evaluate its partial derivatives.
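
That is, something like this (a sketch of the same model with T moved out of the autodiff path):

parameters {
  real<lower=0> V;
  real Z;
}
model {
  V ~ inv_chi_square(nu);
  Z ~ skew_normal(0, 1, alpha);
}
generated quantities {
  // computed once per draw, with no gradient evaluation
  real T = Z * sqrt(V) * sqrt_nu;
}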

Without fabs, sgt is good then. How often I have wished, in cases like this, for an easy way to specify the gradients directly.
BTW, there is an ST5 function floating around in the old Google group.

How would you specify gradients? We’ve never figured out how to do it syntactically, given that we want the full Jacobian, but the output is of one shape and there can be any number of arguments of other shapes. If we could flatten everything out into an R^M -> R^N function, we could potentially return a matrix where the first column is the value and the remaining columns make up the Jacobian.

Something like the following could be implemented now to provide a function foo with its Jacobian.

matrix foo_jacobian(vector theta) {
  matrix J = ... Jacobian ...;
  vector v = ... value ...;
  return append_col(v, J);
}

matrix temp = foo_jacobian(theta);
vector y = temp[ , 1];        // value
matrix foo_J = temp[ , 2: ];  // Jacobian

You might then want to deal with data variables where there isn’t a Jacobian, which is another round of complications. Maybe the Jacobian could be only w.r.t. the first argument and there could be other arguments. These might be required to be data or autodiffed. You can see the problems we’re going to run into with this.