Posterior predictive sampling from a user-defined probability distribution

I have been working on modeling some financial data with Stan and have settled on fitting a skewed generalized t-distribution as the best way to capture the features of the data. I lifted the log-probability function from this post from 2017 and was able to fit the model successfully.

My question is this: with my fitted model, how can I go about sampling from the posterior predictive distribution? Thus far with Stan, I have only fit models with corresponding built-in _rng functions, which make this simple. As far as I'm aware, there's no analytic form for the CDF of this distribution, so inverse-transform sampling seems like a no-go. Are there suggestions for efficiently sampling from custom distributions like this within Stan?

Thanks in advance for any responses. I'm fairly new to Stan, so any help would be appreciated.


The code used is below. I am still iterating on it, but it should be enough to convey the question I'm asking. The model itself is also quite slow, so any performance suggestions based on the code below would be appreciated as well!

functions {
    real sgt_lpdf(real x, real mu, real s, real l, real p, real q) {
        // Skewed generalised t
        real lz1 = lbeta(1.0/p, q);
        real lz2 = lbeta(2.0/p, q - 1.0/p);
        real v = q^(-1.0/p)
                 * ((3*l^2 + 1)*exp(lbeta(3.0/p, q - 2.0/p) - lz1)
                    - 4*l^2*exp(lz2 - lz1)^2)^(-0.5);
        real m = 2*v*s*l*q^(1.0/p)*exp(lz2 - lz1);
        real r = x - mu + m;
        // the sign of r selects the left/right branch of the skewed density
        real sgn = r < 0 ? -1 : 1;
        return log(p)
               - log(2*v*s*q^(1.0/p)*exp(lz1)
                     * (fabs(r)^p / (q*(v*s)^p*(l*sgn + 1)^p) + 1)^(1.0/p + q));
    }
}
data {
    int<lower=0> J;
    vector[J] X;
}
parameters {
    real alpha;
    real<lower=0> sigma;
    real<lower=-1, upper=0> lambda;
    real<lower=1> p;
    real<lower=3.0/p,upper=50*p> q;
    real<lower=-1, upper=1> beta;
}
transformed parameters {
    vector[J-1] mu = alpha + beta*X[1:(J-1)];
}
model {
    alpha ~ normal(0, 1);
    sigma ~ normal(0, 1);
    p ~ normal(2, 1);
    q ~ normal(2, 1);
    lambda ~ normal(0, 1);
    beta ~ normal(0, 1);
    for (j in 2:J) {
        X[j] ~ sgt(mu[j-1], sigma, lambda, p, q);
    }
}

This is not easy to do in Stan. A quick search yielded

https://scholar.google.com/scholar?cluster=7282994274836169847&hl=en&as_sdt=0,33&sciodt=0,33

which provides a mixture representation in its equation (4), but you would need a way to draw from a skewed exponential power distribution and a generalized gamma distribution (neither of which is built into Stan).


Thanks for the response, @bgoodri.

Probably stepping beyond the scope of this forum, but I imagine my best bet from here would be to use my Stan model to generate posterior draws of the parameters, and then, either by rolling my own sampler or using an existing sampling library, generate draws from an SGT distribution at those posterior parameter values to get my posterior predictive draws. Does that seem like a reasonable approach?
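For what it's worth, here is a rough Python sketch of that two-step idea. The names (`sgt_logpdf`, `sgt_rng`) are hypothetical; the log-density is transcribed from the Stan function above, and the sampler is a deliberately crude grid-based inverse-CDF draw rather than anything optimized:

```python
import numpy as np
from scipy.special import betaln

def sgt_logpdf(x, mu, s, l, p, q):
    # Log-density of the skewed generalized t, transcribed from the
    # Stan sgt_lpdf in the model above.
    lz1 = betaln(1.0 / p, q)
    lz2 = betaln(2.0 / p, q - 1.0 / p)
    v = q ** (-1.0 / p) * ((3 * l**2 + 1) * np.exp(betaln(3.0 / p, q - 2.0 / p) - lz1)
                           - 4 * l**2 * np.exp(lz2 - lz1) ** 2) ** -0.5
    m = 2 * v * s * l * q ** (1.0 / p) * np.exp(lz2 - lz1)
    r = np.asarray(x) - mu + m
    sgn = np.where(r < 0, -1.0, 1.0)
    return (np.log(p)
            - np.log(2 * v * s * q ** (1.0 / p) * np.exp(lz1)
                     * (np.abs(r) ** p / (q * (v * s) ** p * (l * sgn + 1) ** p) + 1)
                     ** (1.0 / p + q)))

def sgt_rng(mu, s, l, p, q, rng, half_width=30.0, n_grid=4001):
    # Crude inverse-CDF draw: tabulate the density on a grid around mu,
    # accumulate into an approximate CDF, and invert by interpolation.
    xs = np.linspace(mu - half_width * s, mu + half_width * s, n_grid)
    cdf = np.cumsum(np.exp(sgt_logpdf(xs, mu, s, l, p, q)))
    cdf /= cdf[-1]
    return np.interp(rng.uniform(), cdf, xs)

# Posterior predictive: one SGT draw per posterior draw of the parameters.
# `posterior` is a placeholder; in practice these tuples come from the fit
# (for this model, mu for observation j is alpha + beta*X[j-1] per draw).
rng = np.random.default_rng(0)
posterior = [(0.1, 1.0, -0.2, 2.0, 10.0)] * 100  # placeholder draws
x_rep = np.array([sgt_rng(mu, s, l, p, q, rng) for mu, s, l, p, q in posterior])
```

For very heavy tails (small q) the fixed grid width and resolution would need widening or adaptive bounds, but for a quick posterior predictive check this kind of thing is usually adequate.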

Yeah, there is an R package with an rsgt function you could use, which inverts the CDF numerically via root-finding.
