Semiparametric Heavy Tailed Distributions in Stan

I have been working with some financial data where the t distribution doesn’t quite capture the downside risk. Even with low degrees of freedom, the t distribution is not quite as highly peaked as the empirical distribution and parts of the tail aren’t as fat either. Also, the left tail and right tail tend to have different degrees of fatness, resulting in skewness in the empirical distribution that is absent in the t distribution (requiring some kind of skew-t distribution to capture properly).

Inspired by an old Matlab blog post, I took a semiparametric approach in Stan (see code below) that breaks the distribution into three parts: a normally distributed middle, with each tail fit to its own Pareto distribution. This allows for a higher peak in the middle and fatter (and asymmetric) tails. To better match the shape of the empirical distribution, I ensure that the contribution of each component of the distribution matches its share of the empirical distribution. Note that I use a trunc_pt of 1.5, which means that after standardizing the data, anything between -1.5 and +1.5 is the middle section (fit with a normal distribution), and likewise for the tails.
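The empirical shares can be computed outside of Stan as well. Here is a minimal Python sketch of the same standardize-and-count logic (the function name is mine; boundary points count toward the middle, matching the model's `<=`/`>=` comparisons):

```python
import statistics

def empirical_shares(y, trunc_pt=1.5):
    """Standardize y and return the share of observations in each region:
    lower tail (z < -trunc_pt), middle, and upper tail (z > trunc_pt)."""
    mu, sd = statistics.mean(y), statistics.stdev(y)
    z = [(v - mu) / sd for v in y]
    low = sum(v < -trunc_pt for v in z) / len(z)
    high = sum(v > trunc_pt for v in z) / len(z)
    return low, 1 - low - high, high
```

These shares are exactly the mixture weights the model uses to tie each component's contribution to its portion of the data.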

While the pdf is a bit discontinuous, the cdf is smoother and compares favorably to the empirical cdf (and better than the t distribution when eyeballing). Simulations from this distribution also seem to capture the empirical distribution better than the t distribution does. The skewness is a bit more negative than the empirical distribution's (vs. 0 for the t). The kurtosis is quite a bit larger (vs. a little bit less for the t). The Pareto distribution can allow for some really extreme values that would typically not be observed in the real world (and with more simulations than observations, more extreme values show up). Tweaking the trunc_pt parameter or adjusting the priors on the tail parameters might improve the fit.

Some other notes:

  • I used loo to compare this approach vs. the normal and t distribution and this version does a little bit better than the t and both do better than the normal.
  • On my machine it is still pretty fast (for Stan at least), taking around 5s per chain. It takes a little bit longer to fit than the t distribution (around 1.5s per chain).
  • There’s quite a bit in the generated quantities block, but most of it is just optional stuff to get calculations that I needed or found useful. The rest of it is pretty simple.

All bugs or errors are my own (some of the calculations were a bit tricky, like the standard deviation or the log-likelihood).
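For reference, the standard deviation calculation in generated quantities follows the standard mixture-moment formulas. With weights $w_k$ set to the empirical shares, component means $\mu_k$, and component variances $\sigma_k^2$ (truncated normal for the middle, Pareto for each tail):

```latex
\mathbb{E}[Z] = \sum_k w_k \mu_k, \qquad
\mathbb{E}[Z^2] = \sum_k w_k \left( \sigma_k^2 + \mu_k^2 \right), \qquad
\operatorname{Var}(Z) = \mathbb{E}[Z^2] - \left( \mathbb{E}[Z] \right)^2,
```

where, for a Pareto tail with minimum $m$ and shape $\alpha$, $\mu = \alpha m / (\alpha - 1)$ and $\sigma^2 = \alpha m^2 / \left( (\alpha - 1)^2 (\alpha - 2) \right)$, which requires $\alpha > 2$ (hence the lower bound on the tail parameters).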

data {
    int<lower=1> T;
    vector[T] y;
    real<lower=0> trunc_pt;
}
transformed data {
    vector[T] z;
    real mu_y0 = mean(y);
    real sd_y0 = sd(y);
    real count_low = 0;
    real count_mid = 0;
    real count_high = 0;
    real share_low;
    real share_mid;
    real share_high;
    real log_share_low;
    real log_share_mid;
    real log_share_high;

    for (i in 1:T) {
        z[i] = (y[i] - mu_y0) / sd_y0;
        if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
            count_mid += 1;
        } else if (z[i] > trunc_pt) {
            count_high += 1;
        } else {
            count_low += 1;
        }
    }
    share_low = count_low / T;
    share_mid = count_mid / T;
    share_high = count_high / T;
    log_share_low = log(share_low);
    log_share_mid = log(share_mid);
    log_share_high = log(share_high);
}
parameters {
    real mu_z_mid;
    real<lower=0> sd_z_mid;
    real<lower=2.005> alpha_z_up;   // variance does not exist below 2; lower bound slightly above 2 to limit very fat tail outcomes
    real<lower=2.005> alpha_z_down; // variance does not exist below 2; lower bound slightly above 2 to limit very fat tail outcomes
}
model {
    mu_z_mid ~ normal(0, 0.1);
    sd_z_mid ~ cauchy(1, 0.25);
    alpha_z_up ~ gamma(4, 1);
    alpha_z_down ~ gamma(4, 1);

    for (i in 1:T) {
        if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
            z[i] ~ normal(mu_z_mid, sd_z_mid) T[-trunc_pt, trunc_pt];
        } else if (z[i] > trunc_pt) {
            z[i] ~ pareto(trunc_pt, alpha_z_up);
        } else {
            (-z[i]) ~ pareto(trunc_pt, alpha_z_down);
        }
    }
    target += count_low * log_share_low + count_mid * log_share_mid + count_high * log_share_high;
}
generated quantities {
    real mu_z;
    real sd_z;
    real mu_y;
    real sd_y;
    real z_sim;
    real y_sim;
    vector[T] log_lik;

    {
        real normal_cdf_diff;
        real a_z0;
        real b_z0;
        real mu_z0;
        real mu_z1;
        real mu_z2;
        real var_z0;
        real var_z1;
        real var_z2;
        real E2_z0;
        real E2_z1;
        real E2_z2;
        real var_z;
        real u_sim;

        normal_cdf_diff = normal_cdf(trunc_pt, mu_z_mid, sd_z_mid) - normal_cdf(-trunc_pt, mu_z_mid, sd_z_mid);
        a_z0 = (-trunc_pt - mu_z_mid) / sd_z_mid;
        b_z0 = (trunc_pt - mu_z_mid) / sd_z_mid;
        mu_z0 = mu_z_mid + sd_z_mid * (exp(normal_lpdf(a_z0 | 0, 1)) - exp(normal_lpdf(b_z0 | 0, 1))) / normal_cdf_diff;
        mu_z1 = (alpha_z_up * trunc_pt) / (alpha_z_up - 1);
        mu_z2 = -(alpha_z_down * trunc_pt) / (alpha_z_down - 1);

        mu_z = share_mid * mu_z0 + share_high * mu_z1 + share_low * mu_z2;

        var_z0 = 1.0;
        var_z0 = var_z0 - (b_z0 * exp(normal_lpdf(b_z0 | 0, 1)) - a_z0 * exp(normal_lpdf(a_z0 | 0, 1))) / normal_cdf_diff;
        var_z0 = var_z0 - ((exp(normal_lpdf(b_z0 | 0, 1)) - exp(normal_lpdf(a_z0 | 0, 1))) / normal_cdf_diff) ^ 2;
        var_z0 = var_z0 * (sd_z_mid ^ 2);
        var_z1 = (alpha_z_up * trunc_pt ^ 2) / ((alpha_z_up - 1) ^ 2 * (alpha_z_up - 2));
        var_z2 = (alpha_z_down * trunc_pt ^ 2) / ((alpha_z_down - 1) ^ 2 * (alpha_z_down - 2));

        E2_z0 = var_z0 + mu_z0 ^ 2;
        E2_z1 = var_z1 + mu_z1 ^ 2;
        E2_z2 = var_z2 + mu_z2 ^ 2;

        var_z = (share_mid * E2_z0 + share_high * E2_z1 + share_low * E2_z2) - (mu_z ^ 2);
        sd_z = sqrt(var_z);

        u_sim = uniform_rng(0, 1);
        if (u_sim < share_low) {
            z_sim = -(trunc_pt * pow((1 - u_sim / share_low), -1.0 / alpha_z_down));
        } else if (u_sim < (1 - share_high)) {
            z_sim = inv_Phi(u_sim) * sd_z_mid + mu_z_mid;
        } else {
            z_sim = trunc_pt * pow((1 - (u_sim - (1 - share_high)) / share_high), -1.0 / alpha_z_up);
        }
    }
    mu_y = mu_z * sd_y0 + mu_y0;
    sd_y = sd_z * sd_y0;
    y_sim = z_sim * sd_y0 + mu_y0;
    for (i in 1:T) {
        if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
            log_lik[i] = normal_lpdf(z[i] | mu_z_mid, sd_z_mid);
            log_lik[i] -= log_diff_exp(normal_lcdf(trunc_pt | mu_z_mid, sd_z_mid),
                                       normal_lcdf(-trunc_pt | mu_z_mid, sd_z_mid));
            log_lik[i] += log_share_mid;
        } else if (z[i] > trunc_pt) {
            log_lik[i] = pareto_lpdf(z[i] | trunc_pt, alpha_z_up);
            log_lik[i] += log_share_high;
        } else {
            log_lik[i] = pareto_lpdf(-z[i] | trunc_pt, alpha_z_down);
            log_lik[i] += log_share_low;
        }
    }
}
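The inverse-CDF simulation in generated quantities can also be mirrored outside Stan for checking the draws. Below is a Python sketch (the function name is mine, and for the middle section I map the uniform draw onto the truncated-normal quantile, which is one way to handle that branch), using only the standard library:

```python
import random
import statistics

def piecewise_rng(mu_mid, sd_mid, alpha_up, alpha_down,
                  trunc_pt, share_low, share_mid, share_high,
                  rng=random.random):
    """One draw from the three-part distribution:
    Pareto lower tail, truncated normal middle, Pareto upper tail."""
    u = rng()
    if u < share_low:
        # lower tail: -z follows a Pareto with minimum trunc_pt, shape alpha_down
        return -trunc_pt * (1 - u / share_low) ** (-1.0 / alpha_down)
    elif u < 1 - share_high:
        # middle: map u onto the truncated-normal quantile on [-trunc_pt, trunc_pt]
        v = (u - share_low) / share_mid  # uniform within the middle region
        nd = statistics.NormalDist(mu_mid, sd_mid)
        lo, hi = nd.cdf(-trunc_pt), nd.cdf(trunc_pt)
        return nd.inv_cdf(lo + v * (hi - lo))
    else:
        # upper tail: z follows a Pareto with minimum trunc_pt, shape alpha_up
        v = (u - (1 - share_high)) / share_high
        return trunc_pt * (1 - v) ** (-1.0 / alpha_up)
```

Feeding fixed uniforms through the three branches is a quick sanity check that each region lands where it should (below -trunc_pt, inside the middle, above trunc_pt).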


Cool and thanks for sharing! A bit faster is replacing all those normal_lpdf(x | 0, 1) with std_normal_lpdf(x), it saves on the normalization that takes place in the normal_lpdf.

If you add graphs and expand on it a bit this would be a great case study. We don’t have many for 2022 so I’d be happy to help you if you want to get it in before the end of the year.

Thanks. I’ve done a number of charts when working through this, but they were just using base R plots and not yet put in a form worth sharing.

What format would a case study need to be in?

I’ve seen Rmarkdown and others. Probably Quarto works too. @WardBrian just posted a case study, maybe he can chime in (and Brian what syntax highlighter did you use for that case study?).

The link Stan - Case Studies has more information.

I wrote the HoloML in Stan case study in a Jupyter notebook. R markdown or Quarto would also be good choices I think.

The highlighter was the tango theme for pandoc. I believe the Stan docs use the same one.


I’ve started writing something up in Quarto. I may also need to get permission from my employer.

It also occurs to me that this technique really isn’t semiparametric. The original approach in Matlab used Gaussian kernel smoothing in the middle of the distribution. I have replaced that with fitting a parametric normal. So “Piecewise Heavy Tailed Distributions in Stan” might be a better name.


Anyone know how to get the auto_write functionality in rstan working with Quarto/knitr? The default options recompile everything from scratch each time I render (unless I just skip those parts of the document).

I tried cache=TRUE, but that doesn’t seem to make any difference.

Ah, so cache=TRUE does indeed cache, but it only seemed to start working after closing R and opening it back up again. And of course, it caches the whole result of the cell rather than having the same behavior as auto_write.


Please see below for a first draft (you might need to download it and open it separately for it to display correctly; I wasn’t able to upload the qmd file here).
Semiparametric_fat_tails_in_Stan.html (70.0 KB)

I had COVID over the weekend, so it ended up taking a little longer than I was hoping to get the first draft done.

My boss gave me the go-ahead to work on it, but still might need some more approvals or something before it gets published.

Open to any feedback.


It seems to be missing figures. There is an option to include the images within html to make it stand alone Quarto HTML Self contained


Thanks! I think when I downloaded it and opened it separately it must have pulled in the images from the cache. This version should work.
Piecewise_fat_tails_in_Stan.html (1.1 MB)


Cool, works now