I have been working with some financial data where the t distribution doesn’t quite capture the downside risk. Even with low degrees of freedom, the t distribution is not quite as highly peaked as the empirical distribution and parts of the tail aren’t as fat either. Also, the left tail and right tail tend to have different degrees of fatness, resulting in skewness in the empirical distribution that is absent in the t distribution (requiring some kind of skew-t distribution to capture properly).

Inspired by an old Matlab blog post, I took a semiparametric approach in Stan (see code below) that breaks the distribution into three parts: a normally distributed middle, with each tail fit to its own Pareto distribution. This allows for a higher peak in the middle and fatter (and asymmetric) tails. To better match the shape of the empirical distribution, I weight each component by its share of the empirical distribution. Note that I use a `trunc_pt` of 1.5, which means that after standardizing the data, anything between -1.5 and +1.5 is the middle section (fit with a normal distribution), and values beyond those points belong to the corresponding tails.
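As a quick sketch of the partition (numpy, with simulated stand-in data rather than the actual return series), standardizing and splitting at ±1.5 looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_t(df=4, size=2000)  # stand-in for the return series

trunc_pt = 1.5
z = (y - y.mean()) / y.std(ddof=1)   # standardize, as in transformed data

low = z < -trunc_pt                  # left tail  -> Pareto on -z
mid = np.abs(z) <= trunc_pt          # middle     -> truncated normal
high = z > trunc_pt                  # right tail -> Pareto on z

# empirical shares of each region serve as the fixed mixture weights
share_low, share_mid, share_high = low.mean(), mid.mean(), high.mean()
```

The three boolean masks partition the data exactly, so the shares always sum to one.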

While the pdf is a bit discontinuous, the cdf is smoother and compares favorably to the empirical cdf (and, eyeballing it, better than the t distribution). Simulations from this distribution also capture the empirical distribution better than simulations from the t distribution do. The skewness is a bit more negative than the empirical distribution's (vs. 0 for the t), and the kurtosis is quite a bit larger (vs. a little smaller for the t). The Pareto tails can produce some very extreme values that would rarely be observed in the real world (and, with more simulations than observations, more extreme values will show up). Tweaking the `trunc_pt` parameter or adjusting the priors on the tail parameters might improve the fit.
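The inverse-CDF sampler used for the simulations can be sketched in numpy/scipy as follows. The parameter values here are made up for illustration (in the model they are posterior draws), and the middle draw uses the truncated-normal inverse CDF so that middle draws stay inside ±`trunc_pt`:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

trunc_pt = 1.5
mu_mid, sd_mid = 0.0, 1.0            # middle (truncated normal) parameters
alpha_up, alpha_down = 3.0, 2.5      # right / left tail indices
share_low, share_mid, share_high = 0.07, 0.87, 0.06

def simulate(n):
    u = rng.uniform(size=n)
    z = np.empty(n)
    lo = u < share_low
    hi = u >= 1 - share_high
    md = ~(lo | hi)
    # Pareto inverse CDF: x = x_m * (1 - u')**(-1/alpha), u' uniform on (0, 1)
    z[lo] = -trunc_pt * (1 - u[lo] / share_low) ** (-1 / alpha_down)
    z[hi] = trunc_pt * (1 - (u[hi] - (1 - share_high)) / share_high) ** (-1 / alpha_up)
    # middle: inverse CDF of the normal truncated to [-trunc_pt, trunc_pt]
    p_lo = norm.cdf(-trunc_pt, mu_mid, sd_mid)
    p_hi = norm.cdf(trunc_pt, mu_mid, sd_mid)
    u_md = (u[md] - share_low) / share_mid
    z[md] = norm.ppf(p_lo + u_md * (p_hi - p_lo), mu_mid, sd_mid)
    return z

z_sim = simulate(100_000)
```

Comparing the simulated tail shares and moments against the empirical ones is a quick sanity check on the fit.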

Some other notes:

- I used loo to compare this approach against the normal and t distributions; this version does a little better than the t, and both do better than the normal.
- On my machine it is still pretty fast (for Stan, at least), taking around 5s per chain. That is a little longer than the t distribution (around 1.5s per chain).
- There's quite a bit in the generated quantities block, but most of it is just optional calculations that I needed or found useful. The rest is pretty simple.

All bugs or errors are my own (some of the calculations were a bit tricky, like the standard deviation and the log-likelihood).

```
data {
  int<lower=1> T;
  vector[T] y;
  real<lower=0> trunc_pt;
}
transformed data {
  vector[T] z;
  real mu_y0 = mean(y);
  real sd_y0 = sd(y);
  real count_low = 0;
  real count_mid = 0;
  real count_high = 0;
  real share_low;
  real share_mid;
  real share_high;
  real log_share_low;
  real log_share_mid;
  real log_share_high;
  // standardize and count the observations in each region
  for (i in 1:T) {
    z[i] = (y[i] - mu_y0) / sd_y0;
    if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
      count_mid += 1;
    } else if (z[i] > trunc_pt) {
      count_high += 1;
    } else {
      count_low += 1;
    }
  }
  // empirical shares of each region become the fixed mixture weights
  share_low = count_low / T;
  share_mid = count_mid / T;
  share_high = count_high / T;
  log_share_low = log(share_low);
  log_share_mid = log(share_mid);
  log_share_high = log(share_high);
}
parameters {
  real mu_z_mid;
  real<lower=0> sd_z_mid;
  real<lower=2.005> alpha_z_up;   // variance does not exist for alpha <= 2; bound slightly above 2 to limit very fat tails
  real<lower=2.005> alpha_z_down; // same bound for the left tail
}
model {
  mu_z_mid ~ normal(0, 0.1);
  sd_z_mid ~ cauchy(1, 0.25);
  alpha_z_up ~ gamma(4, 1);
  alpha_z_down ~ gamma(4, 1);
  for (i in 1:T) {
    if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
      z[i] ~ normal(mu_z_mid, sd_z_mid) T[-trunc_pt, trunc_pt];
    } else if (z[i] > trunc_pt) {
      z[i] ~ pareto(trunc_pt, alpha_z_up);
    } else {
      (-z[i]) ~ pareto(trunc_pt, alpha_z_down);
    }
  }
  // mixture-weight contribution; a data-only constant, so it does not affect
  // sampling, but it keeps the target consistent with the full log-likelihood
  target += count_low * log_share_low + count_mid * log_share_mid + count_high * log_share_high;
}
generated quantities {
  real mu_z;
  real sd_z;
  real mu_y;
  real sd_y;
  real z_sim;
  real y_sim;
  vector[T] log_lik;
  {
    // moments of the truncated-normal middle and the two Pareto tails,
    // combined into the mixture mean and sd via the law of total variance
    real normal_cdf_diff;
    real a_z0;
    real b_z0;
    real mu_z0;
    real mu_z1;
    real mu_z2;
    real var_z0;
    real var_z1;
    real var_z2;
    real E2_z0;
    real E2_z1;
    real E2_z2;
    real var_z;
    normal_cdf_diff = normal_cdf(trunc_pt | mu_z_mid, sd_z_mid) - normal_cdf(-trunc_pt | mu_z_mid, sd_z_mid);
    a_z0 = (-trunc_pt - mu_z_mid) / sd_z_mid;
    b_z0 = (trunc_pt - mu_z_mid) / sd_z_mid;
    mu_z0 = mu_z_mid + sd_z_mid * (exp(normal_lpdf(a_z0 | 0, 1)) - exp(normal_lpdf(b_z0 | 0, 1))) / normal_cdf_diff;
    mu_z1 = (alpha_z_up * trunc_pt) / (alpha_z_up - 1);
    mu_z2 = -(alpha_z_down * trunc_pt) / (alpha_z_down - 1);
    mu_z = share_mid * mu_z0 + share_high * mu_z1 + share_low * mu_z2;
    var_z0 = 1.0;
    var_z0 = var_z0 - (b_z0 * exp(normal_lpdf(b_z0 | 0, 1)) - a_z0 * exp(normal_lpdf(a_z0 | 0, 1))) / normal_cdf_diff;
    var_z0 = var_z0 - ((exp(normal_lpdf(b_z0 | 0, 1)) - exp(normal_lpdf(a_z0 | 0, 1))) / normal_cdf_diff) ^ 2;
    var_z0 = var_z0 * (sd_z_mid ^ 2);
    var_z1 = (alpha_z_up * trunc_pt ^ 2) / ((alpha_z_up - 1) ^ 2 * (alpha_z_up - 2));
    var_z2 = (alpha_z_down * trunc_pt ^ 2) / ((alpha_z_down - 1) ^ 2 * (alpha_z_down - 2));
    E2_z0 = var_z0 + mu_z0 ^ 2;
    E2_z1 = var_z1 + mu_z1 ^ 2;
    E2_z2 = var_z2 + mu_z2 ^ 2;
    var_z = (share_mid * E2_z0 + share_high * E2_z1 + share_low * E2_z2) - (mu_z ^ 2);
    sd_z = sqrt(var_z);
  }
  {
    // inverse-CDF draw from the three-part mixture
    real u_sim = uniform_rng(0, 1);
    if (u_sim < share_low) {
      z_sim = -(trunc_pt * pow(1 - u_sim / share_low, -1.0 / alpha_z_down));
    } else if (u_sim < (1 - share_high)) {
      // draw from the normal truncated to [-trunc_pt, trunc_pt]; a plain
      // inv_Phi(u_sim) shortcut can leak draws past +/- trunc_pt
      real u_mid = (u_sim - share_low) / share_mid;
      real p_low = normal_cdf(-trunc_pt | mu_z_mid, sd_z_mid);
      real p_high = normal_cdf(trunc_pt | mu_z_mid, sd_z_mid);
      z_sim = mu_z_mid + sd_z_mid * inv_Phi(p_low + u_mid * (p_high - p_low));
    } else {
      z_sim = trunc_pt * pow(1 - (u_sim - (1 - share_high)) / share_high, -1.0 / alpha_z_up);
    }
  }
  mu_y = mu_z * sd_y0 + mu_y0;
  sd_y = sd_z * sd_y0;
  y_sim = z_sim * sd_y0 + mu_y0;
  // pointwise log-likelihood for loo: component density + log mixture weight
  for (i in 1:T) {
    if (z[i] <= trunc_pt && z[i] >= -trunc_pt) {
      log_lik[i] = normal_lpdf(z[i] | mu_z_mid, sd_z_mid);
      log_lik[i] -= log_diff_exp(normal_lcdf(trunc_pt | mu_z_mid, sd_z_mid),
                                 normal_lcdf(-trunc_pt | mu_z_mid, sd_z_mid));
      log_lik[i] += log_share_mid;
    } else if (z[i] > trunc_pt) {
      log_lik[i] = pareto_lpdf(z[i] | trunc_pt, alpha_z_up);
      log_lik[i] += log_share_high;
    } else {
      log_lik[i] = pareto_lpdf(-z[i] | trunc_pt, alpha_z_down);
      log_lik[i] += log_share_low;
    }
  }
}
```
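Since the mixture moments were the trickiest part, here is a cross-check of the standard-deviation calculation in numpy/scipy, with made-up parameter values (in the model these would be posterior draws). It reproduces the truncated-normal and Pareto moment formulas from the generated quantities block, which can then be verified against scipy's `truncnorm` and `pareto`:

```python
import numpy as np
from scipy.stats import norm, truncnorm, pareto  # truncnorm/pareto for cross-checks

# Illustrative parameter values
trunc_pt = 1.5
mu_mid, sd_mid = 0.05, 1.0
alpha_up, alpha_down = 3.0, 2.5
share_low, share_mid, share_high = 0.07, 0.87, 0.06

# Truncated-normal moments on [-trunc_pt, trunc_pt], as in mu_z0 / var_z0
a = (-trunc_pt - mu_mid) / sd_mid
b = (trunc_pt - mu_mid) / sd_mid
Z = norm.cdf(b) - norm.cdf(a)                      # normal_cdf_diff
mu0 = mu_mid + sd_mid * (norm.pdf(a) - norm.pdf(b)) / Z
var0 = sd_mid**2 * (1 - (b * norm.pdf(b) - a * norm.pdf(a)) / Z
                    - ((norm.pdf(b) - norm.pdf(a)) / Z) ** 2)

# Pareto tail moments, as in mu_z1 / var_z1 (left tail mirrored)
mu1 = alpha_up * trunc_pt / (alpha_up - 1)
var1 = alpha_up * trunc_pt**2 / ((alpha_up - 1) ** 2 * (alpha_up - 2))
mu2 = -alpha_down * trunc_pt / (alpha_down - 1)
var2 = alpha_down * trunc_pt**2 / ((alpha_down - 1) ** 2 * (alpha_down - 2))

# Mixture mean and sd via the law of total variance, as in mu_z / sd_z
mu_z = share_mid * mu0 + share_high * mu1 + share_low * mu2
E2 = (share_mid * (var0 + mu0**2) + share_high * (var1 + mu1**2)
      + share_low * (var2 + mu2**2))
sd_z = np.sqrt(E2 - mu_z**2)
```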