Hi all!
I’m currently working on a STAN implementation of the Dynamic Nelson-Siegel model for my economics thesis and am hitting some roadblocks when it comes to including stochastic volatility on the latent factors. My model, starting from the specification of Diebold & Li 2006, can be described with a measurement equation in the form
y_t(\tau) = \beta_{1,t} + \left(\frac{1-exp(-\tau\lambda)}{\tau\lambda}\right)\beta_{2,t} + \left(\frac{1-exp(-\tau\lambda)}{\tau\lambda} - exp(-\tau\lambda)\right)\beta_{3,t} + \epsilon_{t,\tau}, \ \epsilon_{t,\tau} \sim \mathcal{N}(0, \sigma^2_{\tau})
and dynamics on the latent factors
F_t = \mu_F + \Phi_FF_{t-1} + \eta_t, \ F_t = (\beta_{1,t}\ \beta_{2,t}\ \beta_{3,t})', \ \Phi_F = diag(\phi_1\ \phi_2\ \phi_3), \ \eta_t \sim \mathcal{N} (0, \Omega_t)
Currently I am fixing \lambda, the factor loading parameter, to 0.0609 (a value commonly used in the literature) to reduce the complexity of the modeling problem. I have success with models that are such that \Omega_t = diag(\sigma_{F,1}^2\ \sigma_{F,2}^2\ \sigma_{F,3}^2)\ \forall t \in \{1,\dots,T\} but when I apply the stochastic volatility dynamics found in this part of the reference guide I get plenty of runtime diagnostic errors, especially low BMFI, as well as high R-hat and low ESS warnings on the factors F and the vols \sigma_f. I am implementing a different process for each of the 3 factors in F following examples in the literature (for example Caldeira et al 2010) since they have different drivers, so there are essentially three separate stochastic volatility processes that are being fit on the latent states.
When I was running the model using the prior specification for the stochastic vol process found in this article (the STAN guide), the model was putting all of the “weight” of the process on the mean \mu_\sigma and essentially setting the autoregressive coefficient \phi = 0, but when I tighten the priors around \mu, \phi to put more emphasis on the autoregressive nature of the process, I can’t shake the errors and the fitted values are too large. My attempt at remedying this can be seen in the below code, where I change the initial \sigma_F from one determined by the parameters of the stochastic volatility process to one where I am setting the initial value to be distributed inv\_gamma(3,1), which is the prior specification I gave my time-invariant \sigma_F in the working model. This did not work.
Is there something wrong with my estimation strategy at large? I have seen some of the things Jeff Arnold has written about state space modeling in STAN (here) and I know that these models are usually sampled via multimove Gibbs, but is putting a Kalman filter/smoother in STAN a viable approach? I wasn’t really sure what to do with the functions that he implemented in the blog either, but I have a feeling that is down to my unfamiliarity with STAN. Here’s the code of the model I have so far:
data {
int<lower=1> dY; //Dimension of yields
int<lower=1> T; //Number of observations
int<lower=1> dF; //Dimension of the factors (3)
matrix[T,dY] yields; //Yields data
vector[dF] init_F; // initial F, determined by OLS
vector<lower=0>[dY] taus; //Array of values for tau maturity points (in months)
real<lower=0> lambda; //factor loading parameter
//Prior params
real<lower=0> sigmay_alpha; //alpha parameter for sigmay invgamma
real<lower=0> sigmay_beta; //beta parameter for sigmay invgamma
real<lower=0> sigmay_scale; //scaling for the draws from sigmay
//Prior params for the factor VAR parameters
real xiF_mean;
real<lower=0>xiF_std;
real muF_mean;
real<lower=0> muF_std;
}
parameters {
matrix[T,dF] F; //factors
vector<lower=0>[dY] sigmay; //measurement errors
vector<lower=-1, upper=1>[dF] xiF; //factor evolution matrix (diagonal)
vector[dF] muF; //factor evolution trend
array[dF] vector<lower=0>[T-2] sigmaf_std; //unscaled draws
array[dF] real<lower=-1, upper=1>phi; //array of autoregressive params
array[dF] real<lower=0> sigma; //array of scaling params
array[dF] real mu_sf; //trend of stochastic vol
array[dF] real sigmaf_init; //initial value for stochastic vols
}
transformed parameters {
matrix[T-1, dF] delta_F;
for (t in 2:T)
{
//Model the differences as per Bob Carpenter's comment in earlier discussion of latent models
delta_F[t-1] = transpose(F[t]' - muF - diag_matrix(xiF)*F[t-1]');
};
//Essentially follows the process outlined in the reference guide except for the initial value
array[dF] vector[T-1] sigmaf;
for (i in 1:dF)
{
sigmaf[i] = append_row(log(sigmaf_init[i]), sigmaf_std[i] * sigma[i]);
}
for (t in 2:T-1)
{
for (i in 1:dF)
{
sigmaf[i][t] += phi[i] * (sigmaf[i][t-1] - mu_sf[i]) + mu_sf[i];
}
}
}
model {
//PRIORS//
xiF ~ normal(xiF_mean, xiF_std);
muF ~ normal(muF_mean, muF_std);
sigmay ~ inv_gamma(sigmay_alpha, sigmay_beta*sigmay_scale);
sigmaf_init ~ inv_gamma(3,1);
for (i in 1:dF)
{
sigmaf_std[i] ~ std_normal();
phi[i] ~ normal(0.95,0.1);
sigma[i] ~ weibull(1,1);
mu_sf[i] ~ normal(0,0.1);
}
F[1] ~ normal(init_F, 0.1);
//END PRIORS//
//DATA//
for (t in 1:T-1)
{
for (i in 1:dF)
{
delta_F[t,i] ~ normal(0, exp(sigmaf[i][t]/2));
}
}
matrix[dY, dF] PhiV;
for (i in 1:dY) {
real t = taus[i];
PhiV[i,1] = 1;
PhiV[i,2] = (1-exp(-t*lambda))/(t*lambda);
PhiV[i,3] = (1-exp(-t*lambda))/(t*lambda) - exp(-t*lambda);
}
for (t in 1:T)
{
yields[t] ~ normal(PhiV*F[t]', sigmay);
}
//END DATA//
}
I am calling this model via cmdstanpy 1.2.0, cmdstan 2.33.1, on Debian 12 with the following prior specification
stan_data = {}
stan_data["sigmay_alpha"] = 101
stan_data["sigmay_beta"] = 510
stan_data["sigmay_scale"] = 10**(-2)
stan_data["xiF_mean"] = 0.99
stan_data["xiF_std"] = 0.25
stan_data["muF_mean"] = 0
stan_data["muF_std"] = 1
stan_data["lambda"] = 0.0609
stan_data["taus"] = np.arange(1,16)*12
stan_data["yields"] = period1
stan_data["dY"] = period1.shape[1] #just the number of yields
stan_data["T"] = period1.shape[0] #equivalent to T
stan_data["dF"] = 3
#Determine initial factor vector
first_obs = period1[0,:]
PhiV = np.ones((len(first_obs), stan_data["dF"]))
l = 0.0609
for i in range(stan_data["dY"]):
t = stan_data["taus"][i]
PhiV[i,1] = (1-np.exp(-t*l))/(t*l)
PhiV[i,2] = (1-np.exp(-t*l))/(t*l) - np.exp(-t*l)
xtx = np.linalg.inv(np.matmul(PhiV.T, PhiV))
xty = np.matmul(PhiV.T, first_obs)
stan_data["init_F"] = np.matmul(xtx, xty)
I’ve also attached a sample of the data I am attempting to fit. Any suggestions on the model (not just about how to remedy the issues with stochastic vol) are very welcome; I am new to STAN and Bayesian modeling more broadly but am really enjoying learning everything I can!
Thank you in advance for your help,
A confused undergrad
test_data_stan.csv (36.6 KB)