Stochastic Volatility for Latent Factor Models

Hi all!

I’m currently working on a STAN implementation of the Dynamic Nelson-Siegel model for my economics thesis and am hitting some roadblocks when it comes to including stochastic volatility on the latent factors. My model, starting from the specification of Diebold & Li (2006), can be described by a measurement equation of the form

y_t(\tau) = \beta_{1,t} + \left(\frac{1-\exp(-\tau\lambda)}{\tau\lambda}\right)\beta_{2,t} + \left(\frac{1-\exp(-\tau\lambda)}{\tau\lambda} - \exp(-\tau\lambda)\right)\beta_{3,t} + \epsilon_{t,\tau}, \quad \epsilon_{t,\tau} \sim \mathcal{N}(0, \sigma^2_{\tau})

and dynamics on the latent factors

F_t = \mu_F + \Phi_F F_{t-1} + \eta_t, \quad F_t = (\beta_{1,t}\ \beta_{2,t}\ \beta_{3,t})', \quad \Phi_F = \mathrm{diag}(\phi_1,\ \phi_2,\ \phi_3), \quad \eta_t \sim \mathcal{N}(0, \Omega_t)

Currently I am fixing \lambda, the factor loading parameter, at 0.0609 (a value commonly used in the literature) to reduce the complexity of the modeling problem. I have had success with models where \Omega_t = \mathrm{diag}(\sigma_{F,1}^2,\ \sigma_{F,2}^2,\ \sigma_{F,3}^2)\ \forall t \in \{1,\dots,T\}, but when I apply the stochastic volatility dynamics found in this part of the reference guide I get plenty of runtime diagnostic errors, especially low BFMI, as well as high R-hat and low ESS warnings on the factors F and the vols \sigma_F. Following examples in the literature (for example Caldeira et al. 2010), I give each of the 3 factors in F its own process, since they have different drivers, so there are essentially three separate stochastic volatility processes being fit on the latent states.
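Concretely, writing h_{i,t} for the log variance of the i-th factor innovation, each factor gets its own process of the same form as in the guide:

h_{i,t} = \mu_{\sigma,i} + \phi_i\left(h_{i,t-1} - \mu_{\sigma,i}\right) + \sigma_i\epsilon_{i,t}, \quad \epsilon_{i,t} \sim \mathcal{N}(0,1), \quad \Omega_t = \mathrm{diag}\left(\exp(h_{1,t}),\ \exp(h_{2,t}),\ \exp(h_{3,t})\right)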

When I ran the model with the prior specification for the stochastic vol process found in this article (the STAN guide), it put all of the “weight” of the process on the mean \mu_\sigma and essentially set the autoregressive coefficient \phi = 0. When I tighten the priors around \mu and \phi to put more emphasis on the autoregressive nature of the process, I can’t shake the errors, and the fitted values are too large. My attempt at remedying this can be seen in the code below, where I change the initial \sigma_F from a value determined by the parameters of the stochastic volatility process to one distributed inv\_gamma(3,1), which is the prior specification I gave my time-invariant \sigma_F in the working model. This did not work.

Is there something wrong with my estimation strategy at large? I have seen some of the things Jeff Arnold has written about state space modeling in STAN (here), and I know that these models are usually sampled via multimove Gibbs, but is putting a Kalman filter/smoother in STAN a viable approach? I wasn’t really sure what to do with the functions he implemented in that post either, but I have a feeling that is down to my unfamiliarity with STAN. Here’s the code of the model I have so far:

data {
    int<lower=1> dY; //Dimension of yields 
    int<lower=1> T; //Number of observations
    int<lower=1> dF; //Dimension of the factors (3)
    matrix[T,dY] yields; //Yields data
    vector[dF] init_F; // initial F, determined by OLS
    vector<lower=0>[dY] taus; //Array of values for tau maturity points (in months)
    real<lower=0> lambda; //factor loading parameter

    //Prior params
    real<lower=0> sigmay_alpha; //alpha parameter for sigmay invgamma
    real<lower=0> sigmay_beta; //beta parameter for sigmay invgamma
    real<lower=0> sigmay_scale; //scaling for the draws from sigmay

    //Prior params for the factor VAR parameters
    real xiF_mean;
    real<lower=0> xiF_std;
    real muF_mean;
    real<lower=0> muF_std;
}
parameters {
    matrix[T,dF] F; //factors
    vector<lower=0>[dY] sigmay; //measurement errors
    vector<lower=-1, upper=1>[dF] xiF; //factor evolution matrix (diagonal)
    vector[dF] muF; //factor evolution trend

    array[dF] vector<lower=0>[T-2] sigmaf_std; //unscaled draws
    array[dF] real<lower=-1, upper=1> phi; //array of autoregressive params
    array[dF] real<lower=0> sigma; //array of scaling params
    array[dF] real mu_sf; //trend of stochastic vol
    array[dF] real<lower=0> sigmaf_init; //initial value for stochastic vols (must be positive for the inv_gamma prior)
}
transformed parameters {

    matrix[T-1, dF] delta_F;
    for (t in 2:T)
    {
            //Model the differences as per Bob Carpenter's comment in an earlier discussion of latent models
            delta_F[t-1] = transpose(F[t]' - muF - diag_matrix(xiF)*F[t-1]');
    }
    
    //Essentially follows the process outlined in the reference guide except for the initial value
    array[dF] vector[T-1] sigmaf;
    for (i in 1:dF)
    {
        sigmaf[i] = append_row(log(sigmaf_init[i]), sigmaf_std[i] * sigma[i]);
    }
   
    for (t in 2:T-1)
    {
        for (i in 1:dF)
        {
            sigmaf[i][t] += phi[i] * (sigmaf[i][t-1] - mu_sf[i]) + mu_sf[i];
        }
    }
}
model {
    //PRIORS//
    xiF ~ normal(xiF_mean, xiF_std);
    muF ~ normal(muF_mean, muF_std);

    sigmay ~ inv_gamma(sigmay_alpha, sigmay_beta*sigmay_scale);
    sigmaf_init ~ inv_gamma(3,1);
    for (i in 1:dF)
    {
        sigmaf_std[i] ~ std_normal();
        phi[i] ~ normal(0.95,0.1);
        sigma[i] ~ weibull(1,1);
        mu_sf[i] ~ normal(0,0.1); 
    }
    F[1] ~ normal(init_F, 0.1);
    //END PRIORS//

    //DATA//
    for (t in 1:T-1)
    {
        for (i in 1:dF)
        {
            delta_F[t,i] ~ normal(0, exp(sigmaf[i][t]/2));
        }
    }
    matrix[dY, dF] PhiV;
    for (i in 1:dY) {
        real t = taus[i];
        PhiV[i,1] = 1;
        PhiV[i,2] = (1-exp(-t*lambda))/(t*lambda);
        PhiV[i,3] = (1-exp(-t*lambda))/(t*lambda) - exp(-t*lambda);
    } 

    for (t in 1:T)
    {
        yields[t] ~ normal(PhiV*F[t]', sigmay);
    }
    //END DATA//
}

I am calling this model via cmdstanpy 1.2.0 and CmdStan 2.33.1 on Debian 12, with the following data and prior specification:

import numpy as np

# period1 is the (T x dY) array of yields for the sample period, loaded elsewhere
stan_data = {}
stan_data["sigmay_alpha"] = 101
stan_data["sigmay_beta"] = 510
stan_data["sigmay_scale"] = 10**(-2)

stan_data["xiF_mean"] = 0.99
stan_data["xiF_std"] = 0.25
stan_data["muF_mean"] = 0
stan_data["muF_std"] = 1

stan_data["lambda"] = 0.0609
stan_data["taus"] = np.arange(1,16)*12
stan_data["yields"] = period1
stan_data["dY"] = period1.shape[1] #just the number of yields
stan_data["T"] = period1.shape[0] #equivalent to T
stan_data["dF"] = 3

#Determine initial factor vector via OLS on the first observation
first_obs = period1[0,:]
PhiV = np.ones((len(first_obs), stan_data["dF"]))
l = 0.0609
for i in range(stan_data["dY"]):
    t = stan_data["taus"][i]
    PhiV[i,1] = (1-np.exp(-t*l))/(t*l)
    PhiV[i,2] = (1-np.exp(-t*l))/(t*l) - np.exp(-t*l)

#OLS solution (X'X)^{-1} X'y
xtx = np.linalg.inv(np.matmul(PhiV.T, PhiV))
xty = np.matmul(PhiV.T, first_obs)

stan_data["init_F"] = np.matmul(xtx, xty)

I’ve also attached a sample of the data I am attempting to fit. Any suggestions on the model (not just about how to remedy the issues with stochastic vol) are very welcome; I am new to STAN and Bayesian modeling more broadly but am really enjoying learning everything I can!

Thank you in advance for your help,

A confused undergrad
test_data_stan.csv (36.6 KB)

Yes. If you literally want to implement a Kalman filter, then you can implement the conjugate posterior directly and don’t really need Stan. (I prefer to write “Stan” because it’s not an acronym.) You can also exploit the conjugacy in Stan in some circumstances, for example by marginalizing the latent states out of the model and using a Kalman filter to compute the likelihood.

I don’t understand your bigger model (I’m not an experienced Kalman filter user!), but I can say that the best way I’ve found to build up these big programs is to start with something simple that works and then add one feature at a time. Then you can diagnose where non-identifiabilities or other bad posterior geometries enter the picture.

Very weird seeing my name in comments! There are some issues with this:

parameters {
  matrix[T, dF] F; //factors
  vector[dF] muF; //factor evolution trend
  vector<lower=-1, upper=1>[dF] xiF;
...
transformed parameters {
  matrix[T-1, dF] delta_F;
  for (t in 2:T)
    delta_F[t-1] = transpose(F[t]' - muF - diag_matrix(xiF)*F[t-1]');

First, this is going to be an identifiability problem, because all of those terms are parameters, leading to a three-way additive non-identifiability as well as a multiplicative non-identifiability between xiF and F. There’s also a much more efficient way to implement it.

for (t in 2:T)
    delta_F[t - 1] = F[t] - muF' - xiF' .* F[t - 1];

Declaring muF and xiF as row vectors would avoid the remaining transpositions. The problem with diag_matrix is that it creates the full N x N matrix and then multiplies and autodiffs through all of those elements. I just replaced it with an elementwise multiply, which has the same effect.
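For instance, a sketch of the declarations that make that line work, everything else unchanged (the priors on muF and xiF vectorize the same way for row vectors):

parameters {
  matrix[T, dF] F; //factors
  row_vector[dF] muF; //factor evolution trend
  row_vector<lower=-1, upper=1>[dF] xiF; //diagonal of factor evolution matrix
}
transformed parameters {
  matrix[T - 1, dF] delta_F;
  for (t in 2:T)
    delta_F[t - 1] = F[t] - muF - xiF .* F[t - 1];
}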

Things like this that take parameters off the unit scale can cause fitting difficulty:

    stan_data["sigmay_scale"] = 10**(-2)

You can mitigate by putting the equivalent <multiplier=sigmay_scale> on the parameter for which it’s the scale. Same for the very big scales. Having scales range between 0.01 and 500 is almost five orders of magnitude and can lead to stiffness in the Hamiltonian dynamics.
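Since sigmay is declared with a lower bound, a manual rescaling does the same job: sample a unit-scale parameter and multiply in transformed parameters. A sketch (sigmay_raw is a made-up name; the prior is equivalent because if X ~ inv_gamma(a, b) then c*X ~ inv_gamma(a, c*b)):

parameters {
  vector<lower=0>[dY] sigmay_raw; //unit-scale measurement error
}
transformed parameters {
  vector<lower=0>[dY] sigmay = sigmay_scale * sigmay_raw;
}
model {
  sigmay_raw ~ inv_gamma(sigmay_alpha, sigmay_beta); //matches sigmay ~ inv_gamma(alpha, beta * scale)
}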

Things like this can be vectorized to avoid loops, which helps minimize the size of the data structures created for autodiff:

for (i in 1:dF) {
  sigmaf[i, t] += phi[i] * (sigmaf[i, t-1] - mu_sf[i]) + mu_sf[i];
}

can be

sigmaf[ , t] += phi .* (sigmaf[ , t - 1] - mu_sf) + mu_sf;

For that to work, mu_sf and phi will have to be declared as vectors, and sigmaf as a matrix rather than an array of vectors so that the column slicing works.
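A sketch of the corresponding declarations and the full vectorized recursion (with sigma and sigmaf_init also promoted to vectors; diag_pre_multiply scales row i of sigmaf_std by sigma[i] without building a full diagonal matrix):

parameters {
  matrix<lower=0>[dF, T - 2] sigmaf_std; //unscaled draws
  vector<lower=-1, upper=1>[dF] phi; //autoregressive params
  vector<lower=0>[dF] sigma; //scaling params
  vector[dF] mu_sf; //trend of stochastic vol
  vector<lower=0>[dF] sigmaf_init; //initial value for stochastic vols
}
transformed parameters {
  matrix[dF, T - 1] sigmaf;
  sigmaf[, 1] = log(sigmaf_init);
  sigmaf[, 2:T - 1] = diag_pre_multiply(sigma, sigmaf_std);
  for (t in 2:T - 1)
    sigmaf[, t] += phi .* (sigmaf[, t - 1] - mu_sf) + mu_sf;
}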

The other loops can be simplified in similar ways. But for append_row, it’s better to just assign twice if you already have the vector, as in converting

sigmaf[i] = append_row(log(sigmaf_init[i]), sigmaf_std[i] * sigma[i]);

to

sigmaf[i][1] = log(sigmaf_init[i]);
sigmaf[i][2:T - 1] = sigmaf_std[i] * sigma[i];

as it avoids the allocation of a vector that’s then immediately reassigned.

Finally, this:

for (t in 1:T)
  yields[t] ~ normal(PhiV*F[t]', sigmay);

can be just

to_vector(yields') ~ normal(to_vector(PhiV * F'), to_vector(rep_matrix(sigmay, T)));

Again, it would help to declare F in transposed form if this is how it’s used. Note that sigmay has to be repeated across time points so that all the vectorized arguments have the same length.
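Putting those pieces together, a sketch with F stored one column per time point and the data-only quantities moved to transformed data (yields gets transposed once, and PhiV, which depends only on taus and lambda, gets built once instead of every iteration):

transformed data {
  matrix[dY, T] yields_t = yields'; //transpose once, as data
  matrix[dY, dF] PhiV;
  for (i in 1:dY) {
    PhiV[i, 1] = 1;
    PhiV[i, 2] = (1 - exp(-taus[i] * lambda)) / (taus[i] * lambda);
    PhiV[i, 3] = PhiV[i, 2] - exp(-taus[i] * lambda);
  }
}
parameters {
  matrix[dF, T] F; //one factor vector per column, so no transpose in the likelihood
}
model {
  to_vector(yields_t) ~ normal(to_vector(PhiV * F),
                               to_vector(rep_matrix(sigmay, T)));
}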