Covariance issues with Dynamic Factor Model with large N

I am fairly new to Stan and I am experimenting with a Bayesian Dynamic Factor Model. The model has a large number of time series, N>75, and I want to estimate K~5 factors that capture most of the variation of these time series. Formally, the model is the following:

Y_t = \Lambda F_t + \Psi Y_{t-1} + \epsilon_t \quad\quad \epsilon \sim N \left(0, R\right)\\ F_t = \Phi F_{t-1} + \nu_t \quad\quad \nu \sim N\left(0, I\right)

Where Y_t is a N \times 1 vector of observations at time t, F_t is a K\times 1 vector of unobserved factors, \Lambda is the N \times K loadings matrix, \Psi and \Phi are VAR(1) transition matrices, \epsilon and \nu are appropriately sized vectors of errors.

My issues are with N>75, when the onion method fails and I cannot start sampling since the covariance matrix is not appropriate and/or contains nan values. Also, I get occasionally the following error

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

Strangely enough, though, I also seem not to sample correctly for one component of the onion machinery. Indeed, L ends up containing only constant values and inspecting Sigma I see correlation values which I cannot really interpret compared to observed ones.

Warning messages:
1: In validityMethod(object) :
  The following variables have undefined values:  L[1,2],The following variables have undefined values:  L[1,3],The following variables have undefined values:  L[2,3],The following variables have undefined values:  L[1,4],The following variables have undefined values:  L[2,4],

Now, is there something wrong with my Stan code that I should amend? Is it the onion implementation or some other part that does not check with the model outlined above?

Here is the full stan code I am working on

data {
    int<lower=0>    T;          // Number of time periods
    int<lower=0>    N;          // Number of time series
    int<lower=0>    K;          // Number of factors
    matrix[N, T]    Y;          // Scaled observed time series data, T cols are periods

parameters {
    matrix[N, K]    loadings;   // loadings
    matrix[K, K]    Phi;   // VAR(1) matrix for factors
    matrix[N, N]    Psi;     // VAR(1) matrix for observables
    matrix[K, T]    nu_s;      // errors on measurement equation
 // onion
  vector[choose(N, 2) - 1]  l;         // do NOT init with 0 for all elements
  vector<lower = 0,upper = 1>[N-1] R2; // first element is not really a R^2 but is on (0,1)      

transformed parameters {
   matrix[K, T]     factors;    // factors for the state equation
   // init factors and sample from the error to speed up
   factors[:, 1] = rep_vector(1, K);
   for (t in 2:T) {
    factors[:, t] = Phi*factors[:, t-1] + eps_s[:, t];
   // onion
   real<lower=0> eta = 2;
   real<lower = 0> alpha = eta + (N - 2) / 2.0;
   vector<lower = 0>[N-1] shape1;
   vector<lower = 0>[N-1] shape2;
   shape1[1] = alpha;
   shape2[1] = alpha;
   for(n in 2:(N-1)) {
     alpha = alpha - 0.5;
     shape1[n] = n / 2.0;
     shape2[n] = alpha;
  matrix[N,N] Sigma;
  matrix[N,N] L;
    int start = 1;
    int end = 2;

    L[1,1] = 1.0;
    L[2,1] = 2.0 * R2[1] - 1.0;
    L[2,2] = sqrt(1.0 - square(L[2,1]));
    for(n in 2:(N-1)) {
      int kp1 = n + 1;
      vector[n] l_row = segment(l, start, n);
      real scale = sqrt(R2[n] / dot_self(l_row));
      for(j in 1:n) L[kp1,j] = l_row[j] * scale;
      L[kp1,kp1] = sqrt(1.0 - R2[n]);
      start = end + 1;
      end = start + n - 1;
    Sigma = multiply_lower_tri_self_transpose(L);

model {
    // one restriction on the loadings
    to_vector(loadings[:, K]) ~ normal(1, .01);
    to_vector(loadings[:, 1:(K-1)]) ~ std_normal();
    to_vector(Phi) ~ std_normal();
    to_vector(eps_s) ~ std_normal();
    to_vector(Psi) ~ std_normal();
    // onion
    R2 ~ beta(shape1, shape2);
    vs_y ~ cauchy(0, 2.5);
    l ~ std_normal();

    for (t in 2:T){
        Y[:, t] ~ multi_normal(
            loadings*factors[:,t] + Psi*Y[:, t-1],

Here is the code I have written with bayesdfm to simulate some data and test the model. Playing around with the number of time series I notice that with N<25 all works fine (built-in methods and onion one), while with N>50 I start getting errors also with the onion method: the sampler sputters but gets a good draw before hitting the limit. After N>75, the sampler doesn’t even start

# number of factors
K <- 5

# periods
TT <- 300

# observables
N <- 250

# declare loadings matrix
loadings <- matrix(
  nrow = N, 
  ncol = K,
  data = c(abs(rnorm(K-1, 0, 3)), 1), 
  byrow = T

# simulate some data
sim <- sim_dfa(
  num_trends = K,
  num_years = TT,
  num_ts = N,
  loadings_matrix = loadings,
  sigma = 3

data_v <- sim$y_sim %>% 
  t %>% %>% 
 # make all series positive and take growth rates
      .cols = everything(),
      .fns = ~ .x + abs(min(.)) + 10
      .cols = everything(),
      .fns = ~ 100*(.x/dplyr::lag(.x) - 1)
  ) %>% 
 # scale series to be standard normal
  apply(2, scale) %>% 
  na.omit(.) %>% 
 # select some or all series
  # .[, 1:ncol(.)]
  .[, 1:10]

data_h <- data_v %>% 
list_h <- list(
  T = ncol(data_h),
  N = nrow(data_h),
  Y = data_h,
  K = 3

test <- stan_model("./bsln_onion.stan")

cmg_propcov <- rstan::sampling(
  object = test,
  data = list_h,
  iter = 100,
  chains = 1,
  cores = 1,
  verbose = T,
  show_messages = T

A side note: I am a big fan of Stan and its community, it would be great if more time series material was made available, increasing even further the user base and Stan’s applications!

1 Like

Maybe I miss something obvious, but could you say why you are using the onion here? I ask because I would think that Sigma should be diagonal, possibly with all equal entries along the diagonal. The factors are already accounting for correlation across series.

In theory having the first lag of Y should also help capturing some covariance across series. Since I am not sure that whatever is left is an identity matrix I wanted to estimate that separately, hence the Sigma matrix. A bunch of series indeed are highly correlated and volatile too.

The onion method is what I found when looking for numerically stable covariance matrices when N is large. Obviously if there are better methods or other workarounds that is also super useful!

I think one issue is that a Sigma with free off-diagonal elements is like adding extra factors, which potentially cause identification problems for the K factors that you originally specified. Plus, as you say, when N is large this covariance matrix becomes unwieldy and difficult to keep positive definite.

I would think to specify Sigma as a diagonal matrix, then the" covariance not captured" would be a posterior check of the model. Say, you could compute the \epsilon_t in generated quantities and then look at the corresponding correlation matrix. If it is not close to diagonal, then you might consider adding more factors or more lags.

Two things I would recommend generally speaking:

  1. Start with a simpler model and then make it more complicated (i.e. the model where Psi and Phi are matrices of zeros, then relax one or the other and see what makes the problems start)
  2. Use the fake data simulation technique instead of real data for testing

This is precisely the pipeline I followed, starting with some well defined DGP from simulated data and building up from there. I hit a wall precisely when the number of time series grows large, even for synthetic data.

Thanks, this could seem a workaround, although I hope it makes econometric sense. I will test on the simulated data and come back here with updates.

I also wanted to see if the onion method implementation is actually correct or it needs some correction in general, btw.

Just a quick update on the issue with this DFM’s large covariance matrix for the observables. Although the implementation of the onion method looks correct (at least to me) and works decently with simulated data with known DGP, when I increase the number of series above ~60 the sampler rarely starts due to failing initialisation.

Is there a numerically reliable (and fast, if possible) way to sample for large covariance matrices?