# Chains initialisation explode for large T in dynamic factor model

99_bdfm_tester.R (1.3 KB)
bdfm_tester.stan (1.1 KB)

Hello,
I have recently started using Stan for estimating Dynamic Factor Models. Following some of the (very few) examples found here and elsewhere, I coded up a simple DFM that I wanted to test on simulated data. I specified the factors in the `transformed parameters` bloc and sampled from their errors, as I understand is better than having them directly in the `model` bloc. I attach both `stan` code and `R` script to simulate some data and feed them to Stan.

The estimation of the model - if well identified - works generally well, although I noticed a strange behaviour when using multiple chains, as illustrated in the picture below (apologies for the low res).

In short for a total of T periods of time in the model, the initial point for some chains progressively explodes to unreasonable values as t grows towards T. I donâ€™t understand if it is a result of how I modelled the factors or an unintended side effect.

Typically, increasing the number of iterations (or setting `init=0`) solves the issue, especially with smaller models, although this becomes impractical and cumbersome with larger, more complicated models.

Given that there are reasonably constrained priors on `ar_facts` and `eps_s`, my guess sis that the multiplicative form of the factors is just stacking up given that itâ€™s a multiplicative time series. The reason I say that is that you can see the scale grow on your diagram. One thing you can do if you want to keep the time series of factors stationary is constrain `ar_facts` more strongly than with a `normal(0, 10)` prior. As is, if ar_facts = 5, then each step in the time series is going to scale everything by 5, which grows very quickly.

The model is degenerate here in that `Y[:, 1]` is not being modeled. You can just add a distribution for it up front if you want a fully generative model.

P.S. There are a few ways to dramatically speed up this code. You donâ€™t ever need to create a diagonal covariance matrix explicitlyâ€”you can just use the univariate normal to get the same effect, because if `sigma` and `y` are vectors, then these two forms are equivalent (up to a constant):

``````y ~ normal(mu, sigma)
==
y ~ multi_normal(mu, diag_matrix(sigma));
``````

Thus, you an replace:

1. You can replace
``````cov_matrix[K] cov_f = identity_matrix(K);
...
for (t in 1:T) {
eps_s[:, t] ~ multi_normal(mu_f, cov_f);
}
``````

with something that just gives each entry in the matrix a normal prior with location `mu_f` and scale 1.

``````to_vector(eps_s) ~ normal(mu_f, 1);
``````
1. You can replace
``````cov_matrix[N] cov_y = diag_matrix(sigma_y);
...
for (t in 2:T) {
}
``````

with

``````matrix[N, T] loadings_factors = loadings * factors;
for (n in 1:N) {
}``````

Thanks @Bob_Carpenter, it was more or less the same intuition I had, though the issue seem to persist even with tighter priors on the (V)AR part: the divergence of factors is way less pronounced but still there. Letting the iterations grow brings the distribution back to more reasonable values though.

Also, I added the coding tweaks you proposed and indeed the code runs in a fraction of the time, thanks a million for saving me a ton of time!

I played around with the code and tried a few things. One thing that seems to help to me is to constrain the factors to be stationary or at most random walk, by imposing them to be between -1 and 1. That should help rule out the more explosive behaviour, and I feel itâ€™s not an unreasonable restriction in this type of setting.

``````matrix<lower=-1,upper=1>[K, K]           ar_facts;
``````

This already imposes a stronger regularization than the normal(0, 10) prior you already have, but you may want to tweak that prior further.

With this approach, I generally get good behaviour, although indeed a couple of misbehaved chains seem to have the top loads close to 0 and get stuck there. Then the remaining variables get the wrong side in their loadings, and thus the estimated factors also have the wrong sign, but â€ścorrectâ€ť with a sign switch. I tried to ensure I was running the model with strong loadings on the first variable to ensure identifiability, as otherwise itâ€™s easier for that loading to be estimated as small positive 0 and then the remaining variables to end up in different chains on different sides of positive and negative. And I also played with simulating data with ar_facts with a larger number (0.7), as I wondered if -0.1 is too weak for any discernible factor structure to be estimated. Still, this did not seem to be enough to get good results every time I ran with your example (N = 5, T = 300).

First, it is often necessary to impose a specific set of constraints on the factor loadings to avoid identifiability issues. Of course these constraints can mean that different chains have different solutions, but there is a convenient trick one can use in `generated quantities` to sign correct the loadings / factors (see a detailed discussion of both issues in this Discourse post).