High Divergence Rate in Estimating Kalman Filter

I’m trying to estimate the parameters (function coefficients as well as noise variances) of a dynamical system, where the state vector and its covariance are given by the Kalman filter. As is, the program yields a very high divergence rate (~93%), and the estimation results are not sensible.

I’m not exactly sure what the underlying cause is, though I have a few guesses. If it’s numerical instability in the estimated state covariance matrix, I can try a square-root filter, though it’s a lot more complicated and may easily introduce sneaky bugs. If the instability comes from the matrix operations used to compute the Kalman gain K, I can try implementing them manually with more numerically stable methods.
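For reference, before committing to a full square-root filter it may be worth trying the Joseph-form covariance update, which preserves symmetry and positive semi-definiteness under rounding error at modest extra cost. A minimal NumPy sketch (the matrix names `H`, `R`, etc. are generic placeholders, not taken from the attached model):

```python
import numpy as np

def kalman_update_joseph(x, P, z, H, R):
    """One Kalman measurement update using the Joseph-form
    covariance update, which stays symmetric positive
    semi-definite even with floating-point rounding."""
    S = H @ P @ H.T + R                       # innovation covariance
    K = np.linalg.solve(S.T, (P @ H.T).T).T   # gain, no explicit inverse
    x_new = x + K @ (z - H @ x)
    A = np.eye(P.shape[0]) - K @ H
    P_new = A @ P @ A.T + K @ R @ K.T         # Joseph form
    return x_new, P_new
```

In exact arithmetic this equals the textbook update `(I - KH)P`, but the quadratic form cannot produce a non-symmetric or negative-definite result, which is the usual failure mode of the naive update.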

HLW_lewis.stan (3.2 KB)


The model is a bit too complex for me to really understand, but there are some hints for debugging models at https://discourse.mc-stan.org/t/divergent-transitions-a-primer/17099. In particular, I think treating a subset of the parameters as known data could help you pin down where the problems arise (a bug in the code is always a possibility).

Also one thing that I’ve had problems with when building a simple 1D Kalman filter is that if you have just one time series, the data often don’t let you really distinguish between process variance and noise variance, so reparametrizing in terms of total variance as discussed e.g. at Suggestions for identifying summed random effects - #2 by martinmodrak might be helpful. But I would look into that only if it turns out that you can fit the model assuming one of the variances is known while you have problems when both are unknown.
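As a concrete illustration of that reparametrization (a generic sketch, not the attached model): instead of sampling the process variance and the noise variance directly, one can sample the total variance and the fraction attributable to the process, since the data often constrain the sum much better than the individual components.

```python
def split_total_variance(total_var, process_frac):
    """Map (total variance, process fraction in (0, 1)) back to
    the two component variances. Sampling this pair instead of
    the two variances directly can help when only their sum is
    well identified by a single time series."""
    process_var = total_var * process_frac
    noise_var = total_var * (1.0 - process_frac)
    return process_var, noise_var
```

In Stan this corresponds to declaring the total variance and a `(0, 1)`-constrained fraction as parameters and deriving the component variances in `transformed parameters`.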

Also the ctsem R package implements Kalman filtering in Stan, so it may serve as an inspiration on how to implement this stuff well.

Best of luck with your model


Thanks a lot, martinmodrak. Your comments are very helpful. My model has two noise terms in the measurement equations and three noise terms in the state equations, so a total of five sources of variability. I’ll try reparametrizing all the noises in terms of total variance, as you mentioned, and report back.

By the way, I wanted to mention that I was able to reduce the divergence rate to 5% by removing the constraint on the sum of two estimated parameters. This affects the estimated function coefficients but not so much the estimated state vectors, so I don’t think the high divergence rate is the biggest issue here. Even with a 95% divergence rate, the diagnostics (R-hat) still indicate the chains have mixed. The bigger issue is that the estimated state vector exhibits too much variation, suggesting numerical issues in computing the Kalman gain K and the state covariance P. The estimated log density is around -545, which is lower than -535, the value obtained with sequential maximum likelihood as originally done in HLW (attached below).
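One cheap way to test the numerical-issues hypothesis outside Stan (a sketch with a placeholder matrix, not the actual filter output) is to run the same recursion in floating point and check, at each step, how far the state covariance has drifted from being symmetric and positive semi-definite:

```python
import numpy as np

def covariance_health(P):
    """Report how far a covariance matrix has drifted from
    symmetry and positive semi-definiteness: the maximum
    asymmetry and the smallest eigenvalue of its symmetric
    part. Large asymmetry or a negative eigenvalue signals
    numerical trouble in the K / P recursion."""
    asym = np.max(np.abs(P - P.T))
    eigmin = np.min(np.linalg.eigvalsh(0.5 * (P + P.T)))
    return asym, eigmin

# Example: a covariance perturbed slightly off-symmetric
P = np.array([[2.0, 0.3], [0.3 + 1e-6, 1.0]])
asym, eigmin = covariance_health(P)
```

If `eigmin` goes negative or `asym` grows over the filter steps, that points at the covariance update rather than the model itself.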
HLW-2017.pdf (2.3 MB)
This talks about the original sequential maximum likelihood approach in more detail.
Econometric issues with Laubach and Williams’ estimates of the natural rate of interest?.pdf (2.9 MB)