Hi everybody,
I am kinda new to Stan modelling, but I have some experience with state-space (linear dynamic) models already. My intention is to build and fit model using Stan in order to infer parameters of hierarchial state-space model (time varying dynamics/time varying process noice etc.).
But to start, I tried simple model that could be solvable using Kalman Filter (but I do not want to use KF because it does not have the flexibility to infer parameters of more complex model).
So the mathematical formulation of the model goes like this
\mathbf{x}_{t} \sim \mathcal{N} \left( \mathbf{A} * \mathbf{x}_{t-1}, \Sigma_{x} \right)
\mathbf{y}_{t} \sim \mathcal{N} \left( \mathbf{x}_{t}, \Sigma_{y} \right)
which is pretty straightforward and KF would estimate the latent states \mathbf{x}_{t} from measurements \mathbf{y}_{t} easily as long as the process noise covariance matrix \Sigma_{x} and measurement covariance matrix \Sigma_{y} are proper covariance matrices. My idea of formulation of this model in Stan is this (model is saved in the file “simpleKalman.stan”)
data {
real dt;
int N;
vector[2] y[N];
matrix[2,2] V;
matrix[2,2] A;
matrix[2,2] Q;
}
parameters {
vector[2] x[N];
}
model {
x[1] ~ multi_normal(y[1], Q + V);
for( t in 2:N )
{
x[t] ~ multi_normal(A*x[t-1], Q);
}
for( t in 1:N )
{
y[t] ~ multi_normal(x[t], V);
}
}
where N is the amount of data, dt is input prepared for more sophisticated models, A is the state transformation matrix, Q is the process noise matrix (in this formulation, i want Stan to only estimate the latent states) and V is the measurement covariance matrix. The following R code describes the way the data are generated (including the state transition matrix definition) and also the parameters that are provided to Stan in order to fit the model.
nT <- 50;
dt <- 0.9;
A <- matrix(c(1, dt, 0, 1), nrow = 2, ncol = 2, byrow = TRUE);
origX <- matrix(data = 0, nrow = 2, ncol = nT);
X <- matrix(data = 0, nrow = 2, ncol = nT);
x0 <- matrix(c(2000, 100), nrow = 2, ncol = 1);
X[, 1] <- x0;
origX[, 1] <- x0;
for( iT in 2:nT )
{
origX[, iT] <- A %*% origX[, iT - 1];
}
Q <- matrix(c(20^2, 0, 0, 0.01^2), nrow = 2, ncol = 2, byrow = TRUE);
sqrtQ <- t(chol(Q));
for(iT in 1:nT )
{
X[, iT] <- origX[, iT] + sqrtQ %*% matrix(rnorm(2), nrow = 2, ncol = 1);
}
Y <- matrix(0, nrow = 2, ncol = nT)
V <- matrix(c(100^2, 3e-5, 3e-5, 4^2), nrow = 2, ncol = 2);
sqrtV <- t(chol(V));
for( iT in 1:nT )
{
Y[, iT] <- X[, iT] + sqrtV %*% matrix(rnorm(2), nrow = 2, ncol = 1);
}
fit <- stan("simpleKalman.stan",
data=list(dt=dt, N=nT, y=t(Y), V=V, A=A, Q=Q),
iter=10000, chains=4, control = list(adapt_delta = 0.8));
Edit:
This is embarrasing, but I have noticed, that the process noise covariance matrix is stated as model data, however it was not provided to the model in the data list (already corrected in the provided model). The interesting part is that this is even possible (without even warning about the fact that something is stated as a data in the stan file, but is not present in the input).
After correction of this horrible mistake I have tried this model again. The warning about tree depth was still present. But I have read some stan articles since I have posted my question and looked at the n_eff parameter of estimated quantities (the latent states x). I have noticed that for the second part of the state vector this value was always approximately ten times smaller. Therefore I have tried the same model, but this time with
Q <- matrix(c(20^2, 0, 0, 1^2), nrow = 2, ncol = 2, byrow = TRUE);
and then there is no warning and the n_eff improved even for the first part of the state vectors (with the original covariance matrix x[ , 1] had n_eff approx. 1e4 and x[, 2] had approx 1e3 but with the new, all of the elements have n_eff approx. 2e4). The estimated states make sense to me, because if I try to estimate Q using them, I get approximately the right values.
Since I consider this part to be solved and I am able to infer the latent states, I have decided to expand the model. In this toy example, my intention is to estimate the process noise covariance matrix given the data and the measurement covariance matrix. I.e. I want to move the matrix Q from the data part of the stan file to the parameters part. Since I have done some reading about Bayesian modelling, I have decided to put some prior on this parameter. I know there is a lot of discussion about THE proper priors for covariance matrices but honestly I do not thing that the Wishart prior is the issue in this case.
Here is the model with process noise covariance matrix as parameter (the model is again saved in simpleKalman.stan model file)
data {
real dt;
int N;
vector[2] y[N];
cov_matrix[2] V;
matrix[2,2] A;
cov_matrix[2] Q0;
}
parameters {
vector[2] x[N];
cov_matrix[2] Q;
}
model {
Q ~ wishart(2, Q0);
x[1] ~ multi_normal(y[1], Q + V);
for( t in 2:N )
{
x[t] ~ multi_normal(A*x[t-1], Q);
}
for( t in 1:N )
{
y[t] ~ multi_normal(x[t], V);
}
}
and here is the R script which generates data (I use generative model approach, as is suggested in multiple resources) and provides them to stan
nT <- 100;
dt <- 0.9;
A <- matrix(c(1, dt, 0, 1), nrow = 2, ncol = 2, byrow = TRUE);
origX <- matrix(data = 0, nrow = 2, ncol = nT);
X <- matrix(data = 0, nrow = 2, ncol = nT);
x0 <- matrix(c(2000, 100), nrow = 2, ncol = 1);
X[, 1] <- x0;
origX[, 1] <- x0;
for( iT in 2:nT )
{
origX[, iT] <- A %*% origX[, iT - 1];
}
Q <- matrix(c(20^2, 0, 0, 1^2), nrow = 2, ncol = 2, byrow = TRUE);
sqrtQ <- t(chol(Q));
for(iT in 1:nT )
{
# X[, iT] <- X[, iT] + trnQ*rnorm(1, 0, sigmaa);
X[, iT] <- origX[, iT] + sqrtQ %*% matrix(rnorm(2), nrow = 2, ncol = 1);
}
Y <- matrix(0, nrow = 2, ncol = nT)
V <- matrix(c(100^2, 3e-5, 3e-5, 4^2), nrow = 2, ncol = 2);
sqrtV <- t(chol(V));
for( iT in 1:nT )
{
Y[, iT] <- X[, iT] + sqrtV %*% matrix(rnorm(2), nrow = 2, ncol = 1);
}
centerQ <- Q/5;
fit <- stan("simpleKalman.stan",
data=list(dt=dt, N=nT, y=t(Y), V=V, A=A, Q0=centerQ),
iter=10000, chains=4, control = list(adapt_delta = 0.8));
Inference using this model returns multiple warnings, so I suppose there is something fundamentally wrong with the model
Warning messages:
1: There were 127 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: There were 5211 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
3: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
4: Examine the pairs() plot to diagnose sampling problems
5: The largest R-hat is 1.43, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
but I have no clue about what is it. Please, if somebody could suggest improvement of my code, I would be grateful.