# Kalman filter runs very slowly for over 400 time series

I am trying to predict median housing price at the block-group level using a set of predictors (e.g., median household income, distance to the CBD). I have 413 block groups, and each block group has an annual median housing price from 1993 to 2010. I am fitting a dynamic linear regression between the housing price and the predictors using a Kalman filter; the code is shown below. When running on just a few block groups, the speed is okay, but with all 413 block groups it is extremely slow. I have been trying to reparameterize this model, but so far I haven't found an effective solution.
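For context, here is the filtering recursion I'm trying to implement, as a minimal NumPy sketch with simulated inputs. The dimensions match my problem, but the data and the `T`, `H`, `Q`, and initial values are placeholders, not my actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 18, 413, 5                   # years, block groups, predictors

# Placeholder inputs: Z[t] maps the m states to the p observed series at year t
Z = rng.normal(size=(n, p, m))
y = rng.normal(size=(n, p))
t_diag = rng.uniform(-0.9, 0.9, size=m)
T = np.diag(t_diag)                    # diagonal evolution matrix
H = np.eye(p)                          # measurement-error covariance
Q = np.eye(m)                          # system-error covariance (R = I here)

a = np.zeros(m)                        # initial state mean
P = np.eye(m)                          # initial state covariance
loglik = 0.0
for i in range(n):
    v = y[i] - Z[i] @ a                # one-step-ahead prediction error
    F = Z[i] @ P @ Z[i].T + H          # its covariance
    Finv = np.linalg.inv(F)            # p x p inverse: the expensive step
    K = T @ P @ Z[i].T @ Finv          # Kalman gain
    loglik += -0.5 * (p * np.log(2 * np.pi)
                      + np.linalg.slogdet(F)[1] + v @ Finv @ v)
    a = T @ a + K @ v                  # state update
    P = T @ P @ (T - K @ Z[i]).T + Q   # covariance update
print(loglik)
```

The per-step cost is dominated by the inverse and determinant of the p x p matrix F, which is why the runtime grows so quickly with the number of block groups.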

```stan
data {
// number of observations; In our case, n = 18 years
int n;
// number of series == 413; In our case, p = 413 as we have 413 block groups
int p;
// number of states; In our case, we have 5 predictors for the housing price, thus m = 5
int m;
// size of system error; In our case, the size of system error should equal to 5
int r;
// observed data; In our case, the observed data y is our time-series housing price data
vector[p] y[n];
// observation eq; In our case, the observation eq is our predictors
real ZZ[m, p, n];
// system eq; T is the evolution matrix.
//matrix[m, m] T; In our case, T is unknown, and will be estimated as parameter
matrix[m, r] R;
// initial values; In our case, a_1 and P_1 are unknown parameters
//vector[m] a_1;
//cov_matrix[m] P_1;
// measurement error
// vector<lower=0.0>[p] h;
// vector<lower=0.0>[r] q;
int<lower = 0, upper = 1> run_estimation;
}
parameters {
vector<lower=0.0>[p] h;
vector<lower=0.0>[r] q;
vector[m] t;
vector[m] a_1;
corr_matrix[m] Omega;
vector<lower=0.0>[m] tau;
}
transformed parameters {
matrix[p, p] H;
matrix[r, r] Q;
matrix[m, m] T;
cov_matrix[m] P_1;

H = diag_matrix(h);
Q = diag_matrix(q);
T = diag_matrix(t);

}
model {

vector[m] a[n + 1];
matrix[m, m] P[n + 1];
real llik_obs[n];
real llik;

//Set priors
tau ~ normal(0, 10);
Omega ~ lkj_corr(2);

for (i in 1:p)
h[i] ~ normal(0, 10);
for (i in 1:r)
q[i] ~ normal(0, 10);
for (i in 1:m)
t[i] ~ normal(0, 10);
for (i in 1:m)
a_1[i] ~ normal(0, 10);

// 1st observation
a[1] = a_1;
P[1] = P_1;

for (i in 1:n) {
vector[p] v;
matrix[p, p] F;
matrix[p, p] Finv;
matrix[m, p] K;
matrix[m, m] L;
matrix[p, m] Z;
Z = to_matrix(ZZ[, , n])';
v = y[i] - Z * a[i];
F = Z * P[i] * Z' + H;
Finv = inverse(F);
K = T * P[i] * Z' * Finv;
L = T - K * Z;
// // manual update of multivariate normal
llik_obs[i] = -0.5 * (p * log(2 * pi()) + log(determinant(F)) + v' * Finv * v);
a[i + 1] = T * a[i] + K * v;
P[i + 1] = T * P[i] * L' + R * Q * R';

}
llik = sum(llik_obs);

if(run_estimation == 1){
target += llik;
}
}
```

Cool model! I didn't have data to test this, so there may be bugs (especially in the indexing changes), but here's a version with a couple of small changes:

1. Fewer temporaries. If we don't need something in the output, or only use it once, just do the calculation inline
2. Use `diag_pre_multiply` / `diag_post_multiply`
3. Change the data input format for nicer indexing
```stan
data {
// number of observations; In our case, n = 18 years
int n;
// number of series == 413; In our case, p = 413 as we have 413 block groups
int p;
// number of states; In our case, we have 5 predictors for the housing price, thus m = 5
int m;
// size of system error; In our case, the size of system error should equal to 5
int r;
// observed data; In our case, the observed data y is our time-series housing price data
vector[p] y[n];
// observation eq; In our case, the observation eq is our predictors
matrix[p, m] ZZ[n];
// system eq; T is the evolution matrix.
//matrix[m, m] T; In our case, T is unknown, and will be estimated as parameter
matrix[m, r] R;
// initial values; In our case, a_1 and P_1 are unknown parameters
//vector[m] a_1;
//cov_matrix[m] P_1;
// measurement error
// vector<lower=0.0>[p] h;
// vector<lower=0.0>[r] q;
int<lower = 0, upper = 1> run_estimation;
}
transformed data {
real log_2p = log(2 * pi());
}
parameters {
vector<lower=0.0>[p] h;
vector<lower=0.0>[r] q;
vector[m] t;
vector[m] a_1;
corr_matrix[m] Omega;
vector<lower=0.0>[m] tau;
}
transformed parameters {
// Unless you explicitly want a matrix in the return / don't wanna get a vector back
// matrix[m, m] T = diag_matrix(t);
// Unless you want this in the return
}
model {

vector[m] a[n + 1];
matrix[m, m] P[n + 1];
real llik_obs[n];
real llik;

//Set priors
tau ~ normal(0, 10);
Omega ~ lkj_corr(2);

h ~ normal(0, 10);
q ~ normal(0, 10);
t ~ normal(0, 10);
a_1 ~ normal(0, 10);

// 1st observation
a[1] = a_1;

for (i in 1:n) {
vector[p] v = y[i] - ZZ[n] * a[i];
matrix[p, p] F = ZZ[n] * P[i] * diag_post_multiply(ZZ[n]', h);
matrix[p, p] Finv = inverse(F);
matrix[m, p] K = diag_pre_multiply(t, P[i]) * ZZ[n]' * Finv;
// // manual update of multivariate normal
llik_obs[i] = -0.5 * (p * log_2p + log(determinant(F)) + v' * Finv * v);
// I think this is the same?
a[i + 1] = t .* a[i] + K * v;
//a[i + 1] = T * a[i] + K * v;
P[i + 1] = diag_pre_multiply(t, P[i]) * (diag_matrix(t) - K * ZZ[n])' + diag_post_multiply(R, q) * R';
}
llik = sum(llik_obs);

if(run_estimation == 1){
target += llik;
}
}
```
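As a sanity check on changes 1 and 2, the matrix identities I'm relying on can be verified numerically. Here's a NumPy stand-in for the Stan functions, with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 5, 7
t = rng.normal(size=m)
a = rng.normal(size=m)
P = rng.normal(size=(m, m))
Z = rng.normal(size=(p, m))
h = rng.normal(size=p) ** 2

# diag_pre_multiply(t, P) == diag_matrix(t) * P: scales the rows of P by t
assert np.allclose(np.diag(t) @ P, t[:, None] * P)

# diag_post_multiply(Z', h) == Z' * diag_matrix(h): scales the columns of Z' by h
assert np.allclose(Z.T @ np.diag(h), Z.T * h)

# For a diagonal T, the matrix-vector product collapses to elementwise:
# T * a == t .* a
assert np.allclose(np.diag(t) @ a, t * a)

print("all identities hold")
```

So none of these changes should alter the likelihood; they just avoid building and multiplying full diagonal matrices.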

wrt modeling, do you have fake data you can put through this? `normal(0, 10)` seems kind of wide; is there a semi-informative prior you can use? It might be helpful to check with SBC that, given some fake data you generate, you are able to recover the parameters


Hello, thank you so much for your help. I tried to run your revised Stan code, but it kept producing the error "Log probability evaluates to log(0), i.e., negative infinity", and the initial values were rejected during sampling. From searching online, I think something may be wrong with my priors. I am quite new to Stan, and so far I haven't been able to find the mistake. Attached are the data I used, the R code, and the Stan code you revised. I really appreciate your help!
data.RData (168.7 KB)
Rcode.R (559 Bytes)
test.stan (2.4 KB)

I finally found that the error was caused by a mistake in the equation for calculating F. The code is now revised as:

```stan
data {
// number of observations; In our case, n = 18 years
int n;
// number of series == 413; In our case, p = 413 as we have 413 block groups
int p;
// number of states; In our case, we have 5 predictors for the housing price, thus m = 5
int m;
// size of system error; In our case, the size of system error should equal to 5
int r;
// observed data; In our case, the observed data y is our time-series housing price data
vector[p] y[n];
// observation eq; In our case, the observation eq is our predictors
matrix[p, m] ZZ[n];
// system eq; T is the evolution matrix.
//matrix[m, m] T; In our case, T is unknown, and will be estimated as parameter
matrix[m, r] R;
// initial values; In our case, a_1 and P_1 are unknown parameters
//vector[m] a_1;
//cov_matrix[m] P_1;
// measurement error
// vector<lower=0.0>[p] h;
// vector<lower=0.0>[r] q;
int<lower = 0, upper = 1> run_estimation;
}
transformed data {
real log_2p = log(2 * pi());
}
parameters {
vector<lower=0.0>[p] h;
vector<lower=0.0>[r] q;
vector[m] t;
vector[m] a_1;
corr_matrix[m] Omega;
vector<lower=0.0>[m] tau;
}
transformed parameters {
// Unless you explicitly want a matrix in the return / don't wanna get a vector back
// matrix[m, m] T = diag_matrix(t);
// Unless you want this in the return
}
model {

vector[m] a[n + 1];
matrix[m, m] P[n + 1];
real llik_obs[n];
real llik;

//Set priors
tau ~ normal(0, 10);
Omega ~ lkj_corr(2);

h ~ normal(0, 10);
q ~ normal(0, 10);
t ~ normal(0, 10);
a_1 ~ normal(0, 10);
// 1st observation
a[1] = a_1;

for (i in 1:n) {
vector[p] v = y[i] - ZZ[i] * a[i];
matrix[p, p] F = ZZ[i] * P[i] * ZZ[i]'+ diag_matrix(h);
matrix[p, p] Finv = inverse(F);
matrix[m, p] K = diag_pre_multiply(t, P[i]) * ZZ[i]' * Finv;
// // manual update of multivariate normal
llik_obs[i] = -0.5 * (p * log_2p + log(determinant(F)) + v' * Finv * v);
// I think this is the same?
a[i + 1] = t .* a[i] + K * v;
//a[i + 1] = T * a[i] + K * v;
P[i + 1] = diag_pre_multiply(t, P[i]) * (diag_matrix(t) - K * ZZ[i])' + diag_post_multiply(R, q) * R';
}
llik = sum(llik_obs);

if(run_estimation == 1){
target += llik;
}
}
```

I tried running the model with the revised code above, but it is still quite slow. I am wondering what else I can do to speed it up. Thanks.

In your loop you use n to index ZZ; are you sure that shouldn’t be i as the index?


I think using n as the index is correct. Since n equals 18 (representing 18 years of data), I am just iterating through each year and updating the states from year to year.

But you’re only ever using the nth element, the 18th element, of the array ZZ. Are elements 1-17 not relevant?

When updating the state variables at each iteration, only the current state values determine the update, not the previous values.

And `i` doesn’t encode the current state index for that loop?

Unless I'm missing something, you're passing in an array of length `n` as `ZZ` (where each element is a matrix), but only the final/nth element is ever used in the model, which makes me wonder why you are bothering to pass in all the other elements.

Oh, thanks, I finally see it! The index for ZZ should be i: at each time step, the matrix ZZ[i] is passed in for the calculation. I have corrected this mistake in the code I posted above.


After using the revised Stan code I posted above, the running speed is still quite slow. To test, I ran just 5 block groups instead of all 413; then the speed is mediocre. I ran 4 chains concurrently, and the sampling speed varies a lot across the 4 chains. Does this signal that I need to further reparameterize the model? Does anyone have more thoughts about the reparameterization? I appreciate your opinions and help!

Put some profile blocks in so you can see which particular sections of code are causing more of a bottleneck than others.
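For example, instrumenting the filter loop might look like this (a sketch: the `profile` statement needs a recent Stan version, the section names are arbitrary, and the variable declarations are assumed to be moved above the profiled blocks):

```stan
for (i in 1:n) {
  profile("predict") {
    // prediction error and its covariance
    v = y[i] - ZZ[i] * a[i];
    F = ZZ[i] * P[i] * ZZ[i]' + diag_matrix(h);
  }
  profile("invert") {
    // p x p inverse, likely the bottleneck
    Finv = inverse(F);
  }
  // ... rest of the update
}
```

The per-section timings then tell you whether the p x p inverse really dominates or whether something else is eating the time.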

I tried putting profile blocks in the Stan code, but then it cannot be compiled in rstan. Does rstan not support profile blocks?

Would you be able to use a factor structure so that there is a smaller number of Kalman filters running?


Ah, very possibly; can you use cmdstanr instead?

Yeah, rstan is on Stan 2.21 while cmdstanr is on Stan 2.28. Profiling was introduced in 2.25, so until rstan is updated, cmdstanr is the only place profiling is available.

Also, just saw this; I will try to run it over the weekend.

Some chains running slowly while others run fast is sometimes an indication that some chains found difficult parts of the parameter space that other chains missed. For that, I usually go back to a simpler model that mixed well and slowly add the more complicated components until I hit the poor mixing again. Then, once I know which piece it is, I can dive into the diagnostics and see how those parameters interact with everything else.