Extended Kalman filter, problem estimating process and observation error size

Hi all,
I am new to Stan and am having trouble estimating the process and observation error standard deviations in an extended Kalman filter (EKF). Following Durbin & Koopman (2012), one can use maximum likelihood estimation to estimate parameters in the process and observation equations, and a frequentist version of this works reasonably well. Now I want to implement a Bayesian version. In the simplest possible example I only try to estimate the two standard deviations, but the model consistently overestimates both substantially (both should be 0.05, but the estimates come out far higher). Here I use a normal prior, but I also tried a lognormal prior with similar issues. The current priors are fairly uninformative, but the same thing happens with very informative priors.

Could anyone advise why this problem persists, or point out a coding error that might be causing it?

The simulated data come from the R code below; I use rstan to call the Stan program named “BESAstan_onlysig.stan”, which is also attached below.
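For reference, the model I simulate from (with $\kappa = r/K$) is

$$
\begin{aligned}
p_t &= \log c - d\,x_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2),\\
h_t &= \tfrac{1}{b}\left(\log a - p_t + \phi_t\right), & \phi_t &\sim N(0, \sigma_\phi^2),\\
x_{t+1} &= \log\!\big((1+r)e^{x_t} - \kappa e^{2x_t} - e^{h_t}\big) + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2).
\end{aligned}
$$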

All the best and thank you very much in advance,
Kira

library(rstan)

# To run in parallel on multiple cores
options(mc.cores=parallel::detectCores())
# To avoid recompiling unchanged Stan program
rstan_options(auto_write=TRUE)

## set parameters, then simulate data (pt and Ht time series of length N)
a <- 0.5737
b <- 0.87
c <- 2
d <- 0.7
r <- 0.3809
K <- 500
kappa <- r / K
sigmaeps <- 0.05
sigmaeta <- 0.05
sigmaphi <- 0.1
N <- 50
psi <- log(250)

xt <- numeric(N + 1)   # latent state; the loop fills xt[j + 1] up to j = N
xt[1] <- psi
epsilont <- numeric(N) # observation noise
etat <- numeric(N)     # process noise
phit <- numeric(N)     # noise in the ht equation
pt <- numeric(N)       # observations (left side of the observation equation)
ht <- numeric(N)       # log of Ht

for (j in 1:N) {
  epsilont[j] <- rnorm(1, 0, sigmaeps)
  etat[j] <- rnorm(1, 0, sigmaeta)
  phit[j] <- rnorm(1, 0, sigmaphi)
  pt[j] <- log(c) - d * xt[j] + epsilont[j]  # observation equation
  ht[j] <- (1 / b) * (log(a) - pt[j] + phit[j])
  xt[j + 1] <- log((1 + r) * exp(xt[j]) - kappa * exp(xt[j])^2 - exp(ht[j])) + etat[j]  # process equation
}
Ht <- exp(ht)  # the filter needs Ht in levels

# define prior distribution parameters:
musigeps <- 1
sigsigeps <- 1
musigeta <- 1
sigsigeta <- 1


nsamples <- 6000
nchains <- 3


# feed in the data and run the model
dataStan <- list(pt = pt, Ht = Ht, TT = N, mur = r, musigeps = musigeps,
                 sigsigeps = sigsigeps, musigeta = musigeta, sigsigeta = sigsigeta)
f2pStan <- stan(file = "BESAstan_onlysig.stan", data = dataStan,
                chains = nchains, iter = nsamples, warmup = 1000, thin = 2)


# Traceplots (including warmup)
traceplot(f2pStan, pars = "sigeps", inc_warmup = TRUE)
traceplot(f2pStan, pars = "sigeta", inc_warmup = TRUE)
# print summary results: the posterior means of sigeps and sigeta should be around 0.05!
print(f2pStan, max = 150)
/*----------------------- Data --------------------------*/
/* Data block: unpack the objects that will be inputted as data */
data {
  int TT; // Length of state and observation time series
  vector[TT] pt; // observations (left side of observation equation)
  vector[TT] Ht; // observations (used in process equation)
  real mur; // parameter in process equation
  real musigeps;
  real sigsigeps;
  real musigeta;
  real sigsigeta;
}
/*----------------------- Parameters --------------------------*/
parameters {
  real<lower=0> sigeps; // Standard deviation of the observation equation
  real<lower=0> sigeta; // Standard deviation of the process equation
}


/*----------------------- Model --------------------------*/
/* Model block: defines the model */
model {
  // Initialization of variables
  vector[TT+1] st; // state variable
  vector[TT] vt; // residual time series
  vector[TT] Ft; // variance of vt
  vector[TT+1] Zt; // variance of st
  vector[TT] stt; // intermediate state variable time series
  vector[TT] Ztt; // intermediate state variable variance
  vector[TT] Jtt; // Jacobian
  real llik_obs[TT];
  real llik;

  // Prior distributions:
  target += normal_lpdf(sigeps | musigeps, sigsigeps);
  target += normal_lpdf(sigeta | musigeta, sigsigeta);

  // initial values:
  Zt[1] = 0; // initial variance
  st[1] = log(0.5); // initial state
  target += -0.5*TT*log(2*3.14159); // log likelihood constant part

  // Extended Kalman filter loop
  for (t in 1:TT) {
    vt[t] = pt[t] - log(2) + 0.7*log(500) + 0.7*st[t]; // residual
    Ft[t] = pow(0.7,2)*Zt[t] + pow(sigeps,2); // variance of residual
    // Each step, add this part to the log likelihood (Durbin & Koopman 2012, p. 171):
    llik_obs[t] = -0.5*(log(Ft[t]+pow(vt[t],2)/Ft[t]));

    stt[t] = st[t] - (0.7*Zt[t]*vt[t])/Ft[t]; // corrected state estimate
    Ztt[t] = pow(sigeps,2)*Zt[t]/Ft[t]; // variance of corrected state estimate
    st[t+1] = log((1+mur)*exp(stt[t]) - mur*exp(2*stt[t]) - Ht[t]/500); // a priori state estimate for next period
    Jtt[t] = ((1+mur)*exp(stt[t]) - 2*mur*exp(2*stt[t]))/((1+mur)*exp(stt[t]) - mur*exp(2*stt[t]) - Ht[t]/500); // Jacobian
    Zt[t+1] = pow(Jtt[t],2)*Ztt[t] + pow(sigeta,2); // a priori estimate of the state variance for next period
  }
  llik = sum(llik_obs); // the log likelihood is the sum over all the parts
  target += llik; // add the log likelihood to the target
}

Can you post your R code?

I put all the constants in the transformed data block and removed some unnecessary calculations. Note that for HMC the log probability only needs to be specified up to a constant, so I removed some constant increments. Another thing you can do is use the _lupdf or ~ syntax to drop normalizing constants from the target increments for densities. If you want the exact log probability for later use, you can put these back.

/*----------------------- Data --------------------------*/
/* Data block: unpack the objects that will be inputted as data */
data {
  int TT; // Length of state and observation time series
  vector[TT] pt; // observations (left side of observation equation)
  vector[TT] Ht; // observations (used in process equation)
  real mur; // parameter in process equation 
  real musigeps;
  real sigsigeps;
  real musigeta;
  real sigsigeta;
}
transformed data {
  real loghalf = log(0.5);
  real logtwopi = log(2 * pi());
  real logfiveh = log(500);
}
/*----------------------- Parameters --------------------------*/
parameters {
  real<lower=0> sigeps; // Standard deviation of the observation equation
  real<lower=0> sigeta; // Standard deviation of the process equation
}
/*----------------------- Model --------------------------*/
/* Model block: defines the model */
model {
  // Initialization of variables
  vector[TT + 1] st; // state variable
  vector[TT] vt; // residual time series
  vector[TT] Ft; // variance of vt
  vector[TT + 1] Zt; // variance of st
  vector[TT] stt; // intermed state var time series
  vector[TT] Ztt; // intermed state var variance 
  vector[TT] Jtt; // Jacobian
  
  // Prior distributions: 
  target += normal_lpdf(sigeps | musigeps, sigsigeps);
  target += normal_lpdf(sigeta | musigeta, sigsigeta);
  
  // initial: 
  Zt[1] = 0; // initial variance
  st[1] = loghalf; // initial state
  // Stan only needs things up to a constant 
  // i.e. we can leave this out
  // target += -0.5 * TT * logtwopi; // log likelihood constant part
  
  // Extended Kalman filter part loop 
  for (t in 1 : TT) {
    vt[t] = pt[t] - log2() + 0.7 * logfiveh + 0.7 * st[t]; // residual
    Ft[t] = pow(0.7, 2) * Zt[t] + pow(sigeps, 2); // variance of residual 
    // Each step, add this part to the log likelihood (Durbin & Koopman 2012, p. 171): 
    target += -0.5 * log(Ft[t] + pow(vt[t], 2) / Ft[t]);
    
    stt[t] = st[t] - (0.7 * Zt[t] * vt[t]) / Ft[t]; // corrected state estimate
    Ztt[t] = pow(sigeps, 2) * Zt[t] / Ft[t]; // variance of corrected state estimate
    st[t + 1] = log((1 + mur) * exp(stt[t]) - mur * exp(2 * stt[t])
                    - Ht[t] / 500); // a priori state estimate for next period
    Jtt[t] = ((1 + mur) * exp(stt[t]) - 2 * mur * exp(2 * stt[t]))
             / ((1 + mur) * exp(stt[t]) - mur * exp(2 * stt[t]) - Ht[t] / 500); // Jacobian
    Zt[t + 1] = pow(Jtt[t], 2) * Ztt[t] + pow(sigeta, 2); // a priori estimate for state variance for next period
  }
}

I see you posted it after an edit. Thanks!

Hi Spinkney, thank you. I understand your changes to the code, but I'm not quite sure what you meant by: “Another thing is to use the _lupdf or ~ syntax to drop normalizing constants from the target increment for densities.” I suppose this concerns the priors and refers to this: 7.4 Sampling statements | Stan Reference Manual?

Yes, that’s right. The docs aren’t very clear about this and need updating to include lupmf/lupdf. The only reference I currently see is at 7.3 Increment log density | Stan Reference Manual.
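As a minimal sketch using the priors from your model, these three statements are equivalent up to a constant; the first keeps the normalizing constants, the latter two drop them:

target += normal_lpdf(sigeps | musigeps, sigsigeps);  // exact log density
target += normal_lupdf(sigeps | musigeps, sigsigeps); // drops normalizing constants
sigeps ~ normal(musigeps, sigsigeps);                 // also drops constants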

@rok_cesnovar @WardBrian. The user guide was never updated either; see this discourse thread: _lupdf functions - where/how to doc them? - #22 by bbbales2


@spinkney would you mind summarizing in an issue on the docs repo? Easier to keep track of there


Coming back to this. Your log-likelihood was wrong: you were taking the log of the entire expression when only F_t should be logged.
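The per-step contribution in the prediction error decomposition (Durbin & Koopman 2012, p. 171) is

$$\ell_t = -\tfrac{1}{2}\left(\log 2\pi + \log F_t + \frac{v_t^2}{F_t}\right),$$

where the $\log 2\pi$ term is constant and can be dropped, so the line should read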

target += -0.5 * (log(Ft[t]) + pow(vt[t], 2) / Ft[t]);

The estimates are:

3 chains, each with iter=6000; warmup=1000; thin=2; 
post-warmup draws per chain=2500, total post-warmup draws=7500.

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
sigeps   0.05    0.00 0.01   0.03   0.04   0.05   0.05   0.06  3503    1
sigeta   0.05    0.00 0.01   0.03   0.04   0.05   0.05   0.08  3424    1
lp__   107.46    0.02 1.18 104.25 107.04 107.82 108.29 108.57  3105    1

Samples were drawn using NUTS(diag_e) at Thu Jul 14 13:39:30 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Ah, that is right, thank you so much! I'll try it again tomorrow.