# Counter-intuitive increase in variance with higher-frequency data in a time-series model


I have a time-series model which I'm testing with some synthetic data (attached: example_data.csv, 604.6 KB).

The model mostly appears to do what I expect, and reproduces the mean of the known parameters.
Surprisingly, however, when I subsample my input data at higher frequencies, the variance of my most important parameter (R) increases.
So I presume I'm not actually modelling what I think I'm modelling, because I can't understand why this would be the case.
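A back-of-envelope calculation, assuming (possibly wrongly) that the step-to-step increments are dominated by iid measurement noise of scale $\sigma$: each increment carries a signal about $R$ of size $R\,dt/\mathrm{BML}$ while the noise stays at $\sigma$, and a fixed window of length $T$ contains $T/dt$ increments, so

$$
\mathrm{sd}(\hat R) \;\propto\; \frac{\sigma}{dt}\cdot\frac{1}{\sqrt{T/dt}} \;=\; \frac{\sigma}{\sqrt{T\,dt}}
$$

That would predict the sd at 10 min sampling to be $\sqrt{3} \approx 1.73$ times the 30 min value, which is suspiciously close to the $0.7/0.4 = 1.75$ I actually see below. But I don't understand whether this is really the mechanism, or what it implies about my model.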

This model is designed to work with a non-stationary but cyclic time series (C). C is controlled by advective movement in two directions, u and v, with gradients beta_u and beta_v, plus a near-constant negative term R.
The model assumes that beta_u and beta_v change little over a daily timescale, and that R changes even more slowly (hence the "filter" index). The model should then be able to estimate the gradients and R simultaneously.
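Written out, the one-step model I'm fitting (matching the Stan code below) is

$$
C_{n+1} \sim \mathcal{N}\!\left(C_n + \frac{\left(0.01\,\beta_{u,d(n)}\,\bar u_n + 0.01\,\beta_{v,d(n)}\,\bar v_n - 0.001\,R^{\mathrm{raw}}_{f(n)}\right) dt}{\mathrm{BML}_n},\; \sigma\right)
$$

where $\bar u_n, \bar v_n$ are the midpoint-averaged velocities, $d(n)$ is the day index, and $f(n)$ is the filter index.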

The second question: the cumulative value of the R parameter is important to me. To reproduce the synthetic data correctly I've allowed R_raw to be negative, but I know this can't actually be true with real data. How would I go about improving the model to account for this without biasing the estimate of R towards positive values?
I.e. the model should still correctly estimate R = 0.
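One idea I had (just a sketch, I haven't tested how it behaves at R = 0): declare R_raw with a lower bound, which turns the normal(0, 1) prior into a half-normal with its mode still at zero:

```stan
parameters {
  real<lower=0> R_raw[N_filter]; // hard non-negativity constraint
  // ... other parameters unchanged
}
model {
  // with the constraint this is a half-normal: mode at 0, no mass below 0
  R_raw ~ normal(0, 1);
  // ... rest of the model unchanged
}
```

My worry is that even with the prior mode at zero, the posterior mean of a bounded parameter is strictly positive when the truth is zero, so I'd presumably have to report the posterior median or mode rather than the mean. Is there a better way?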

Example R code:

```r
require(data.table)
require(rstan)
Sys.setenv(tz = "UTC")
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# the synthetic example has:
# R = 50
# C ~ N(C_true, 0.01)

x = fread("example_data.csv") # the attached synthetic data

# subsample the input series
# x = x[minute(dateTime) %in% c(0, 30)] # 30 min sampling
x = x[minute(dateTime) %in% c(0, 10, 20, 30, 40, 50)] # 10 min sampling
x[, dt := unique(diff(time))]

dx = list(
  N = nrow(x),
  day = x$day,
  N_day = max(x$day),
  filter = x$filter,
  N_filter = max(x$filter),
  C = x$C, # measured C
  BML = x$BML,
  u = x$u,
  v = x$v,
  dt = x$dt[1]
)

fit = stan("adv.stan", data = dx, chains = 2, iter = 2000)

e = rstan::extract(fit)
mean(e$sigma)
mean(e$R * (24 * 60 * 60))  # posterior mean of R, scaled to per day
quantile(e$R * (24 * 60 * 60), c(0.05, 0.95), names = FALSE)
sd(e$R * (24 * 60 * 60))
mean(x$R * (24 * 60 * 60))  # known value from the synthetic data

# with 30 min sampling sd(R) = 0.4
# with 10 min sampling sd(R) = 0.7
```

```stan
data {
  int<lower=1> N;          // number of samples
  int<lower=1> N_day;      // number of days
  int<lower=1> N_filter;   // number of filter periods
  int<lower=1> day[N];
  int<lower=1> filter[N];
  real C[N];               // observations
  vector[N] u;             // east-west velocity
  vector[N] v;             // north-south velocity
  real dt;
  real BML[N];             // scale
}
transformed data {
  real u_avg[N-1];
  real v_avg[N-1];

  // midpoint-average the velocities over each interval
  for(i in 1:N-1){
    u_avg[i] = (u[i] + u[i+1]) * 0.5;
    v_avg[i] = (v[i] + v[i+1]) * 0.5;
  }
}
parameters {
  real<lower=0> sigma;
  real beta_u[N_day];
  real beta_v[N_day];
  real R_raw[N_filter];
}
transformed parameters {
  // C_hat[i] is the one-step-ahead prediction of C[i+1] given C[i]
  real C_hat[N-1];
  for(i in 1:N-1){
    C_hat[i] = C[i] + (( (u_avg[i] * dt * beta_u[day[i]] * 0.01)
                       + (v_avg[i] * dt * beta_v[day[i]] * 0.01)
                       - (R_raw[filter[i]] * dt * 0.001) ) / BML[i]);
  }
}
model {
  sigma ~ cauchy(0, 1);
  beta_u ~ normal(0, 1);
  beta_v ~ normal(0, 1);
  R_raw ~ normal(0, 1);
  // n = 1 is the start time; parameters are attached to their interval's start time
  for(n in 2:N){
    C[n] ~ normal(C_hat[n-1], sigma);
  }
}
generated quantities {
  real R[N_filter];
  real cumulative_R[N];
  real adv_u[N]; // cumulative east-west advection
  real adv_v[N]; // cumulative north-south advection
  for(d in 1:N_filter){
    R[d] = R_raw[d] * 0.001;
  }
  cumulative_R[1] = R[filter[1]] * dt;
  adv_u[1] = u[1] * dt * beta_u[day[1]] * 0.01;
  adv_v[1] = v[1] * dt * beta_v[day[1]] * 0.01;
  for(n in 2:N){
    cumulative_R[n] = cumulative_R[n-1] + (R[filter[n]] * dt);
    adv_u[n] = adv_u[n-1] + (u[n] * dt * beta_u[day[n]] * 0.01);
    adv_v[n] = adv_v[n-1] + (v[n] * dt * beta_v[day[n]] * 0.01);
  }
}
```