TL;DR: I’m modeling the missing data points in a time series with something like a simplex. It’s reasonably fast, but n_eff per sample is low, and I’d like to make it better.
I’m modeling a time series:
Y_t ~ normal(Y_{t-1} + mu, sigma)
with missing observations:
Y[3] = nan
Y[4] = nan
I’m dealing with the missing observations by treating them as parameters to be estimated: if there’s an observation at time t=2, Y_2, and another observation at t=5, Y_5, then Y_3 and Y_4 are inferred as parameters. What do we know about those parameters? Since this is a time series, we know that the difference between the two observed time points (Y_5 and Y_2) must be explained by the steps between Y_2, Y_3, Y_4, and Y_5:
Y_5 - Y_2 == (Y_5 - Y_4) + (Y_4 - Y_3) + (Y_3 - Y_2)
(This is particularly natural if we know that Y is monotonically increasing, in which case the time series model is Y_t ~ normal(Y_{t-1} + mu, sigma) T[Y_{t-1}, ]. The monotonic case isn’t germane to my question today, but it’s the use case I’m eventually building up to.)
The way that I actually model this in Stan is not to treat Y_3 and Y_4 as parameters (the levels), but to treat (Y_5 - Y_4), (Y_4 - Y_3), and (Y_3 - Y_2) as parameters (the steps). We’re not modeling the levels; we’re modeling the steps. I can then constrain these steps using the equation above and back out what the levels Y_3 and Y_4 should be; a miniature sketch of that mapping follows, and my full Stan model is at the end of the post.
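Here is a standalone sketch of just the step-to-level mapping (the names raw, y_lo, and y_hi are only for illustration, not from the full model):

functions {
  // Normalize the raw parameters so the steps sum to the observed jump
  // (y_hi - y_lo), then cumulative-sum the steps to recover the missing
  // levels in between: one level per missing observation, which is one
  // fewer than the number of steps.
  vector missing_levels(vector raw, real y_lo, real y_hi) {
    int K;
    vector[rows(raw)] steps;
    K = rows(raw);
    steps = raw / sum(raw) * (y_hi - y_lo);
    return y_lo + cumulative_sum(steps)[1:K - 1];
  }
}

Dividing by sum(raw) is what makes the raw parameters behave “like a simplex”: only their relative sizes matter, which is also where the redundant degree of freedom comes from.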
The problem is as follows: one of the parameters is unnecessary. If you know Y_5 - Y_2, Y_5 - Y_4, and Y_4 - Y_3, then you already know Y_3 - Y_2. If we could lop off one of these parameters, surely things would be better. Is there a way to do this? Is there a better way to attack the problem?
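To make “lop off one of these parameters” concrete: in the monotone case I’m building up to, Stan’s built-in simplex type would do exactly that, since simplex[K] stores K fractions in only K - 1 unconstrained dimensions. A toy sketch for a single gap with Y_2 and Y_5 observed (illustrative only; simplex entries are non-negative, so it fits only the monotone case):

data {
  real Y2; // observed level at t = 2
  real Y5; // observed level at t = 5; Y5 > Y2 in the monotone case
}
parameters {
  simplex[3] step_fractions; // 3 steps across the gap, but only 2 free dimensions
}
transformed parameters {
  vector[2] Y_missing;
  // The steps sum to (Y5 - Y2) by construction, and non-negative
  // fractions force the filled-in levels to be monotone.
  Y_missing = Y2 + cumulative_sum(step_fractions * (Y5 - Y2))[1:2];
}
model {
  // priors on mu/sigma and the random-walk likelihood would go here
}

A plain simplex can’t produce negative steps, though, so it doesn’t cover the general case; that’s why the full model below uses parameters on [-1, 1] normalized by their sum, at the cost of the redundant dimension.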
Example output (with mu=3, sigma=1, and Y_3 and Y_4 missing):
Inference for Stan model: anon_model_4888889db6e52ba8f2aaf6c10bceb20d.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
                                   mean  se_mean       sd   2.5%    25%      50%    75%  97.5%  n_eff  Rhat
mu                                 3.05   9.9e-4     0.05   2.96   3.02     3.05   3.09   3.14   2212   1.0
sigma                              1.03   7.7e-4     0.03   0.97   1.01     1.03   1.06    1.1   1896   1.0
restricted_latent_parameters[0] -3.8e-3     0.46     0.65  -0.95  -0.62    -0.03   0.61   0.95      2  3.07
restricted_latent_parameters[1] -6.6e-3     0.46     0.65  -0.97  -0.63     0.01   0.62   0.96      2   3.1
restricted_latent_parameters[2]  5.6e-4     0.46     0.65  -0.95  -0.62  -4.1e-3   0.62   0.95      2   3.1
Y_latent[0]                        2.73  3.1e-16  4.4e-16   2.73   2.73     2.73   2.73   2.73      2   nan
Y_latent[1]                        5.98  6.3e-16  8.9e-16   5.98   5.98     5.98   5.98   5.98      2   nan
Y_latent[2]                        8.49      0.0      0.0   8.49   8.49     8.49   8.49   8.49      2   nan
Y_latent[3]                       11.22     0.01     0.73   9.69  10.76    11.27  11.69  12.61   4000   1.0
Y_latent[4]                       13.98     0.01     0.72  12.56  13.51    13.96  14.44  15.41   4000   1.0
Y_latent[5]                       16.71      0.0      0.0  16.71  16.71    16.71  16.71  16.71      2   nan
Y_latent[6]                       19.79  5.0e-15  7.1e-15  19.79  19.79    19.79  19.79  19.79      2   nan
...
If there are no missing observations then we have markedly higher n_eff; sometimes all the way up to 1 n_eff/sample.
And here’s the model:
data {
  int T;               // number of time steps
  vector[T] Y;         // data to model; missing entries are NaN
  int n_missing_steps; // total unobserved steps: one per missing observation, plus one extra per gap
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector<lower=-1, upper=1>[n_missing_steps] restricted_latent_parameters;
}
transformed parameters {
  vector[T] Y_latent;
  {
    int restricted_param_counter;
    int gap_width;
    real previous_value;
    int previous_value_index;
    restricted_param_counter = 1;
    Y_latent[1] = Y[1];
    previous_value = Y[1];
    previous_value_index = 1;
    for (t in 2:T) {
      if (!is_nan(Y[t])) {
        Y_latent[t] = Y[t];
        gap_width = t - previous_value_index;
        if (gap_width > 1) {
          // These are the unobserved STEPS between observed time steps.
          // I.e. if Y_3 and Y_1 are observed, but Y_2 is not, these are (Y_3 - Y_2) and (Y_2 - Y_1).
          // We will say that these steps have to sum up to the observed difference between Y_3 and Y_1.
          // The unobserved levels then have values that are the cumulative sum of these steps.
          // Take the parameters and divide by their sum, then multiply by the total step that
          // needs to be accounted for; that gives a set of steps that leads up to the next
          // observed level.
          Y_latent[previous_value_index + 1:t - 1] =
            cumulative_sum(
              restricted_latent_parameters[restricted_param_counter:restricted_param_counter + gap_width - 1]
              / sum(restricted_latent_parameters[restricted_param_counter:restricted_param_counter + gap_width - 1])
              * (Y[t] - previous_value)
            )[1:gap_width - 1] + previous_value;
          // We don't include the last step in this sum, since we already explicitly grabbed the level
          // we ultimately get to from the data itself.
          // Advance past all gap_width step parameters used for this gap
          // (one per missing level, plus one more into the next observation).
          restricted_param_counter = restricted_param_counter + gap_width;
        }
        previous_value = Y[t];
        previous_value_index = t;
      }
    }
  }
}
model {
  mu ~ normal(0, 1);
  sigma ~ cauchy(0, 4);
  // Random walk with drift, vectorized over t = 2..T.
  Y_latent[2:T] ~ normal(Y_latent[1:T - 1] + mu, sigma);
}