I feel like the answer to my question is very simple, but I just cannot get it. I am modeling a Gaussian process on a time series x, with y being the outcome variable. I have some missing data in y. I can fit y using
y ~ multi_normal_cholesky(mu, L_K);
as it appears in the Stan User's Guide, but how do I skip over the missing time points? Should I just pass a time vector that is not contiguous? For example, x = [1 2 4 5], with time point 3 missing? I think this might not be right because of how the kernel is being calculated with
Thanks! This makes total sense, and I know it’s possible to use a GP to predict unobserved future data. However, my data has “holes” in it. For example, I collect continuous data (y) on one subject every day (x) for a month, but on some days during the month I don’t have data because the subject forgot to fill out the questionnaire. So is it possible to use a GP to interpolate the missing days? From my understanding (maybe it’s wrong!) it should be possible; I just don’t know how to phrase it in Stan in the fitting line
So just to make sure I understand: if I have 4 time indices [1 2 3 4] and time 3 is missing, I just fit the GP to the data at time indices [1 2 4] and then predict the missing time index in the generated quantities block?
Yes. You are currently thinking in “discrete” time steps, but I usually think of a continuous space (e.g. spatial or temporal) in which I have observations at some locations (no grid, just some locations), and I fit the GP over those values. The predictions (in this case, interpolation) are then made at some set of points, which could be the same as the ones used in fitting or not.
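To make that concrete, here is a rough sketch of how it could look in Stan. This is not taken from the original model: the names `N_obs`, `x_obs`, `y_obs`, `N_mis`, `x_mis`, `f_mis` and `y_mis` are made up for illustration, the zero mean, the squared-exponential kernel and the priors are assumptions, and it uses the newer `array[N] real` syntax. The GP is fit only on the observed days, and the missing days are drawn in generated quantities from the usual conditional Gaussian.

```stan
data {
  int<lower=1> N_obs;            // number of days with data (hypothetical name)
  array[N_obs] real x_obs;       // observed days, e.g. {1, 2, 4}
  vector[N_obs] y_obs;           // outcomes on the observed days
  int<lower=1> N_mis;            // number of days to interpolate
  array[N_mis] real x_mis;       // missing days, e.g. {3}
}
transformed data {
  real delta = 1e-9;                          // jitter for numerical stability
  vector[N_obs] mu = rep_vector(0, N_obs);    // assumed zero mean
}
parameters {
  real<lower=0> rho;     // length-scale
  real<lower=0> alpha;   // marginal standard deviation
  real<lower=0> sigma;   // observation noise
}
model {
  matrix[N_obs, N_obs] K = gp_exp_quad_cov(x_obs, alpha, rho);
  matrix[N_obs, N_obs] L_K = cholesky_decompose(add_diag(K, square(sigma)));

  // Placeholder priors; choose ones that suit your time scale.
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();

  // Same fitting line as before, but only over the observed days.
  y_obs ~ multi_normal_cholesky(mu, L_K);
}
generated quantities {
  vector[N_mis] y_mis;   // interpolated outcomes for the missing days
  {
    matrix[N_obs, N_obs] L_K = cholesky_decompose(
      add_diag(gp_exp_quad_cov(x_obs, alpha, rho), square(sigma)));
    // a = (K + sigma^2 I)^{-1} y_obs, via two triangular solves
    vector[N_obs] a = mdivide_right_tri_low(
      mdivide_left_tri_low(L_K, y_obs)', L_K)';
    // cross-covariance between observed and missing days
    matrix[N_obs, N_mis] k_om = gp_exp_quad_cov(x_obs, x_mis, alpha, rho);
    matrix[N_obs, N_mis] v = mdivide_left_tri_low(L_K, k_om);
    vector[N_mis] f_mu = k_om' * a;
    matrix[N_mis, N_mis] f_cov = add_diag(
      gp_exp_quad_cov(x_mis, alpha, rho) - v' * v, delta);
    vector[N_mis] f_mis = multi_normal_rng(f_mu, f_cov);
    for (n in 1:N_mis) {
      y_mis[n] = normal_rng(f_mis[n], sigma);   // add observation noise
    }
  }
}
```

Because the kernel only depends on the distances between the time points you pass in, a time vector with gaps is fine: the covariance between day 2 and day 4 is simply computed for a distance of 2.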
So what you will see is that the uncertainty will increase for locations with “missing” data.
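The reason can be read off the standard GP predictive variance. As a sketch, assuming a zero-mean GP with kernel $k$, Gaussian observation noise $\sigma$, and writing $K$ for the kernel matrix over the observed time points and $k_*$ for the vector of covariances between a new point $x_*$ and the observed points (this notation is mine, not from the posts above):

$$
\operatorname{Var}[f_* \mid y] = k(x_*, x_*) - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_*
$$

For a stationary kernel such as the squared exponential, the entries of $k_*$ shrink as $x_*$ moves away from the observed points, so the subtracted term gets smaller and the variance moves back towards the prior variance $k(x_*, x_*)$. A single missing day between two observed days will therefore usually show only a modest increase, while a long run of missing days will show a larger one.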