Missing data in GP

nerpa · April 8, 2020, 11:11pm

Hi everyone,

I feel like the answer to my question is very simple, but I just cannot get it. I am modeling a Gaussian process on a time series x, with y as the outcome variable. I have some missing data in y. I can fit y using

y ~ multi_normal_cholesky(mu, L_K);

as it appears in the Stan User's Guide, but how do I skip over the missing time points? Should I just pass a time vector that is not continuous? For example, x = [1 2 4 5], with time point 3 missing? I think this might not be right because of how the kernel is calculated with

matrix[N, N] K = cov_exp_quad(x, alpha, rho);

Thanks in advance!

ahartikainen · April 8, 2020, 11:58pm

You have points x_i with values y_i, and you fit the GP over those. Then you predict y_j at whatever x_j you want.

A GP doesn’t assume gridded points.

Did I miss something?

nerpa · April 9, 2020, 9:11am

Thanks! This makes total sense and I know it’s possible to use a GP to predict unobserved future data. However, my data has “holes” in it - for example, I collect continuous data (y) on one subject every day (x) for a month. But on some days during the month I don’t have data because the subject forgot to fill in the questionnaire. So is it possible to use a GP to interpolate the missing days? From my understanding (maybe it’s wrong!) it should be possible, I just don’t know how to express it in Stan in the fitting line

y ~ multi_normal_cholesky(mu, L_K);

Hope I explained myself better!

torkar · April 9, 2020, 9:15am

Hi Nerpa,

What I think Ari said is that you fit your model as usual, and then you look at what the model predicts for the days that are “missing”.

nerpa · April 9, 2020, 9:20am

So just to make sure I understand - if I have 4 time indexes [1 2 3 4] and time 3 is missing, I just fit the GP to the data at time indexes [1 2 4] and then predict the missing time index in the generated quantities section?

torkar · April 9, 2020, 9:25am

If I understood Ari correctly, yes :)

nerpa · April 9, 2020, 9:28am

Thanks! (Hope this second-hand understanding is correct :)

ahartikainen · April 9, 2020, 10:20am

Yes. You are now thinking in “discrete” timesteps, but I usually think of a continuous space (e.g. spatial or time) where I have observations at some locations (no grid, just some locations), and you fit the GP over those values. Then the predictions (in this case interpolation) are made at some set of points, which could be the same as the ones used in fitting or not.
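A minimal sketch of what this could look like in Stan, adapted from the analytic prediction pattern in the Gaussian process chapter of the Stan User's Guide. The names N_obs, x_obs, y_obs, N_mis, x_mis and y_mis are made up for illustration, and the zero mean and the priors are placeholder assumptions rather than recommendations. It is written in current Stan syntax, where gp_exp_quad_cov is the replacement for the older cov_exp_quad.

```stan
data {
  int<lower=1> N_obs;          // days with a recorded y
  array[N_obs] real x_obs;     // observed days, e.g. {1, 2, 4}
  vector[N_obs] y_obs;
  int<lower=1> N_mis;          // days to interpolate
  array[N_mis] real x_mis;     // missing days, e.g. {3}
}
transformed data {
  vector[N_obs] mu = rep_vector(0, N_obs);  // zero mean (assumption)
  real delta = 1e-9;                        // jitter for numerical stability
}
parameters {
  real<lower=0> rho;    // length scale
  real<lower=0> alpha;  // marginal standard deviation
  real<lower=0> sigma;  // observation noise
}
model {
  // The kernel is built from the observed x only; gaps in x are fine.
  matrix[N_obs, N_obs] K
    = add_diag(gp_exp_quad_cov(x_obs, alpha, rho), square(sigma));
  matrix[N_obs, N_obs] L_K = cholesky_decompose(K);

  // Placeholder priors (assumption)
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();

  y_obs ~ multi_normal_cholesky(mu, L_K);
}
generated quantities {
  vector[N_mis] y_mis;  // posterior predictive draws for the missing days
  {
    matrix[N_obs, N_obs] K
      = add_diag(gp_exp_quad_cov(x_obs, alpha, rho), square(sigma));
    matrix[N_obs, N_obs] L_K = cholesky_decompose(K);
    // K^{-1} * y_obs via two triangular solves (mu is zero here)
    vector[N_obs] K_div_y
      = mdivide_right_tri_low(mdivide_left_tri_low(L_K, y_obs)', L_K)';
    // cross-covariance between observed and missing days
    matrix[N_obs, N_mis] k_x = gp_exp_quad_cov(x_obs, x_mis, alpha, rho);
    vector[N_mis] f_mu = k_x' * K_div_y;
    matrix[N_obs, N_mis] v = mdivide_left_tri_low(L_K, k_x);
    matrix[N_mis, N_mis] cov_f
      = add_diag(gp_exp_quad_cov(x_mis, alpha, rho) - v' * v, delta);
    vector[N_mis] f_mis = multi_normal_rng(f_mu, cov_f);
    for (n in 1:N_mis) {
      y_mis[n] = normal_rng(f_mis[n], sigma);
    }
  }
}
```

With the example from above, x_obs would be {1, 2, 4} and x_mis would be {3}. The spread of the y_mis draws gives the interpolation uncertainty for the missing day.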

So what you will see is that the uncertainty will increase for locations with “missing” data.
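One way to see why, assuming a zero-mean GP with kernel k and Gaussian observation noise sigma: write K for the kernel matrix over the observed x, and k_* for the vector of kernel values between the observed x and a new point x_*. The standard predictive distribution for the latent value f_* at x_* is then

```latex
f_* \mid y \sim \mathcal{N}\!\left(
  k_*^\top (K + \sigma^2 I)^{-1} y,\;
  k(x_*, x_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*
\right)
```

The subtracted term is what the data contribute. With the exponentiated quadratic kernel, the entries of k_* shrink as x_* moves away from the observed points, so less is subtracted and the variance moves back toward the prior variance alpha^2.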

nerpa · April 9, 2020, 10:22am

Great! Thank you so much.
