Advice needed on GP prediction

Hi Fellows,

I used a latent variable GP to fit a regression model to my data, which is the acceleration of an object being lowered into sea water. Near sea level there is high variation in the acceleration, and the model does not seem to capture the uncertainty well in that region (i.e. near sea level).

The observations are modeled with a normal distribution whose mean and standard deviation each have a Gaussian process prior. I tried to set the priors on the kernel hyperparameters so that alpha is large and the lengthscale is very small, to help the model detect these sharp variations:

lengthscale ~ lognormal(-2.5, .2);
sigma ~ normal(5, 2);

I tried different prior options; these two at least gave me better predictions on the data.
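To make the model description above concrete, here is a minimal generative sketch in plain Python (not the Stan program itself). It assumes a squared-exponential kernel, a log link for the standard deviation, the same lengthscale for both GPs, and an amplitude of 1.0 for the log-sd GP; none of these are stated above, and all function names are hypothetical.

```python
import math
import random


def sq_exp_cov(t, alpha, rho, jitter=1e-6):
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(t)
    return [
        [
            alpha ** 2 * math.exp(-0.5 * ((t[i] - t[j]) / rho) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_gp(t, alpha, rho, rng):
    """One draw from a zero-mean GP evaluated at the time points t."""
    low = cholesky(sq_exp_cov(t, alpha, rho))
    z = [rng.gauss(0.0, 1.0) for _ in t]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(t))]


def simulate(t, rng):
    """Draw hyperparameters from the priors, then latent functions, then data."""
    rho = rng.lognormvariate(-2.5, 0.2)   # lengthscale ~ lognormal(-2.5, .2)
    alpha = -1.0
    while alpha <= 0.0:                   # sigma ~ normal(5, 2), kept positive (assumption)
        alpha = rng.gauss(5.0, 2.0)
    f = draw_gp(t, alpha, rho, rng)       # latent mean of the observations
    g = draw_gp(t, 1.0, rho, rng)         # latent log-sd; amplitude 1.0 is hypothetical
    y = [rng.gauss(fi, math.exp(gi)) for fi, gi in zip(f, g)]
    return rho, alpha, f, g, y
```

Calling `simulate([0.1 * i for i in range(30)], random.Random(1))` gives one prior-predictive draw, which can be plotted to see what kind of series these priors generate before any data are used.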

Here I have uploaded the results for your consideration.


As shown, the model cannot predict the high values in the middle of the time range, nor the negative accelerations. Could you kindly help with your advice?

In addition, I have four different data sets, and for one of them one chain diverged from the others. However, the results are practically the same.

@avehtari @rtrangucci

It looks like the conditional distribution would be bimodal. Is the data just one time series or several time series overlapping? It looks like you would need a smaller lengthscale.

Dear Professor Aki, thank you for your valuable time.
That is true, it is one time series: one column with time and one column with the acceleration of the object.
I restricted the priors as follows:

lengthscale ~ gamma(2,0.05);
sigma ~ normal(8, 2);

in order to have a small lengthscale and a large alpha. There was not much improvement in the result.
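As a quick check of what the two lengthscale priors imply, the sketch below summarizes both, assuming Stan's shape/rate parameterization of `gamma`. The function names are hypothetical. Under that assumption, lognormal(-2.5, .2) has a median of about 0.08, while gamma(2, 0.05) has mean 40 and mode 20, so in the same time units the second prior puts most of its mass on much larger lengthscales than the first.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mu, sigma):
    """Quantile of a lognormal(mu, sigma) distribution."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def gamma_summary(shape, rate):
    """Mean, mode and standard deviation of gamma(shape, rate), for shape >= 1."""
    return {
        "mean": shape / rate,
        "mode": (shape - 1.0) / rate,
        "sd": math.sqrt(shape) / rate,
    }


# First prior: lengthscale ~ lognormal(-2.5, .2)
first_prior = {p: lognormal_quantile(p, -2.5, 0.2) for p in (0.05, 0.5, 0.95)}
# Second prior: lengthscale ~ gamma(2, 0.05), read as shape = 2, rate = 0.05
second_prior = gamma_summary(2.0, 0.05)

print("lognormal(-2.5, 0.2) quantiles:", first_prior)
print("gamma(2, 0.05) summary:", second_prior)
```

Comparing these numbers with the spacing of the time stamps is one way to judge whether a prior really favors short lengthscales.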

Can you replot the data with lines connecting the observations in the time series? And zoom in on the middle part?

Dear Professor Aki,
Here is the plot of the time series, along with a zoomed-in view of the middle part:


What’s going on near 194-200s (and to a lesser extent near 155s and near 163s) where the data oscillate wildly between two smooth curves? Ideally, should the model fit through the middle of this band of oscillation?

In general, it looks to me like the generative process is really different in different parts of the timeseries. For example, from 170-180s there’s almost no noise. At 194-200s there are two non-noisy series that alternate. At 165-170s the data are possibly quite noisy, or else the true value is changing much more irregularly than elsewhere.
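One rough way to check this kind of regime change is to look at the spread of first differences in consecutive windows. The sketch below is only an illustration on synthetic data (a smooth stretch followed by a stretch that alternates between two smooth curves), not the data from this thread; the function name and window size are hypothetical.

```python
import math
from statistics import pstdev


def rolling_diff_sd(y, window):
    """Standard deviation of first differences in consecutive non-overlapping windows."""
    d = [b - a for a, b in zip(y, y[1:])]
    return [pstdev(d[i:i + window]) for i in range(0, len(d) - window + 1, window)]


# Synthetic illustration only: 100 smooth points, then 100 points that
# alternate between two curves offset by +0.5 and -0.5.
smooth = [math.sin(0.05 * i) for i in range(100)]
alternating = [math.sin(0.05 * i) + (0.5 if i % 2 else -0.5) for i in range(100, 200)]
y = smooth + alternating

print(rolling_diff_sd(y, 20))
```

Windows from the smooth stretch give values close to zero, while windows from the alternating stretch give values close to the gap between the two curves, which makes the different regimes easy to spot.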

What is your inferential goal in modeling this timeseries?
