Guassian process: modelling differently correlated time series

Sean_Matthews · February 13, 2018, 10:04pm

I am currently looking at a set of time series that are clearly more or less (but not equally) correlated.
Call them A, B, C, D, [up to 12 of these, each of 50-60 points, but I can make a principled retreat to 6]

I am trying to build a GP to model this set of time series.

I can define the basic input vector as (T, V) where V is an indicator vector for the time series, but this does not work. Seems obvious that this is because the various time series are differently correlated - A[t+1] is obviously not the same distance away from A[t] as is B[t], which is at a different distance than C[t), but, alas, cov_exp_quad assumes an equally weighted metric.

My idea (the obvious one) is to put a covariance matrix Q into the model, and replace the indicator vectors with the projections from the cholesky decomposition of Q. Then cov_exp_quad should work as before, but with a weighted distance function, and I can fit the covariance along with everything else.

Now I assume that I’m not the first person to encounter this problem, so here are my questions:

Is this the best way to do this? If not, what is the best way? (Is there another way? Am I missing something obvious?)
What is the best way to implement this - I can see at least two, and this is a big enough model that the computations cost is a material issue, so I thought I would mine the experience of people who have more of it than I do.

Thanks for any thoughts,

Sean Matthews

bbbales2 · February 14, 2018, 5:32am

Is it that you have four processes that you’ve measured and you think they are all perturbations away from one big underlying GP? @rtrangucci has a big hierarchical election GP model here: GitHub - stan-dev/stancon_talks: Materials from Stan conferences (Hierarchical Gaussian Processes in Stan). I could be wrong but I vaguely recall it did something like that.

Could you describe a little more what V looks like?

Sean_Matthews · February 14, 2018, 7:55am

V is just an indicator vector the length of the number of time series, with the indicator for the particular time series set to 1, and everything else set to 0

thus, e.g.,. we define X = <t,0,0,0,1,0,0> for the location of the value for the fourth time series from a set of six at time t.

stevebronder · February 14, 2018, 5:02pm

Sean it sounds like your data is long, but I think you want the data to be wide?

date	aapl_price	pip_price
12/11/10	23	24
12/12/10	81	82
12/13/10	12	18

And then you can use a multi-output gaussian process (something like this)

arya · February 15, 2018, 3:50pm

I just saw multiple time-series and GPs so my first thought was multi-output GPs. You may want to look in to that. Alvarez has a good review on the topic.

Sean_Matthews · February 15, 2018, 9:21pm

Yes, that does seem to be a useful approach. Shall investigate much more closely.

Thanks for the suggestions

Sean

Charles_Driver · February 17, 2018, 5:09pm

In case you want to try with a stochastic differential equation approach, ctsem in R will make a stan model for it. Would be pretty close to the default model, so can post the code if you want.

Sean_Matthews · February 18, 2018, 6:25am

I hadn”t thought of that. So yes please - I would be very interested

[from my phone]

Charles_Driver · February 19, 2018, 10:26am

Ok. Just a simple first order model structure:

install.packages("devtools")
library(devtools)
install_github("cdriveraus/ctsem") #cran version is due for an update...
library(ctsem)

n.latent <- 6 #number of processes
n.manifest<- n.latent #number of process indicators
varnames <- c(paste0('Y',1:n.manifest)) #n.manifest column names from data
dat <- longdatafile
time <- 1:nrow(dat) #unless time is already in data file
id <- 1 #modelling 1 subject

model <- ctModel(n.manifest=n.manifest,n.latent=n.latent,type='stanct',
  LAMBDA=diag(n.latent), #fixed latent to manifest loading matrix
  manifestNames = varnames, latentNames=varnames)

dat[,varnames] <- scale(dat[,varnames]) #otherwise priors may need to be adjusted...

model$pars[model$pars$matrix=='DRIFT' & model$pars$row != model$pars$col,'value'] <- 0 #not estimating directional effects between processes
model$pars$indvarying <- FALSE #because only 1 subject

fit <- ctStanFit(datalong = dat, ctstanmodel = model,iter = 500, chains=3,cores=3)

summary(fit)
cat(fit$stanmodeltext)
plot(fit)

Sean_Matthews · February 19, 2018, 11:04pm

thanks for that.

Cheers

Topic		Replies	Views
Modelling dependent time series with Gaussian Processes Modeling	10	2200	May 23, 2018
Using GP for autocorrelation in a time-series model: memory pressure, GP kernel definition Modeling techniques , performance , gaussian-process	51	4522	August 23, 2018
Fitting a mixture of Gaussian processes Modeling	3	1066	November 27, 2017
Gaussian process model for several functional observations with common mean Modeling specification	3	565	June 27, 2019
Modeling time dynamics with GP Modeling specification , gaussian-process	12	701	April 29, 2020

Guassian process: modelling differently correlated time series

Related topics