Guassian process: modelling differently correlated time series

I am currently looking at a set of time series that are clearly more or less (but not equally) correlated.
Call them A, B, C, D, [up to 12 of these, each of 50-60 points, but I can make a principled retreat to 6]

I am trying to build a GP to model this set of time series.

I can define the basic input vector as (T, V) where V is an indicator vector for the time series, but this does not work. Seems obvious that this is because the various time series are differently correlated - A[t+1] is obviously not the same distance away from A[t] as is B[t], which is at a different distance than C[t), but, alas, cov_exp_quad assumes an equally weighted metric.

My idea (the obvious one) is to put a covariance matrix Q into the model, and replace the indicator vectors with the projections from the cholesky decomposition of Q. Then cov_exp_quad should work as before, but with a weighted distance function, and I can fit the covariance along with everything else.

Now I assume that I’m not the first person to encounter this problem, so here are my questions:

  1. Is this the best way to do this? If not, what is the best way? (Is there another way? Am I missing something obvious?)

  2. What is the best way to implement this - I can see at least two, and this is a big enough model that the computations cost is a material issue, so I thought I would mine the experience of people who have more of it than I do.

Thanks for any thoughts,

Sean Matthews

1 Like

Is it that you have four processes that you’ve measured and you think they are all perturbations away from one big underlying GP? @rtrangucci has a big hierarchical election GP model here: GitHub - stan-dev/stancon_talks: Materials from Stan conferences (Hierarchical Gaussian Processes in Stan). I could be wrong but I vaguely recall it did something like that.

Could you describe a little more what V looks like?

V is just an indicator vector the length of the number of time series, with the indicator for the particular time series set to 1, and everything else set to 0

thus, e.g.,. we define X = <t,0,0,0,1,0,0> for the location of the value for the fourth time series from a set of six at time t.

Sean it sounds like your data is long, but I think you want the data to be wide?

date aapl_price pip_price
12/11/10 23 24
12/12/10 81 82
12/13/10 12 18

And then you can use a multi-output gaussian process (something like this)

1 Like

I just saw multiple time-series and GPs so my first thought was multi-output GPs. You may want to look in to that. Alvarez has a good review on the topic.

Yes, that does seem to be a useful approach. Shall investigate much more closely.

Thanks for the suggestions


In case you want to try with a stochastic differential equation approach, ctsem in R will make a stan model for it. Would be pretty close to the default model, so can post the code if you want.

I hadn”t thought of that. So yes please - I would be very interested

[from my phone]

Ok. Just a simple first order model structure:

install_github("cdriveraus/ctsem") #cran version is due for an update...

n.latent <- 6 #number of processes
n.manifest<- n.latent #number of process indicators
varnames <- c(paste0('Y',1:n.manifest)) #n.manifest column names from data
dat <- longdatafile
time <- 1:nrow(dat) #unless time is already in data file
id <- 1 #modelling 1 subject

model <- ctModel(n.manifest=n.manifest,n.latent=n.latent,type='stanct',
  LAMBDA=diag(n.latent), #fixed latent to manifest loading matrix
  manifestNames = varnames, latentNames=varnames)

dat[,varnames] <- scale(dat[,varnames]) #otherwise priors may need to be adjusted...

model$pars[model$pars$matrix=='DRIFT' & model$pars$row != model$pars$col,'value'] <- 0 #not estimating directional effects between processes
model$pars$indvarying <- FALSE #because only 1 subject

fit <- ctStanFit(datalong = dat, ctstanmodel = model,iter = 500, chains=3,cores=3)

1 Like

thanks for that.