Volterra kernels for time series

I’m curious if there’s any experience with using Volterra kernels for regularizing latent state time series models? Wikipedia suggests these kernels might be particularly difficult to estimate, as the coefficients are hard to decorrelate, and searching for papers turns up mostly work from a group I’m trying to “fact check” methodologically.
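For anyone reading along without the Wikipedia page open: a discrete-time Volterra series truncated at second order is just a polynomial expansion in lagged inputs, y[n] = h0 + Σ_i h1[i] x[n−i] + Σ_{i,j} h2[i,j] x[n−i] x[n−j]. A minimal sketch of evaluating one (all kernel values here are randomly drawn for illustration, nothing from the papers under discussion):

```python
import numpy as np

def volterra_2nd_order(x, h0, h1, h2):
    """Evaluate y[n] = h0 + sum_i h1[i]*x[n-i] + sum_{i,j} h2[i,j]*x[n-i]*x[n-j].

    h1 has shape (M,), h2 shape (M, M); the first M-1 outputs keep only
    the offset h0, since the lag history is incomplete there.
    """
    M = len(h1)
    y = np.full(len(x), h0, dtype=float)
    for n in range(M - 1, len(x)):
        lags = x[n - M + 1:n + 1][::-1]   # x[n], x[n-1], ..., x[n-M+1]
        y[n] += h1 @ lags + lags @ h2 @ lags
    return y

rng = np.random.default_rng(0)
M = 5
h1 = rng.normal(size=M)
h2 = rng.normal(size=(M, M))
h2 = 0.5 * (h2 + h2.T)                    # symmetric without loss of generality

x = rng.normal(size=200)
y = volterra_2nd_order(x, 0.1, h1, h2)

# The regressors {x[n-i]} and {x[n-i]*x[n-j]} are strongly correlated,
# which is exactly the "hard to decorrelate" problem mentioned above.
```

The quadratic blow-up in coefficients (M² for the second-order kernel alone) is also why some kind of regularization or prior is essentially mandatory.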

It seems that in the Bayesian camp there are GPs, SDE filtering, and MVARs, all loosely related… I just want to make sure I’m not missing something obvious. Thanks for any comments, related or otherwise.


Absolutely no experience, but I’d be interested in learning about it, or implementing something and writing it up. Do you have something more concrete in mind? Like an applied analysis, or do you want to prove something? At this point I’ve only glanced at the Wikipedia page.

If you want to DM me one of the papers so we’re not publicly shaming them, I want to have a look.

It’s been a while since I’ve read this, but this paper has a section on relationship between GPs and Kalman filtering: https://pdfs.semanticscholar.org/bae4/756f7514d4f8202a383fe3827b27a89545c0.pdf
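To make that GP ↔ Kalman filter connection concrete: a Matérn-3/2 GP has an exact two-dimensional state-space (SDE) representation, so its marginal likelihood can be computed in O(N) by Kalman filtering instead of O(N³) by Cholesky. A minimal sketch in Python (my own illustrative code under that standard construction, not code from the paper):

```python
import numpy as np
from scipy.linalg import expm

def matern32_ss(ell, sigma):
    """Exact state-space (SDE) form of the Matern-3/2 covariance."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])  # drift of [f, f']
    H = np.array([[1.0, 0.0]])                         # we observe f(t)
    Pinf = np.diag([sigma**2, lam**2 * sigma**2])      # stationary covariance
    return F, H, Pinf

def kalman_gp_loglik(t, y, ell, sigma, noise_var):
    """O(N) GP marginal log-likelihood via Kalman filtering."""
    F, H, Pinf = matern32_ss(ell, sigma)
    m, P = np.zeros(2), Pinf.copy()
    ll, prev = 0.0, t[0]
    for tk, yk in zip(t, y):
        A = expm(F * (tk - prev))          # discrete-time transition
        Q = Pinf - A @ Pinf @ A.T          # exact process noise for this step
        m, P = A @ m, A @ P @ A.T + Q
        v = yk - H @ m                     # innovation
        S = (H @ P @ H.T + noise_var).item()
        K = P @ H.T / S                    # Kalman gain
        ll += -0.5 * (np.log(2 * np.pi * S) + (v**2 / S).item())
        m, P = m + (K * v).ravel(), P - K @ H @ P
        prev = tk
    return ll

# Agrees with the dense Matern-3/2 GP marginal likelihood, but in O(N).
t = np.linspace(0.0, 5.0, 30)
ll = kalman_gp_loglik(t, np.sin(t), ell=1.0, sigma=1.0, noise_var=0.1)
```

The O(N) scaling is what makes this attractive for high-sampling-rate data like EEG/MEG, where a dense GP gram matrix is out of the question.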

Thanks for the reference! I have been wondering how to generalize GPs to spatiotemporal patterns (as I’m doing brain imaging data modeling). I will have a more thorough look.

I maybe misspoke: I don’t mean anything bad by this, but I would rather try to achieve a “clean-room” implementation of some of these techniques, which the authors have implemented as part of a big pile of (nearly) incomprehensible MATLAB code. The generative modeling is otherwise innovative, in a field dominated by frequentist statistics. The Volterra kernel stuck out for me because most of the other elements (complex hierarchical priors, extensive model comparison, variational inference) appear here and there, but I couldn’t turn up anything on estimation of Volterra kernels. The framework is described in this article: https://www.sciencedirect.com/science/article/abs/pii/S1053811903002027 but it seems to be behind a paywall. I’ll try to turn up an article which isn’t paywalled…

If you want, I’m happy to look into developing this for you. I’ll do some reading and get back with you when I understand the Volterra kernel better. I’ve found a paper by Boyd, an electrical engineer, who’s reputable, but I don’t want to say anything about it until I’ve read it thoroughly and understand it.

Until then, let’s use this thread as a paper dump. Mind dropping me everything you have about the Volterra kernel? Papers, code, whatever.

As the one usually writing code for others, thanks! I will tackle things as time permits… Here’s an earlier paper where they approximate a 4D nonlinear system by Volterra kernels,

Another, about multivariate autoregressive models (referred to in that literature as Granger causality) vs. Volterra kernels,
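Since MVAR models are the comparison point there, here's the whole technique in a few lines: stack lagged states into a regressor matrix and solve by least squares. Illustrative code under my own naming and a toy simulated system, not anything from that paper:

```python
import numpy as np

def fit_mvar(X, p):
    """Least-squares fit of a VAR(p): x[n] = sum_k A[k] @ x[n-k] + e[n].

    X has shape (T, d); returns coefficient matrices A of shape (p, d, d).
    """
    T, d = X.shape
    # Row n of Z stacks the lagged states [x at lag 1, ..., x at lag p].
    Z = np.hstack([X[p - k:T - k] for k in range(1, p + 1)])  # (T-p, p*d)
    Y = X[p:]                                                 # (T-p, d)
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)                 # (p*d, d)
    return B.T.reshape(d, p, d).transpose(1, 0, 2)

# Simulate a stable bivariate VAR(1) and recover its coefficient matrix.
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
X = np.zeros((5000, 2))
for n in range(1, len(X)):
    X[n] = A_true @ X[n - 1] + 0.1 * rng.normal(size=2)

A_hat = fit_mvar(X, p=1)[0]   # close to A_true for a long, stable series
```

An MVAR is linear in the lags, so in Volterra terms it only captures the first-order kernel; the comparison in that paper is essentially linear-memory vs. nonlinear-memory models.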

Here’s the 1-s2.0-S1053811914005394-main.pdf. Finally, another where the kernels are derived analytically,

Sorry if it’s all a bit heavy on the neuroscience, but that’s one of the reasons I’m trying to do a clean room approach, to disentangle some of the domain-specific pieces from the more statistical parts.

I’m still spelunking in their MATLAB code for the Volterra kernel stuff.

Here’s an unrelated paper closer to your first PDF which makes a connection from Volterra & Wiener series to GPs,

From the paragraph below Eq. 16 on page 4:

> In other words, Gaussian process regression can be used as an alternative regression technique for estimating implicit Volterra and Wiener series. The interpretation of the polynomial kernel as a covariance function leads us also to an alternative explanation of the unfavorable generalisation properties of polynomial regression: a polynomial covariance function implies a high covariance for distant inputs. In most real-world problems we have the reverse situation, i.e. nearby inputs typically result in similar outputs. Therefore, in the Gaussian process view, a polynomial covariance implies a prior over the function space which favors functions not suited for most real-world problems.

I don’t have time right now to digest enough of the math in the paper, but maybe standard GPs estimated with Stan correspond to a low-order truncated Volterra series? </wild-guess>
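The quoted connection is easy to demonstrate numerically: GP / kernel-ridge regression with an inhomogeneous polynomial kernel on lag vectors is an implicit truncated Volterra regression, because the kernel's feature space spans exactly the monomials in the lags up to the chosen degree. A small sketch with a toy second-order system of my own invention (nothing here is from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def lagged(x, M):
    """Rows are lag vectors (x[n], x[n-1], ..., x[n-M+1])."""
    return np.stack([x[M - 1 - i:len(x) - i] for i in range(M)], axis=1)

M, N = 4, 150
x = rng.normal(size=N + M - 1)
X = lagged(x, M)                                    # (N, M)
y_true = X[:, 0] * X[:, 1] - 0.5 * X[:, 2] ** 2     # a genuinely 2nd-order system
y = y_true + 0.05 * rng.normal(size=N)

# Inhomogeneous polynomial kernel of degree 2 on the lag vectors:
# k(u, v) = (1 + u.v)^2. Its feature space spans all monomials in the
# lags up to degree 2, i.e. an implicit 2nd-order Volterra series.
K = (1.0 + X @ X.T) ** 2
alpha = np.linalg.solve(K + 0.05**2 * np.eye(N), y)  # GP/kernel-ridge weights
y_hat = K @ alpha                                    # in-sample predictions
```

So the wild guess seems basically right for polynomial kernels; with stationary kernels (squared-exponential, Matérn) the implied prior is different, which is the unfavorable-generalisation point the quote is making.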

No worries, I worked in an fMRI lab for a summer so I’m familiar with the vocab.

Here’s a paper that’s helping me with the motivation as to why this is important in neuroscience:

I’ll get back with you after I’ve read and digested a lot of this stuff. I’m finishing up this semester and then I’ll dig more into this.

Quick question: what exactly are you applying this to, EEG or fMRI? If it’s fMRI, can you describe exactly what kind, and the dimensions of the data we’d be working with? I remember having 30 256x256 connectivity matrices and then an additional design with some demographic characteristics, maybe a 30x5 design matrix.

Can you please be more detailed? I want to know how “big” it is.

Due to the high sampling rate of EEG, we’d have to use a state-space solution of a GP, which might not be possible with this kernel.

But it’s good to know this stuff beforehand, before the project dead-ends.

We have a few use cases: fMRI is the low temporal resolution, high spatial resolution case, and MEG (like EEG) is the high temporal resolution, low spatial resolution case. I work mostly with MEG and EEG, so the data size is 64 or 248 sensors, at a sampling frequency of around 256 Hz. Typically, though, we’re estimating a latent source space of anywhere from 164 regions to tens of thousands of vertices (on a cortical surface).

So far we have used a recurrent neural network as a latent state-space model, specifically with neural mass models, but the parameters of these models are highly nonlinearly correlated, making them hard to estimate. Even when we can estimate them, they may not have the right flexibility, hence the interest in more flexible state-space models and in the Volterra series (since I don’t yet see how to use a GP for this). We have some further regularization tricks on the back burner, based on spherical harmonics of neural fields, but they’re not quite ready yet.

In the case of fMRI, the data are so much smaller that we haven’t had problems with HMC.

I’ll get back once I digest the math.

Yes, it was unfortunate that they used MATLAB for this, but I just want to mention that, knowing Arno and Simo, I’m certain they’d be happy to help with any questions you have if you contact them. They have experience with fMRI, MEG and EEG, and they could certainly provide specific algorithm recommendations given the measurement modality and resolution in different dimensions.


Thanks for the names, I’ll get in touch. Again, “fact check” was a poor choice of words; rather, I’m simply trying to port the methods without missing relevant papers here and there.