Gaussian Processes and cumsum

#1

I’ve been trying to model a weekly time series of cumulative values using a GP and a negative binomial likelihood, like this:

EA_cumsum ~ 1 + gp(log2(week))
and
S_cumsum ~ 1 + gp(log2(week))

with this result:
Rplot-GP.pdf (13.6 KB)

compared with a non-GP model such as S_cumsum ~ log2(week), which gives this result:
gamma-poisson.pdf (18.5 KB)

For some reason, in the GP case the EA curve goes down even though I’m dealing with cumulative data. I presume this is because EA plateaus at the end of the dataset:

> tail(d$EA_cumsum)
[1] 40945 40968 40968 40968 40968 40968

Is there a way to tell the GP model that the values are cumulative and can never decrease?

#2

That would then be monotonic GPs, which are generally possible but not yet in brms.

#3

Big thank you for the quick reply, @paul.buerkner.

So I have to either go pure Stan or ARIMA then?

#4

I don’t know your specific data and modeling goals, so I can’t say what good alternatives would be.

#5

Would it be possible to model the weekly differences on a log scale instead, and put the GP on that?

#6

In principle, yes. Could you clarify exactly what model you have in mind?

#7

I have two techniques being compared on a weekly basis: wekly data.csv (492 Bytes)

Taking cumsum() of the columns makes it easy to compare them against a third, linear approach, y = 343x. I would like to model the techniques for the 54 weeks I have, and then make forecasts for 200+ weeks if possible (well, it’s always possible, but I’d like to see if it is useful).
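As a minimal sketch of that setup in base R: the weekly counts below are made up for illustration (the real ones are in the attached CSV), and the raw column names `EA` and `S` are assumptions. Only `EA_cumsum`, `S_cumsum` and the slope 343 come from this thread.

```r
# Hypothetical weekly counts; NOT the values from the attached file
weekly <- data.frame(
  week = 1:6,
  EA   = c(900, 700, 500, 300, 200, 100),
  S    = c(400, 380, 350, 300, 280, 250)
)

# Cumulative totals per technique
weekly$EA_cumsum <- cumsum(weekly$EA)
weekly$S_cumsum  <- cumsum(weekly$S)

# Third approach: a straight line through the origin, y = 343 * week
weekly$linear <- 343 * weekly$week

print(weekly)
```

Because the weekly counts cannot be negative, both cumulative columns are non-decreasing by construction, which is the property the fitted curves should share.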

Modeling it with the outcome as cumsum() and the predictor as log2(week) using negbinomial() makes sense (see the figure below), but when I try simple GP approaches they seem to have much better out-of-sample prediction, so that makes me curious…
Rplot.pdf (18.5 KB)

#8

You mean log(diff) ~ gp(week) or diff ~ gp(log(week))? Either way, it looks very funky and in some cases fails spectacularly :)

#9

Hmm, more like … model the GP so that the result must be monotonic.

Like… if you put the GP on the differences, or basically on the derivative, and then construct your cumsum from that. If you model the derivative so that it is larger than 0, then the main signal should be monotonic.
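To make that concrete, here is a minimal base-R simulation of the idea. It is not brms and not a fitted model: it draws one latent function from a GP prior, turns it into non-negative weekly increments, and rebuilds the cumulative curve from those. The kernel hyperparameters, the intercept of 5 and the negative binomial `size` are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)
week <- 1:54            # 54 weeks, as in this thread
x <- log2(week)

# Squared-exponential covariance with hypothetical hyperparameters
sq_exp <- function(x, sdgp = 1, lscale = 1) {
  d <- outer(x, x, "-")
  sdgp^2 * exp(-d^2 / (2 * lscale^2))
}
K <- sq_exp(x) + diag(1e-6, length(x))   # small jitter for numerical stability

# One draw of the latent GP; g itself is free to go up or down
g <- as.vector(t(chol(K)) %*% rnorm(length(x)))

# Weekly increments are counts with mean exp(5 + g), so they are never negative
mu <- exp(5 + g)
increment <- rnbinom(length(x), mu = mu, size = 10)

# The cumulative curve is reconstructed from the increments
cumulative <- cumsum(increment)

# The latent GP can decrease, but the cumulative curve cannot
stopifnot(all(increment >= 0), all(diff(cumulative) >= 0))
```

The GP only ever sees the log of the expected weekly increment, so a plateau in the cumulative data shows up as increments near zero rather than as a curve that has to bend downwards.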

Sure, I don’t have any specific model at hand right now, and yes, it could fail.
