# Gaussian Processes and cumsum

#1

I’ve been trying to model a weekly time series of cumulative values using a GP and a negative binomial likelihood, like this:

`EA_cumsum ~ 1 + gp(log2(week))`
and
`S_cumsum ~ 1 + gp(log2(week))`

with this result:
Rplot-GP.pdf (13.6 KB)

compared with a non-GP model, e.g., `S_cumsum ~ log2(week)`, with this result:
gamma-poisson.pdf (18.5 KB)

For some reason, in the GP case the `EA` curve goes down even though the data are cumulative. I presume this is because `EA` plateaus at the end of the dataset:

```
> tail(d$EA_cumsum)
[1] 40945 40968 40968 40968 40968 40968
```

Is there a way to tell the GP model that the values are cumulative and can never decrease?
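A possible contributor, sketched here under assumptions: an unconstrained GP knows nothing about cumulative structure, and away from the data (for example when forecasting past week 54) a zero-mean GP's posterior mean drifts back toward its prior mean. In `1 + gp(log2(week))` that baseline is the intercept on the linear-predictor scale. The base-R sketch below uses a squared-exponential kernel with hand-picked, hypothetical hyperparameters (`alpha`, `rho`, `sigma`) and a made-up plateauing curve, not the thread's data or a brms fit.

```r
# Squared-exponential (RBF) covariance between two sets of inputs
se_kernel <- function(x1, x2, alpha = 1, rho = 1) {
  alpha^2 * exp(-outer(x1, x2, "-")^2 / (2 * rho^2))
}

# Posterior mean of a zero-mean GP with Gaussian noise sd `sigma`
gp_posterior_mean <- function(x, y, x_new, sigma = 0.1, ...) {
  K <- se_kernel(x, x, ...) + diag(sigma^2, length(x))
  K_star <- se_kernel(x_new, x, ...)
  drop(K_star %*% solve(K, y))
}

week <- 1:54
x <- log2(week)
# Hypothetical monotone curve that rises and then flattens out near 1
y <- 1 - exp(-week / 10)

# Predict at the last observed week and far beyond the data
x_new <- log2(c(54, 200, 1e6))
pred <- gp_posterior_mean(x, y, x_new)
print(round(pred, 3))
```

Far from the data the kernel weights vanish, so the prediction falls back toward 0 (the prior mean) even though the input curve never decreases.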

#2

That would require monotonic GPs, which are possible in general but not yet available in brms.

#3

Big thank you for the quick reply, @paul.buerkner!

So I’d have to either write the model in pure Stan or use ARIMA, then?

#4

I don’t know your specific data and modeling goals so I don’t know what good alternatives would be.

#5

Would it be possible to model the differences (weekly increments) on the log scale, with a GP on those?

#6

In principle, yes. Could you clarify what model you have in mind exactly?

#7

I have two techniques being compared on a weekly basis: wekly data.csv (492 Bytes)

Applying `cumsum()` to the columns makes it easy to compare against a third, linear approach, y = 343x. I would like to model the two techniques over the 54 weeks I have and then forecast 200+ weeks ahead (which is always possible, but I’d like to see whether it is useful).
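A minimal sketch of the comparison described above, assuming one technique's weekly counts are available as a numeric vector. The columns of the attached CSV are not shown in the thread, so `weekly_counts` and `compare_to_linear()` are hypothetical names, and the input below is a toy vector, not the real data.

```r
# Compare a cumulative series with the linear reference y = slope * week
compare_to_linear <- function(weekly_counts, slope = 343) {
  week <- seq_along(weekly_counts)
  cumulative <- cumsum(weekly_counts)
  data.frame(
    week = week,
    cumulative = cumulative,
    linear = slope * week,
    difference = cumulative - slope * week  # > 0 means ahead of the line
  )
}

# Toy input: 400 counts per week for 5 weeks
res <- compare_to_linear(rep(400, 5))
print(res)
```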

Modeling the `cumsum()` outcome with `log2(week)` as predictor and a `negbinomial()` likelihood makes sense (see the figure below), but simple GP approaches seem to give much better out-of-sample predictions, which makes me curious…
Rplot.pdf (18.5 KB)

#8

Do you mean `log(diff) ~ gp(week)` or `diff ~ gp(log(week))`? Either way it looks very funky and in some cases fails spectacularly :)
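One likely reason `log(diff)` misbehaves is visible in the tail printed in the first post: once `EA` plateaus, the weekly increments are exactly zero, and `log(0)` is `-Inf`. A count likelihood such as `negbinomial()` applied to the increments themselves has no trouble with zeros. A quick base-R check using those six printed values:

```r
# The last six cumulative EA values printed in the first post
ea_tail <- c(40945, 40968, 40968, 40968, 40968, 40968)

# Weekly increments recovered from the cumulative series
increments <- diff(ea_tail)
print(increments)  # 23 0 0 0 0

# log() of a zero increment is -Inf, which breaks a Gaussian model on log(diff)
print(log(increments))  # log(23), then -Inf four times

# Accumulating the increments from the first value recovers the original series
print(ea_tail[1] + cumsum(increments))
```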

#9

Hmm, more like… model the GP so that it must be monotonic.

Like… if you put the GP on the differences, or basically on the derivative, and then construct your cumsum from that. If you constrain the derivative to be larger than 0, the main signal should be monotonic.
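The derivative idea can be sketched as a prior simulation in base R, under assumptions: the latent GP uses a squared-exponential kernel with hypothetical hyperparameters, and `exp()` is one choice of positivity-enforcing transform (the same role a log link plays). Because every increment is positive, the accumulated curve cannot decrease. This is an illustration of the construction, not a fitted model.

```r
set.seed(123)

# Squared-exponential covariance over a single input vector
se_kernel <- function(x, alpha = 1, rho = 1) {
  alpha^2 * exp(-outer(x, x, "-")^2 / (2 * rho^2))
}

week <- 1:54
x <- log2(week)

# Small diagonal jitter keeps the Cholesky factorisation numerically stable
K <- se_kernel(x) + diag(1e-6, length(x))
L <- t(chol(K))                     # lower-triangular factor, K = L %*% t(L)
f <- drop(L %*% rnorm(length(x)))   # one draw of the latent function f ~ GP(0, K)

rate <- exp(f)                      # strictly positive weekly increments
cumulative <- cumsum(rate)          # monotone increasing by construction

stopifnot(all(diff(cumulative) > 0))
```

To try this on data, one option (untested here) is to fit the weekly increments directly, e.g. `diff ~ 1 + gp(log2(week))` with `family = negbinomial()`, whose default log link keeps the expected increment positive, and then apply `cumsum()` across weeks to each draw from `posterior_epred()`. That makes the expected cumulative curve monotone, although it is not a full monotonic GP.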

Sure, I don’t have a specific model at hand right now, and yes, it could fail.
