Binary time series

Frankly, I also don’t know why I need Y \sim N(\mu, \Sigma); I am just following the advice from https://stats.stackexchange.com/questions/197084/binary-time-series.
Since the data are a time series, an AR(1) model, y_t \sim N(\alpha + \beta y_{t-1}, \sigma), makes sense to me: intuitively, the next state of an individual depends on that same individual’s previous state.
However, I don’t know if the following is possible (x is the binary outcome; alpha and beta are GPs; Sigma and y[0] are parameters; T is the number of time points):

target += bernoulli_lpmf(x | Phi(y));
for (t in 1:T)
  mu[t] = alpha + beta * y[t - 1]; // Stan indexing starts at 1, so y[0] would need to be a separate parameter
target += multi_normal_lpdf(y | mu, Sigma);
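For intuition, the generative process this model describes (a latent AR(1) state pushed through a probit link) can be sketched as follows. This is a simulation sketch only; the constant values for alpha, beta, sigma, and y0 are made up for illustration (in the model above, alpha and beta would be GPs, not constants):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

T = 50        # number of time points (illustrative)
alpha = 0.2   # intercept (a constant here; a GP in the model above)
beta = 0.8    # AR coefficient (likewise a constant for simplicity)
sigma = 0.5   # innovation scale
y0 = 0.0      # initial latent state, i.e. y[0]

# Latent AR(1) process: y_t ~ N(alpha + beta * y_{t-1}, sigma)
y = np.empty(T)
prev = y0
for t in range(T):
    y[t] = alpha + beta * prev + sigma * rng.normal()
    prev = y[t]

# Binary observations through the probit link: x_t ~ Bernoulli(Phi(y_t))
x = rng.binomial(1, norm.cdf(y))
```

Sampling from this sketch and then fitting the Stan model to the simulated x is a useful sanity check before moving to real data.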

So far, the following has converged decently for me:

target += bernoulli_lpmf(x | Phi(y));
for (t in 1:T)
  y[t] = alpha + beta * y[t - 1];

Is there an example of how to specify a different number of time points for each individual? What I mean is that the time points for individual 1 are 0, 1, 2, …, 5 months, but the time points for individual 2 are only 0 and 1 months, simply because individual 2 died. All examples in the documentation and tutorials have an equal number of time points for each individual.

You give the data argument of the lgp command as a data frame where one column is one covariate and one row is one observation. So in your case the data frame would look something like

id  age  other covariates…
 1    0
 1    1
 1    2
 1    3
 1    4
 1    5
 2    0
 2    1
 3
 3
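The long format above (one row per observation) handles unequal follow-up lengths naturally, since each individual simply contributes as many rows as it has measurements. A hypothetical sketch of building such a data frame, shown here in Python/pandas for illustration (lgpr itself takes an R data frame, and the ages for individual 3 below are invented for the example):

```python
import pandas as pd

# Follow-up length differs per individual: individual 2 died after month 1.
# The ages for individual 3 are made up purely for illustration.
follow_up = {1: [0, 1, 2, 3, 4, 5], 2: [0, 1], 3: [0, 1, 2]}

# One row per observation; other covariates would be additional columns.
rows = [{"id": i, "age": t} for i, ages in follow_up.items() for t in ages]
data = pd.DataFrame(rows)
```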

Can you comment on how you choose between GAMs / GPs for time trends and state space models?

Also thanks for the link, it looks wonderful!

@jtimonen My impression is that lgpr is designed to work on datasets where we observe the same individuals repeatedly over time. Do you have any favorite resources on additive GPs for time series modeling of panel data where the individuals being measured change over time? Can I use lgpr for this kind of data, or does lgpr always fit individual-specific trends for each level in id_variable:time_variable?

Hi,

1) Let me answer this first:

No, because

  1. The individual effect can be specified to be just a constant offset
  2. The id covariate does not have to be in the model formula at all

A categorical covariate z can be included in the model as a time-independent offset by using the offset_vars argument, like in

fit <- lgp(formula = y ~ id + age + z, 
              data = data, 
       offset_vars = c("z"))

The identifier variable is a bit special, however, so currently, if you don’t want to fit a time-dependent trend for it, you have to define an additional covariate id_ofs (with exactly the same values as the id variable) and then use a model formula without id, like here:

data$id_ofs = data$id # copy id to id_ofs

fit <- lgp(formula = y ~ age + id_ofs, 
              data = data, 
       offset_vars = c("id_ofs"))

2) But now to this part:

Well, yes, it was designed for data where the same observational units are measured at multiple time points, which I think is the definition of panel data (= longitudinal data). But lgpr can also be used to model data where the “individuals being measured change over time” (or even where there is only one measurement per individual); whether or not that makes sense depends on what you want to infer from the data and on how exactly the individuals change over time. If you mean, for example, that

  • measurements at 1,2,3 months are from individual 1
  • measurements at 4,5,6 months are from individual 2
  • and so on

then it can be difficult to separate shared and individual-specific effects. But if the time points for different individuals are more interleaved, it may be easier. And if, in your application, you can assume there is no individual effect at all, things become easier still and you can just remove id from the model formula.

The only other additive GP works for panel data that I know of are these two:

The first one is designed with quite similar assumptions about the measurement times as lgpr, and I think the second one too.


For my problem (id, age, 1 continuous covariate, 1 categorical covariate, 300 individuals, 5 time points): gradient evaluation took 2.632 seconds. Are there any tricks to improve the performance?

I am currently implementing a basis function approximation to lgpr. It should make it scalable to thousands of data points.
