Binary time series

Frankly, I also don’t know why I need Y \sim N(\mu, \Sigma); I am just following the advice from https://stats.stackexchange.com/questions/197084/binary-time-series.
Since the data are a time series, an AR(1) model, y_t \sim N(\alpha + \beta y_{t-1}, \sigma), makes sense to me: intuitively, the next state of an individual depends on that same individual’s previous state.
However, I don’t know if the following is possible (x is the binary outcome; alpha and beta are GPs; Sigma and y[0] are parameters; T is the number of time points):

target += bernoulli_lpmf(x | Phi(y));
for (t in 1:T)
  mu[t] = alpha + beta * y[t - 1]; // Stan indexing starts at 1, so y[0] would need to be a separate parameter
target += multi_normal_lpdf(y | mu, Sigma);
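For intuition, the generative process this model describes (a latent AR(1) state pushed through a probit link) can be sketched as follows. This is a simulation sketch only; the constant values for alpha, beta, sigma, and y0 are made up for illustration (in the model above, alpha and beta would be GPs, not constants):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

T = 50        # number of time points (illustrative)
alpha = 0.2   # intercept (a constant here; a GP in the model above)
beta = 0.8    # AR coefficient (likewise a constant for simplicity)
sigma = 0.5   # innovation scale
y0 = 0.0      # initial latent state, i.e. y[0]

# Latent AR(1) process: y_t ~ N(alpha + beta * y_{t-1}, sigma)
y = np.empty(T)
prev = y0
for t in range(T):
    y[t] = alpha + beta * prev + sigma * rng.normal()
    prev = y[t]

# Binary observations through the probit link: x_t ~ Bernoulli(Phi(y_t))
x = rng.binomial(1, norm.cdf(y))
```

Sampling from this sketch and then fitting the Stan model to the simulated x is a useful sanity check before moving to real data.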

So far, the following has converged decently for me:

target += bernoulli_lpmf(x | Phi(y));
for (t in 1:T)
  y[t] = alpha + beta * y[t - 1];

Is there an example of how to specify a different number of time points for each individual? What I mean is that the time points for individual 1 are 0, 1, 2, …, 5 months, but the time points for individual 2 are only 0 and 1 months, simply because individual 2 died. All examples in the documentation and tutorials have an equal number of time points for each individual.

You give the data argument of the lgp command as a data frame where one column is one covariate and one row is one observation. So in your case the data frame would look something like

id  age  other covariates…
 1    0
 1    1
 1    2
 1    3
 1    4
 1    5
 2    0
 2    1
 3
 3
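The long format above (one row per observation) handles unequal follow-up lengths naturally, since each individual simply contributes as many rows as it has measurements. A hypothetical sketch of building such a data frame, shown here in Python/pandas for illustration (lgpr itself takes an R data frame, and the ages for individual 3 below are invented for the example):

```python
import pandas as pd

# Follow-up length differs per individual: individual 2 died after month 1.
# The ages for individual 3 are made up purely for illustration.
follow_up = {1: [0, 1, 2, 3, 4, 5], 2: [0, 1], 3: [0, 1, 2]}

# One row per observation; other covariates would be additional columns.
rows = [{"id": i, "age": t} for i, ages in follow_up.items() for t in ages]
data = pd.DataFrame(rows)
```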

Can you comment on how you choose between GAMs / GPs for time trends and state space models?

Also thanks for the link, it looks wonderful!

@jtimonen My impression is that lgpr is designed to work on datasets where we observe the same individuals repeatedly over time. Do you have any favorite resources on additive GPs for time series modeling of panel data where the individuals being measured change over time? Can I use lgpr for this kind of data, or does lgpr always fit individual-specific trends for each level in id_variable:time_variable?

Hi,

1) Let me answer this first:

No, because

  1. The individual effect can be specified to be just a constant offset
  2. The id covariate does not have to be in the model formula at all

A categorical covariate z can be included in the model as a time-independent offset by using the offset_vars argument, like in

fit <- lgp(formula = y ~ id + age + z, 
              data = data, 
       offset_vars = c("z"))

The identifier variable is a bit special, however, so currently, if you don’t want to fit a time-dependent trend for it, you have to define an additional covariate id_ofs (with exactly the same values as the id variable) and then use a model formula without id, like here:

data$id_ofs = data$id # copy id to id_ofs

fit <- lgp(formula = y ~ age + id_ofs, 
              data = data, 
       offset_vars = c("id_ofs"))

2) But now to this part:

Well, yes, it was designed for data where the same observational units are measured at multiple time points, which I think is the definition of panel data (= longitudinal data). But lgpr can also be used to model data where the “individuals being measured change over time” (or even where there is only one measurement per individual); whether or not that makes sense depends on what you want to infer from the data and on how exactly the individuals change over time. If you mean, for example, that

  • measurements at 1,2,3 months are from individual 1
  • measurements at 4,5,6 months are from individual 2
  • and so on

then it can be difficult to separate shared and individual-specific effects. But if the time points for different individuals are more interleaved, it may be easier. And if, in your application, you can assume there is no individual effect at all, things become easier still and you can just remove id from the model formula.

The only other additive GP works for panel data that I know of are these two:

The first one is designed with quite similar assumptions about the measurement times as lgpr, and I think the second one too.


For my problem (id, age, 1 continuous covariate, 1 categorical covariate, 300 individuals, 5 time points): gradient evaluation took 2.632 seconds. Are there any tricks to improve the performance?

I am currently implementing a basis function approximation to lgpr. It should make it scalable to thousands of data points.
