Referencing Previous Observation in Longitudinal Modeling

harrelfe · December 2, 2020, 2:36pm

As is being discussed here, I am trying to formulate a stochastic process for modeling serial correlation on the logit scale for binary and ordinal logistic models. I am trying to handle the irregular measurement time situation and would prefer to hand Stan (from rstan) a tall and thin dataset. Assume the dataset is sorted by subject ID and by time within ID.

With such a formulation, the per-observation log-likelihood contribution computed in the Stan code needs to know if the record is the first record per subject, and if not it needs to know the measurement time for the previous record for the subject.

Is this a good way to think about it, and how should this “lag” be coded in Stan? Do I need to manually add a lagged time in the input data?

andrjohns · December 2, 2020, 2:51pm

I’m not sure I’m entirely understanding, but I think this is what you want:

Assuming your dataset is set up like:

ID	Time	y
1	1	y
1	2	y
1	3	y
2	1	y
2	2	y
2	3	y

You would then have a loop over the Time variables:

// Where M is the number of rows in the dataset
for(m in 1:M) {
  if(Time[m] == 1) {
    log_lik[m] = ...;
  } else {
    real lag = Time[m] - Time[m-1];  // real, so non-integer time gaps also work
    log_lik[m] = ...;
  }
}

Does that cover what you’re after?

harrelfe · December 2, 2020, 2:53pm

Yes, I forgot that the per-observation log-likelihood code has access to the full data arrays and not just the one observation. The one change I’d need to make to your code is to account for the first time not necessarily being t=1; instead I need to detect a change in subject ID from the previous record.

andrjohns · December 2, 2020, 2:58pm

Oh I see, in that case I think the simplest solution would be to have another variable indicating the observation number:

ID	Obs_N	Time	y
1	1	3	y
1	2	5	y
1	3	7	y
2	1	2	y
2	2	3	y
2	3	4	y

And then loop over that:

// Where M is the number of rows in the dataset
for(m in 1:M) {
  if(Obs_N[m] == 1) {
    log_lik[m] = ...;
  } else {
    real lag = Time[m] - Time[m-1];  // real, so non-integer time gaps also work
    log_lik[m] = ...;
  }
}
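
If adding an Obs_N column to the input data is inconvenient, the same information could probably be derived inside Stan from the subject IDs alone, which would also answer the original question about whether a lagged time has to be supplied by hand. The following is only a sketch under some assumptions: the rows are sorted by ID and then by Time as stated above, the names is_first and gap are made up for illustration, and the array[...] declaration syntax requires a reasonably recent Stan version.

```stan
data {
  int<lower=1> M;               // number of rows, sorted by ID and by Time within ID
  array[M] int<lower=1> ID;     // subject ID for each row
  vector[M] Time;               // measurement time for each row (may be irregular)
}
transformed data {
  // is_first and gap are illustrative names, not part of the original thread
  array[M] int<lower=0, upper=1> is_first;  // 1 if the row is a subject's first record
  vector[M] gap;                            // time since the subject's previous record

  // The very first row always starts a subject
  is_first[1] = 1;
  gap[1] = 0;

  for (m in 2:M) {
    if (ID[m] != ID[m - 1]) {
      // Subject ID changed, so this row is a first record
      is_first[m] = 1;
      gap[m] = 0;
    } else {
      is_first[m] = 0;
      gap[m] = Time[m] - Time[m - 1];
    }
  }
}
```

In the loop above, the test Obs_N[m] == 1 would then become is_first[m] == 1, and gap[m] would take the place of the locally computed lag.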

Charles_Driver · December 2, 2020, 3:46pm

I think in the continuous time case it’s usually more helpful to think in terms of a latent process that you sometimes observe via instruments. This usually means some kind of state vector that gets updated (to some degree) whenever an observation appears, rather than a column of lagged predictors. I posted an example using ctsem (depends on rstan) in the Cross Validated thread.
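
To make the latent-process idea a bit more concrete, here is a minimal sketch for a binary outcome. It is not the ctsem implementation and not necessarily the model intended in this thread: it assumes a stationary Ornstein-Uhlenbeck (continuous-time AR(1)) process on the logit scale, samples the latent states directly rather than using a filtering update, and uses placeholder priors. The names is_first, gap, eta, lambda and sigma are all assumptions, with is_first and gap supplied as data (gap must be strictly positive on every non-first row).

```stan
data {
  int<lower=1> M;                            // number of rows, sorted by subject and time
  array[M] int<lower=0, upper=1> y;          // binary outcome
  array[M] int<lower=0, upper=1> is_first;   // 1 if the row is a subject's first record
  vector<lower=0>[M] gap;                    // time since previous record (> 0 when not first)
}
parameters {
  real alpha;              // intercept on the logit scale
  real<lower=0> sigma;     // stationary SD of the latent process
  real<lower=0> lambda;    // decay rate of the serial correlation
  vector[M] eta;           // latent process value at each observation time
}
model {
  // Placeholder priors, chosen only so the sketch is complete
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 1);
  lambda ~ normal(0, 1);

  for (m in 1:M) {
    if (is_first[m] == 1) {
      // First record per subject: draw from the stationary distribution
      eta[m] ~ normal(0, sigma);
    } else {
      // Correlation with the previous state decays with the elapsed time
      real rho = exp(-lambda * gap[m]);
      // Conditional SD is sigma * sqrt(1 - rho^2), written via expm1 for stability
      eta[m] ~ normal(rho * eta[m - 1], sigma * sqrt(-expm1(-2 * lambda * gap[m])));
    }
  }

  y ~ bernoulli_logit(alpha + eta);
}
```

The point of the sketch is only that the time gap enters through the transition between consecutive latent states, so irregular spacing needs no special handling beyond knowing each row's gap and whether it starts a new subject.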
