As is being discussed here, I am trying to formulate a stochastic process for modeling serial correlation on the logit scale for binary and ordinal logistic models. I want to handle irregular measurement times and would prefer to hand Stan (from rstan) a tall and thin dataset. Assume the dataset is sorted by subject ID and by time within ID.
With such a formulation, the per-observation log-likelihood contribution computed in the Stan code needs to know if the record is the first record per subject, and if not it needs to know the measurement time for the previous record for the subject.
Is this a good way to think about it, and how should this “lag” be coded in Stan? Do I need to manually add a lagged time in the input data?
I’m not sure I’m entirely understanding, but I think this is what you want:
Assuming your dataset is set up like:

| ID | Time | y |
|----|------|---|
| 1  | 1    | y |
| 1  | 2    | y |
| 1  | 3    | y |
| 2  | 1    | y |
| 2  | 2    | y |
| 2  | 3    | y |

You would then have a loop over the Time variables:
// Where M is the number of rows in the dataset
for (m in 1:M) {
  if (Time[m] == 1) {
    log_lik[m] = ...;
  } else {
    int lag = Time[m] - Time[m-1];
    log_lik[m] = ...;
  }
}
Does that cover what you’re after?
Yes, I forgot that the per-observation log-likelihood code works with the full data arrays and not just the one observation. The one change I'd need to make to your code is to account for the fact that a subject's first time may not be t=1; instead I need to detect a change from the previous record's subject ID.
Oh I see, in that case I think the simplest solution would be to have another variable indicating the observation number:

| ID | Obs_N | Time | y |
|----|-------|------|---|
| 1  | 1     | 3    | y |
| 1  | 2     | 5    | y |
| 1  | 3     | 7    | y |
| 2  | 1     | 2    | y |
| 2  | 2     | 3    | y |
| 2  | 3     | 4    | y |

And then loop over that:
// Where M is the number of rows in the dataset
for (m in 1:M) {
  if (Obs_N[m] == 1) {
    log_lik[m] = ...;
  } else {
    int lag = Time[m] - Time[m-1];
    log_lik[m] = ...;
  }
}
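
If adding an Obs_N column to the input data is inconvenient, the same information can probably be derived inside Stan from the ID column alone, since the rows are sorted by ID and then by time. The following is only a minimal sketch under that sorting assumption. The names `is_first` and `gap` are hypothetical, and `Time` is declared real-valued here as an assumption (use integer arrays instead if your times are integers, as in the loop above).

```stan
data {
  int<lower=1> M;                // number of rows in the dataset
  array[M] int<lower=1> ID;      // subject ID; rows sorted by ID, then by Time
  vector[M] Time;                // measurement time (assumed real-valued)
}
transformed data {
  // Hypothetical helper variables, computed once before sampling
  array[M] int<lower=0, upper=1> is_first;  // 1 if the row is a subject's first record
  vector[M] gap;                            // time since that subject's previous record

  // The very first row always starts a subject
  is_first[1] = 1;
  gap[1] = 0;

  for (m in 2:M) {
    if (ID[m] != ID[m - 1]) {
      // Subject ID changed, so this is a first record
      is_first[m] = 1;
      gap[m] = 0;
    } else {
      is_first[m] = 0;
      gap[m] = Time[m] - Time[m - 1];
    }
  }
}
```

The likelihood loop can then branch on `is_first[m] == 1` instead of `Obs_N[m] == 1` and use `gap[m]` as the lag, so no lagged column has to be added to the data by hand. The `array[M] int` declaration syntax needs a reasonably recent Stan; older versions write it as `int ID[M]`.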
I think in the continuous-time case it's usually more helpful to think in terms of a latent process that you sometimes observe via instruments. This usually means some kind of state vector that gets updated (to some degree) whenever an observation appears, rather than a column of lagged predictors. I posted an example using ctsem (depends on rstan) in the Cross Validated thread.
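
To make the latent-process view a bit more concrete, here is one possible illustration, not taken from the ctsem example: assume the latent state on the logit scale, written here as $\eta(t)$, follows an Ornstein-Uhlenbeck process. The symbols $\lambda > 0$ (mean-reversion rate), $\sigma$ (diffusion scale), and $\Delta$ (time gap between consecutive records for a subject) are all assumptions introduced for this sketch.

$$
d\eta(t) = -\lambda\,\eta(t)\,dt + \sigma\,dW(t), \qquad
\eta(t+\Delta)\mid\eta(t) \sim \mathcal{N}\!\left(e^{-\lambda\Delta}\,\eta(t),\ \frac{\sigma^{2}}{2\lambda}\bigl(1-e^{-2\lambda\Delta}\bigr)\right)
$$

Under this assumption the irregular spacing enters only through $\Delta$: in the stationary case the correlation between two latent values $\Delta$ apart is $e^{-\lambda\Delta}$, so closely spaced records are strongly correlated and widely spaced ones nearly independent.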