How to fit a "lagged regression"

fabio · April 29, 2024, 12:37pm

Dear stan community,
I am facing a challenging model and I am asking for your guidance on how to set up my work.

I measured the value of a continuous response variable y(t) for several months sampling it every t=20 min. At each t I measured two continuous forcings variables x_1(t) and x_2(t).

When looking at the values, I noticed that the response variable “reacts” to the forcings with a delay which I would like to infer from the measures.
I have a couple of questions:

is there an example of a “lagged regression” that I can use as a starting point for coding a model?
I don’t know if it’s a good idea, but I could treat the delays T_1 and T_2 as parameters external to the model (i.e. explicitly stated in the data block) and implement a sort of grid sampling using LOOCV-IC to compare the models. This is a costly task (and I have to carefully think about indexing the vectors involved), so I am seeking your help to understand if it’s worth the effort.

Thanks in advance for any assistance and usual information!

jsocolar · April 29, 2024, 12:58pm

Do you believe that it is reacting at some specific particular unknown lag, or that it is reacting to the forcing variables over some window? If the latter, what are you willing to assume about the size and shape of the window?

fabio · April 29, 2024, 1:11pm

Thanks for the quick response. I would hope for two exact lags, but likely the lag of y with x_1 is between 3 and 5 hours and tje lag og y with x_2 between 6 and 8 hours. Moreover x_2 has only positive value with many of them are 0.

jsocolar · April 29, 2024, 1:28pm

If the lags are exact, then I think you’ll need to marginalize over the possible values of the lag, either explicitly inside a model or perhaps using your LOO approach. Fortunately, with 2-hour windows of plausible values, you’ll have only about 6 possible values for each variable, so 36 terms in the marginalization, which is highly tractable.

Note, however, that if you instead you were willing to assume that the response depends on a weighted average of the variable over some window, and the weighting function has tails that smoothly decay to zero, then you might be able to fit this without marginalizing. For example, if the weighting function is a normal PDF whose scale you either know a priori or estimate from data, and whose mean you estimate from data, then you’d have continuous derivatives without marginalizing. However, I think you would need the scale to be on the order of 20 minutes or greater to avoid problems with highly variable curvature and possible multimodality.

If the forcing variables are substantially autocorrelated, this might be effectively the same thing as identifying the single best lag.

Given that the true best lag might not fall a precise integer multiple of 20 minutes in the past, I would think that you’re already at least somewhat committed to using snapshots of the forcing variables to represent their values in a time window on the order of 20 minutes long.

fabio · May 7, 2024, 11:39am

Dear @jsocolar it took me a while to understand in depth your advice about marginalizing. So, I will try to implement different approaches, maybe starting in a principled way. I will let you (and all the community) know how it’s going with more detailed questions and snippet of code. In the meanwhile… happy modeling to you!

stevebronder · May 7, 2024, 9:31pm

Sorry @jsocolar, can you say what you mean by marginalize here?

jsocolar · May 8, 2024, 12:38am

The OP wants a regression of the form y = \alpha + \beta_1 x_1(t_1) + \beta_2 x_2(t_2) where the values of t_1 and t_2 are discrete unknowns. Thus the need to marginalize over the possible values of the pair (t_1, t_2). The OP has sufficiently strong prior information (or “eyeball the data” information) to restrict both t_1 and t_2 to about 6 possible values each, yielding 36 terms in the marginalization.

The marginalization will have the form

target += log_sum_exp(
  log_p[1] + response_lpdf(y | alpha + beta[1] * x1[1] + beta[2] * x2[1],
  ... 
  log_p[N] + response_lpdf(y | alpha + beta[1] * x1[i] + beta[2] * x2[j]
  )

where log_p is the logarithm of a simplex parameter.

edit: to write it in this compact form, x1 and x2 are arrays of pre-computed vectors giving the predictors at specific candidate lags.

Topic		Replies	Views
How to model a lag function? Modeling brms , time-series	0	255	January 10, 2024
Model Calibration Modeling fitting-issues	4	914	May 15, 2017
Dynamic panel data models with Stan? Modeling	59	9560	November 20, 2023
Sampler getting stuck? Modeling	18	3688	May 26, 2019
Distributed lag non-linear model Modeling	7	206	January 21, 2025

How to fit a "lagged regression"

Related topics