Modelling two-choice decision from time-series data

I have time-series sensor data from different participants in a repeated experiment. In each run of the experiment, the participant either makes a certain decision at a specific time or does not make it at all; I can read this directly from the data. Using all of this data for training and testing, I would like to build a time-series model. At any time, given the sensor data at that time, the model should output a prediction between 0 and 1, where 0 means that the person will not take the decision and 1 means that the person will take it. I also want to investigate whether the model can be improved by including past measurement samples.

What would you recommend for tackling this modelling problem? My first try would be some sort of logistic regression trained on every time sample, with response 1 (after the decision was made) or 0 (before the decision was made). The sensor readings at that time (and possibly in the past) would be the covariates of the linear predictor. So far, I have mainly seen logistic regression used in “static” rather than time-series problems. Furthermore, how could I do this in a Bayesian setting? Are there already functions for this, for instance in brms?
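To make the per-sample setup concrete, here is a minimal sketch of how one trial could be turned into rows for such a regression. It is written in plain Python only to stay self-contained; the function name `build_rows`, the `n_lags` parameter and the example trial are all hypothetical, and the labelling rule (0 before the decision, 1 from the decision onwards) is just the one described above.

```python
from typing import List, Optional, Tuple

def build_rows(signal: List[float],
               decision_idx: Optional[int],
               n_lags: int = 2) -> List[Tuple[List[float], int]]:
    """Turn one trial into (features, label) rows, one row per time sample.

    features = [x_t, x_{t-1}, ..., x_{t-n_lags}]  (current and past readings)
    label    = 1 if the decision was made at or before sample t, else 0.
    decision_idx is None for trials in which no decision was made.
    """
    rows = []
    # Start at n_lags so that every row has a full set of past samples.
    for t in range(n_lags, len(signal)):
        features = [signal[t - k] for k in range(n_lags + 1)]
        label = int(decision_idx is not None and t >= decision_idx)
        rows.append((features, label))
    return rows

# Hypothetical trial: 6 sensor samples, decision made at sample index 4.
trial = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]
rows = build_rows(trial, decision_idx=4, n_lags=2)
for features, label in rows:
    print(features, label)
```

Comparing fits with `n_lags = 0` and larger values would be one way to check whether past samples add anything.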


I think what you describe is a survival model. There is a lot of discussion in the forums here if you search for survival models or proportional hazards models. I think the more robust Stan implementation is in rstanarm::surv_jm. There is a more experimental implementation in brms. See this GitHub discussion.

PS: I have never worked with any of these models, so take this with a grain of salt.


If your trials are set up such that the trial ends when the participant responds and your outcome is the time-to-respond, then a survival model might make sense. It’s also common to have a timeout in such experiments, such that the trial ends after, say, 2 minutes if the participant hasn’t responded yet, in which case you’d be dealing with a censored survival model.
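As a minimal sketch of what “censored” means for the data layout: each trial becomes a (time, event) pair, where event is 0 if the timeout was reached without a response. The 2-minute timeout comes from the example above; the function name `to_survival_record` and the trial values are hypothetical, and plain Python is used only to keep the sketch self-contained.

```python
from typing import List, Optional, Tuple

TIMEOUT_S = 120.0  # assumed timeout of 2 minutes, as in the example above

def to_survival_record(response_time_s: Optional[float],
                       timeout_s: float = TIMEOUT_S) -> Tuple[float, int]:
    """Return (time, event) for one trial.

    event = 1: a response was observed at `time`.
    event = 0: no response before the timeout, so the trial is
               right-censored at `timeout_s`.
    """
    if response_time_s is None or response_time_s >= timeout_s:
        return (timeout_s, 0)
    return (response_time_s, 1)

# Hypothetical response times in seconds; None means no response was recorded.
trials: List[Optional[float]] = [3.2, None, 47.5, 130.0]
records = [to_survival_record(t) for t in trials]
print(records)  # [(3.2, 1), (120.0, 0), (47.5, 1), (120.0, 0)]
```

A survival model would then take the time and the censoring indicator together as the outcome, rather than a per-sample 0/1 label.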

If, on the other hand, your trials are set up with some sort of probe stimulus that occurs at different times from trial to trial, and you’re measuring whether the participant responds to the probe or not, then your original idea of modelling the outcome as binomial, predicted by time-of-stimulus-onset, is more appropriate. You can model the effect of time as linear very easily, but you should also consider methods that can capture non-linear effects, including Generalized Additive Models and Gaussian Processes (the former being a computationally faster approximation of the latter).


I should also say that if the trials are time-to-decision and said times are relatively rapid (<2s), you should look at this resource:


Thanks, I’ll definitely look into survival models. At first glance, the Cox Proportional Hazards model might be the most appropriate one.

First, thanks for the excellent link. I will also look into Generalized Additive Models.

Actually, I would say that I am using some sort of probe stimulus, which occurs in different variations according to the protocol, and the participants need to choose between two options. Option 1 carries an increased risk to the participant, while option 2 is to avoid that risk and go for a safer option. A participant must respond within roughly a couple of seconds, choosing one of the two options. The issue I found with the logistic regression is how to set up the response vector. For the participants who chose the riskier option 1, I set the response to 0 until they took the decision and then to 1 for some period afterwards, the length of which I find hard to define. So the 0’s are currently quite over-represented in the response vector. Also, most participants went for the riskier decision, so that option is over-represented too, by a factor of roughly 3. Can those imbalances cause issues?

Another follow-up question: my logistic regression fit seems to be “too good”, which I assume is due to the high time resolution, i.e. high sampling frequency. I have few trials but within each trial many samples. Is this an issue that affects the overall model?

Can you clarify how you observed/quantified this?

When I look at the confidence intervals of the model estimates (I fitted a frequentist GLM for debugging for now), the intervals are extremely narrow, which suggests to me that the model is extremely certain about the estimates. When I downsample the signals, the confidence intervals understandably get wider.

I think the issue with the way I have modeled this is that logistic regression assumes independent observations. Since I take every observation (with a sampling rate of 100 Hz I am taking one observation every 10 ms), and those observations actually represent distances, I assume that there is some correlation between observations. I have read that with correlated observations, the model will be over-confident (which appears to be the case for me). Is there any work-around for using correlated observations?
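A rough way to see how strong this effect can be is the textbook approximation for the sample mean of an AR(1) series with lag-1 autocorrelation ρ, where n correlated samples carry about as much information as n(1 − ρ)/(1 + ρ) independent ones. The sketch below applies it to a simulated series; the AR(1) process, the coefficient 0.95 and the function names are illustrative assumptions, and the formula is for a mean, not for regression coefficients, so it only indicates the order of magnitude.

```python
import random
from typing import List

def lag1_autocorr(x: List[float]) -> float:
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def effective_sample_size(n: int, rho: float) -> float:
    """AR(1) approximation for the mean: n_eff = n * (1 - rho) / (1 + rho)."""
    return n * (1.0 - rho) / (1.0 + rho)

# Simulate a smooth, densely sampled signal as an AR(1) process.
# phi = 0.95 is an arbitrary illustrative value, not taken from real data.
random.seed(0)
phi = 0.95
x = [0.0]
for _ in range(4999):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

rho_hat = lag1_autocorr(x)
n_eff = effective_sample_size(len(x), rho_hat)
print(f"n = {len(x)}, lag-1 autocorrelation = {rho_hat:.3f}, n_eff = {n_eff:.0f}")
```

Under these assumptions, thousands of samples behave like only a few hundred independent ones, which would be consistent with intervals that look far too narrow when independence is assumed.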

If you have correlated observations, it is best to model the correlation. This is relatively well explored in time-series analysis. The option I would try first is to include an autoregressive (AR) term in the predictor. Alternatively, spline or Gaussian-process terms might also make sense. All of these are supported out of the box in brms (see , and “Special predictor terms” at


Thanks for the suggestion, I will look into that possibility in brms. Just to confirm, an AR term is a previous response/observation, and not just a lagged predictor, right? Would you say that modelling the correlation with an AR term is somehow better than e.g. down-sampling the signals before fitting (to remove the correlation between observations) and then using the down-sampled signals to fit the model (without AR term)?

Yes. I am not sure, however, how brms implements it for logistic regression: it might use either the value at the previous point or the linear predictor at the previous point. You can always use make_stancode to inspect what is happening under the hood.

This obviously depends on the details of your dataset. Downsampling means throwing out data, which is usually bad (unless you have so much data that you can’t process it). So if the AR term is a good approximation to your actual process, then including it will improve inferences. If, in fact, your process behaves very differently, then it can make things worse. You can usually design posterior predictive checks to see whether the model fits well (in this case, the posterior predictive distribution of differences across several step sizes could be of interest).
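One possible shape for such a check, as a sketch only: for each step size, compute the standard deviation of the differences x[t] − x[t − step] in the observed series and in each replicated series, then see where the observed value falls among the replicates. In practice the replicates would be posterior predictive draws from the fitted model; here they are hypothetical stand-ins (independent noise with the observed mean and spread), and all function names are made up for illustration.

```python
import random
import statistics
from typing import List, Sequence

def diff_sd(series: Sequence[float], step: int) -> float:
    """Standard deviation of the differences x[t] - x[t - step]."""
    diffs = [series[t] - series[t - step] for t in range(step, len(series))]
    return statistics.pstdev(diffs)

def ppc_tail_prob(observed: Sequence[float],
                  replicates: Sequence[Sequence[float]],
                  step: int) -> float:
    """Fraction of replicates whose statistic is >= the observed one.

    Values close to 0 or 1 suggest the model does not reproduce how the
    series changes over `step` samples.
    """
    obs_stat = diff_sd(observed, step)
    rep_stats = [diff_sd(rep, step) for rep in replicates]
    return sum(s >= obs_stat for s in rep_stats) / len(rep_stats)

# Illustrative "observed" series: a random walk, i.e. strongly autocorrelated.
random.seed(1)
observed: List[float] = [0.0]
for _ in range(199):
    observed.append(observed[-1] + random.gauss(0.0, 1.0))

# Stand-in replicates from a model that ignores the temporal correlation:
# independent draws matching only the observed mean and spread.
mu = statistics.mean(observed)
sd = statistics.pstdev(observed)
replicates = [[random.gauss(mu, sd) for _ in observed] for _ in range(50)]

for step in (1, 5, 20):
    print(step, ppc_tail_prob(observed, replicates, step))
```

If the tail probabilities stay near 0 or 1 at small step sizes, that would point to the correlation structure being wrong, which is where an AR term, a spline or a Gaussian-process term could be compared.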

Hope that helps!