I am using a Bernoulli distribution for my network data, which has a resolution of 10 minutes, so I believe I need to account for autocorrelation. Since there are gaps in my data, I would need to use a Gaussian process.
My question is: once I fit this model, what diagnostic test do I use to check that I have effectively accounted for the autocorrelation?
@paul.buerkner
@jsocolar
Thanks,
Hunter
Unfortunately, you may run into serious difficulties doing this when the response is Bernoulli. To apply a Gaussian process to a binomial response, the typical approach is to add a latent Gaussian residual. But when the number of trials in the binomial response is 1, latent Gaussian residuals tend to be unidentified. You can see this intuitively by thinking about a string of Bernoulli samples with probability 0.5: there is no good way to tell whether the latent Gaussian variance is zero or huge, because in either case the data stream would look identical.
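A rough numerical illustration of that intuition, as a sketch only: it assumes independent observation-level residuals on the logit scale (not a full Gaussian process with a correlation structure), and `marginal_prob` is a hypothetical helper name. With a latent mean of zero, the marginal success probability stays at 0.5 whatever the residual standard deviation is, and independent single-trial Bernoulli data carry information only about that marginal probability.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(mu, sigma, n_grid=4001, z_max=8.0):
    """P(y = 1) = E[logistic(mu + sigma * z)] with z ~ Normal(0, 1).

    Approximated with the trapezoid rule on a symmetric grid over [-z_max, z_max].
    """
    h = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * logistic(mu + sigma * z) * density
    return total * h

# Same marginal probability for very different latent standard deviations.
for sigma in (0.0, 1.0, 5.0, 20.0):
    print(sigma, round(marginal_prob(0.0, sigma), 6))
```

Away from a latent mean of zero the same problem shows up as a trade-off: different (mean, standard deviation) pairs can produce the same marginal probability.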
There are other techniques for accounting for autocorrelation in Bernoulli data. In one dimension, a common option is the autologistic model, where the actual observed outcome in the previous timestep is used as a covariate for the current timestep. The associated slope parameter is then known as an “autologistic term”. In the presence of gaps in the data, it is possible to treat the outcome in unobserved timesteps as latent and to marginalize efficiently over the unobserved latent Bernoulli states via the forward algorithm. This treats the whole model as a hidden Markov model, albeit one where the emission probabilities are typically 0 and 1, so that the Markov states associated with observed timesteps are not really “hidden” at all.
Note that this approach is not available in brms except by constructing a fairly involved custom family. It would be easier to work in raw Stan.
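To make the marginalization concrete, here is a minimal sketch of the forward algorithm for a two-state autologistic chain with gaps. It is written in plain Python rather than Stan purely for illustration. The function name `autologistic_loglik`, the intercept `alpha`, the autologistic slope `beta`, and the initial probability `p_init` are all assumptions made for this example, and there are no other covariates.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def autologistic_loglik(y, alpha, beta, p_init=0.5):
    """Log-likelihood of a 0/1 series y, with None marking unobserved timesteps.

    Transition model: P(y_t = 1 | y_{t-1}) = logistic(alpha + beta * y_{t-1}).
    p_init = P(y_1 = 1) is an assumed initial distribution.
    """
    # trans[prev][cur] = P(state_t = cur | state_{t-1} = prev)
    trans = []
    for prev in (0, 1):
        p1 = logistic(alpha + beta * prev)
        trans.append((1.0 - p1, p1))

    def emission(obs, state):
        # Observed steps pin the state (probabilities 0 or 1);
        # unobserved steps carry no information about the state.
        if obs is None:
            return 1.0
        return 1.0 if obs == state else 0.0

    # forward[s] is proportional to P(data up to now, current state = s)
    forward = [(1.0 - p_init) * emission(y[0], 0), p_init * emission(y[0], 1)]
    loglik = 0.0
    for obs in y[1:]:
        scale = forward[0] + forward[1]
        loglik += math.log(scale)  # accumulate the normalizing constants
        forward = [
            sum((forward[prev] / scale) * trans[prev][cur] for prev in (0, 1))
            * emission(obs, cur)
            for cur in (0, 1)
        ]
    return loglik + math.log(forward[0] + forward[1])

# Example: a series with a two-step gap.
series = [1, None, None, 0, 1, 1, None, 0]
print(autologistic_loglik(series, alpha=-0.5, beta=1.2))
```

The cost is linear in the length of the series, whereas summing over every possible filling of the gaps grows exponentially with the number of missing timesteps. In Stan the same recursion would be written on the log scale with `log_sum_exp`.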
Hi,
Thanks for this in-depth explanation.
Can a time spline be used to capture temporal autocorrelation instead of a Gaussian process?
What about a discrete AR1 structure? I see that in brms, when using ar() with cov = TRUE for non-Gaussian families, it adds these latent residuals.
So should a discrete AR1 structure with a Bernoulli response be fine?
Also, for non-Gaussian distributions where you can use Gaussian processes, such as Poisson, gamma, beta, etc., what posterior predictive checks would you recommend?
Thanks,
Hunter
I was going to say this seems like an issue better dealt with at the random effect (modeled coefficient) level than at the outcome level. In the categorical outcome setting, for example, correlations among outcomes are a bit of a computational nightmare, but they seem comparatively trivial when we recast the relationship as a shared random effect across outcomes, which is often a more generative description of the underlying cause anyway. So yes, I believe brms makes it easy to have an AR modeled coefficient, and to tie it to group members as well, for your desired likelihood. But in terms of diagnostics to ensure that the effect is functioning as intended, I defer to the time series specialists. One of the econometrics folks could probably offer good suggestions.