Am I doing something wrong, or is it a bug in the autocorrelation implementation?
The very same model that normally fits in five minutes takes hours when an autocorrelation term with a single lag is added. In fact, the original model is a distributed lag model, so it does pretty much the same thing (a single lag of the dependent variable is among the predictors). I can't understand why there is such a difference in computation time.
Also, the first time I tried to fit it (with the autoregression term added instead of an explicitly specified lag variable), the chains did not converge and the warning summary included a suggestion to increase max_treedepth. I set it to 15, and now it just takes forever, even though I have reduced the number of chains to just two.
The specification is as follows:
```r
mod1 = brm(
  bf(y ~ x1 + x2 + ar(p = 1, cov = TRUE),
     phi ~ x2),
  data = data1,
  family = Beta(link = "logit", link_phi = "log"),
  prior = prior1,
  warmup = 2500,
  iter = 5000,
  chains = 2,
  control = list(adapt_delta = 0.9, max_treedepth = 15),
  seed = 1234
)
```
Latest R, latest rstan, latest brms (stable release). macOS 11.1.
Hi,
I am no expert on autoregressive models, but since nobody else responded, I will give it a try.
The thing to notice is that the AR models in brms (and AFAIK AR models more generally) do something very different from taking the observed value at the previous timestep as a predictor: they take the true (unobserved, de-noised, …) value of the process at the previous timestep as a predictor. If you were working with a normal response, this “only” makes the overall distribution a multivariate normal with a specifically structured correlation matrix, and the model only gains the AR parameter.
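Roughly, and in my own notation (assuming an AR(1) structure on the residuals), the Gaussian case becomes:

$$
y \sim \mathrm{MVN}(X\beta,\ \Sigma), \qquad \Sigma_{ij} = \sigma^2 \rho^{|i - j|},
$$

where $\rho$ is the AR(1) coefficient and $\sigma^2$ the marginal residual variance, so only $\rho$ is added on top of the usual regression parameters.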
However, this simple math stops working once you have a non-normal response distribution. So instead, what brms does (and what you can inspect in the Stan code generated by brms via make_stancode) is to introduce one new parameter for each observation to model the residuals (vector[N] err; in the code) and keep them correlated following the AR structure. The model then has (paraphrasing) y ~ beta_proportion(mu + err, phi), where mu is the linear predictor without the autocorrelation term.
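If you want to see this for your exact model, a sketch (reusing the formula, family, data1 and prior1 objects from your post) would be:

```r
# Sketch: inspect the Stan code brms generates for this specification,
# without fitting anything. data1 and prior1 are the objects from the post.
library(brms)
scode <- make_stancode(
  bf(y ~ x1 + x2 + ar(p = 1, cov = TRUE),
     phi ~ x2),
  data = data1,
  family = Beta(link = "logit", link_phi = "log"),
  prior = prior1
)
cat(scode)  # look for the "vector[N] err;" declaration and how err enters mu
```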
There are many ways this can go wrong. First, you just added a ton of new parameters, so the model will definitely be slower. Second (and IMHO more importantly), you now effectively have two noise parameters: one representing the standard deviation of the AR residuals (sderr) and one for the scale of the beta noise (phi). It is quite possible that increasing sderr has almost the same effect on the likelihood as decreasing phi, so the two cannot both be identified unless you observe multiple separate time series (and non-identifiabilities wreck the sampler). You can check if this is the case by inspecting the pairs plot for those parameters and seeing whether there is a negative correlation (it would likely be visible even in the broken fit). Also, the additional flexibility phi has due to it being predicted as well definitely does not help.
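A minimal sketch of that check, assuming the relevant parameters are named sderr, ar[1] and b_phi_Intercept (check the actual names with variables(mod1) first):

```r
# Sketch: look for a ridge / strong negative correlation between the
# residual SD of the AR process and the (intercept of the) beta precision.
# Parameter names are guesses; adjust to what variables(mod1) reports.
library(bayesplot)
draws <- as.array(mod1)
mcmc_pairs(draws, pars = c("sderr", "ar[1]", "b_phi_Intercept"))
```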
This is unfortunately one of the cases where the power of brms becomes somewhat dangerous: brms does a lot of magic behind the scenes, and it is very easy to specify a model that has serious problems.
Does that make sense?
Best of luck with your model!
EDIT: you are more likely to get help with brms using the brms category, and note that you can use ``` (triple backticks) to mark code blocks. I edited and recategorized the post for you.
The problem in your case is likely that the model is not nicely identified. This is not because of the AR term per se, but because the cov = TRUE approach to AR terms in non-Gaussian models implies adding latent residuals to the model, which also model overdispersion in addition to the AR structure. However, the beta likelihood already has a dispersion parameter in place (phi), so this clashes with the latent variance added for the purpose of the AR term.
What would be a correct approach if the likelihood is Beta and there is autocorrelation? Just specify it as an ARDL model without the ar() part?
I don’t think we can say what would be “correct” without much deeper knowledge of what is going on in your data. And there likely isn’t a single “correct” model. What I would start with would be experimenting with what makes the model problematic. If you fix the phi parameter (via a constant prior) to a moderately large value, does it work better? Does the pairs plot of the broken fit (possibly just for a subset of the data) look as I described above?
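For instance (just a sketch; 50 is an arbitrary placeholder for “moderately large”, and you would add back your other priors from prior1 as appropriate), you could drop the phi ~ x2 part so that phi is a single parameter and fix it:

```r
# Sketch: fix phi at an arbitrary, moderately large value via a constant
# prior. The phi ~ x2 part is dropped so that phi is a single parameter.
mod_fixed_phi <- brm(
  bf(y ~ x1 + x2 + ar(p = 1, cov = TRUE)),
  data = data1,
  family = Beta(link = "logit", link_phi = "log"),
  prior = prior(constant(50), class = "phi"),
  chains = 2,
  control = list(adapt_delta = 0.9, max_treedepth = 15),
  seed = 1234
)
```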
Then I would try to think about why you expect autocorrelation and what a model that handles autocorrelation wrongly would look like. Can you design a posterior predictive check that would be sensitive to those types of problems? A good start might be looking at differences between successive timepoints. Does the model without autocorrelation, or one just using lagged values as predictors, fail those checks? If you can’t find a problem with the fit without autocorrelation, you likely don’t have enough data to learn about the autocorrelation anyway.
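One possible check along these lines (a sketch; mod_no_ar stands for whatever fit without the ar() term you want to examine, and it assumes the rows of data1 are in time order):

```r
# Sketch: compare the mean absolute change between successive timepoints
# in the observed data with the same statistic in posterior predictive draws.
pp_check(mod_no_ar, type = "stat", stat = function(y) mean(abs(diff(y))))
```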
Best of luck with your model