Dear all,
I am fitting a time-series model of Wikipedia page views with Stan through the ‘brms’ package.
I came up with a pretty good distributional model, which nevertheless needs a first-order autoregressive (AR(1)) adjustment, as the observations exhibit slight first-order autocorrelation in time.
The model is not the simplest one: it contains two splines for the year and month effects on the response, and it lets the zero-inflation and shape parameters change linearly over time. The code below is what I am running on a DELL M4800, with 6 MCMC chains on 6 different cores.
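# Distributional model: splines on month and year for the mean,
# linear effect of year on zero inflation (zi) and shape
mod <- brm(bf(views ~ s(month) + s(year) + period, zi ~ year, shape ~ year),
           family = zero_inflated_negbinomial(link = "log", link_shape = "log",
                                              link_zi = "logit"),
           # first-order autoregressive structure over time
           autocor = cor_ar(form = ~time, p = 1, cov = TRUE),
           chains = 6, cores = 6, iter = 5000, warmup = 1000,
           thin = 10, refresh = 0,
           control = list(adapt_delta = 0.999, max_treedepth = 15),
           data = wpd)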
The model without the autocorrelation structure works well, although fitting is a bit time-consuming. However, fitting the model with the autocorrelation structure takes far too long: I ran it on a workstation with approximately 24 cores (after adjusting the number of chains accordingly), and it had still not finished after about a week.
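To put the sampler settings quoted above in perspective (6 chains, iter = 5000, warmup = 1000, thin = 10), here is a rough base-R sketch of how many posterior draws the call actually keeps. The variable names are my own, nothing here calls brms, and I am assuming thinning simply keeps every 10th post-warmup iteration.

```r
# Sampler settings copied from the brm() call above
n_chains <- 6
n_iter   <- 5000
n_warmup <- 1000
n_thin   <- 10

# Post-warmup iterations per chain
post_warmup_per_chain <- n_iter - n_warmup

# Draws kept per chain after thinning (assumes every n_thin-th draw is kept)
kept_per_chain <- post_warmup_per_chain %/% n_thin

# Draws kept across all chains, and iterations actually computed
total_kept     <- n_chains * kept_per_chain
total_computed <- n_chains * n_iter

print(c(kept_per_chain = kept_per_chain,
        total_kept = total_kept,
        total_computed = total_computed))
```

So the call computes 30000 iterations in total to keep 400 draws per chain, 2400 overall.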
I would like to ask you:
- whether such a long fitting time is normal;
- whether there is some gross misspecification in the code that I did not notice;
- whether there is something I could do to speed up model fitting. I imagine that some improvement might be obtained by specifying priors, but I have no idea how this can be done in a nonlinear setting (any documentation would be appreciated).
Any help would be greatly appreciated. Please find attached a reproducible dataset: DatasetBRMSGitHub.csv (5.1 KB).
Thanks,
Jacopo Cerri