# Is it just very hard to estimate ARMA models?

I mostly want a sanity check here that I’m not doing something stupid. The tl;dr is that I’m surprised, when doing simulations of ARMA-style models, at how large an N you need to recover the parameters in a meaningful way (for both `stats::arima` and Stan via `brms`), and I’m wondering whether better priors or something else would help recover the parameters at smaller N.

Trying to be a good boy, I wanted to sanity check that I can recover the parameters from an ARMA(2, 2) model in R and `brms` before I do anything more complicated.

```r
set.seed(1234)
# gen the arma mod
arma_mod = list(ar = c(0.3, -0.3), ma = c(0.1, -0.2), order = c(2, 0, 2))
arma_22 = arima.sim(n = 500, model = arma_mod, mean = 0, sd = 1)
# try using arima(2, 0, 2) for a sanity check
arma_fit = stats::arima(arma_22, order = c(2, 0, 2), include.mean = TRUE,
                        transform.pars = TRUE, method = "ML",
                        optim.method = "BFGS",
                        optim.control = list(maxit = 1000))
arma_fit
# Call:
# stats::arima(x = arma_22, order = c(2, 0, 2), include.mean = TRUE, transform.pars = TRUE,
#     method = "ML", optim.method = "BFGS", optim.control = list(maxit = 1000))
#
# Coefficients:
#          ar1      ar2     ma1      ma2  intercept
#       0.2981  -0.2711  0.0339  -0.2706     0.0683
# s.e.  0.0979   0.0857  0.0994   0.0985     0.0342
```

So like, not great, but if I squint at the estimates alongside their standard errors I’m like “okay, this is sort of recovering the parameters, fine fine fine.”

Now I put this into `brms` like so:

```r
arma_df = data.frame(y = arma_22)
mod1 = bf(y ~ arma(p = 2, q = 2))
# get_prior(mod1, arma_df)
prior1 = c(set_prior("normal(0, .5)", class = "ar"),
           set_prior("normal(0, .5)", class = "ma"),
           set_prior("normal(0, .5)", class = "sigma"),
           set_prior("normal(0, .5)", class = "Intercept"))

# not setting init to 0 causes some bad R-hat values
fit1 = brm(mod1, data = arma_df, prior = prior1, iter = 2000, cores = 4, init = 0)
summary(fit1)
#  Family: gaussian
#   Links: mu = identity; sigma = identity
# Formula: y ~ arma(p = 2, q = 2)
#    Data: arma_df (Number of observations: 500)
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#          total post-warmup samples = 4000
#
# Correlation Structures:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# ar[1]    -0.12      0.28    -0.65     0.46 1.00     1790     2128
# ar[2]    -0.02      0.26    -0.54     0.48 1.00     1484     1697
# ma[1]     0.13      0.28    -0.45     0.66 1.00     1792     2050
# ma[2]     0.07      0.26    -0.45     0.58 1.00     1488     1643
#
# Population-Level Effects:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -0.00      0.03    -0.06     0.06 1.00     3195     2217
#
# Family Specific Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.95      0.02     0.91     1.00 1.00     2898     2132
#
# Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
# Warning message:
# There were 18 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```

So ya know, that doesn’t seem great? All the true parameters sit within the CIs, but gosh those CIs are pretty wide!

I figured it might just need more observations, so I kept turning up the number of generated points until `stats::arima` started giving answers that seemed a little nicer. At around 1500 data points it gets closer to recovering the parameters, as does `brms`:

```r
set.seed(1234)
# gen the arma mod
arma_mod = list(ar = c(0.3, -0.3), ma = c(0.1, -0.2), order = c(2, 0, 2))
arma_22 = arima.sim(n = 1500, model = arma_mod, mean = 0, sd = 1)
# try using arima(2, 0, 2) for a sanity check
arma_fit = stats::arima(arma_22, order = c(2, 0, 2), include.mean = FALSE,
                        transform.pars = TRUE, method = "ML",
                        optim.method = "BFGS",
                        optim.control = list(maxit = 1000))
arma_fit
# Call:
#   stats::arima(x = arma_22, order = c(2, 0, 2), include.mean = FALSE, transform.pars = TRUE,
#                method = "ML", optim.method = "BFGS", optim.control = list(maxit = 1000))
#
# Coefficients:
#          ar1      ar2     ma1      ma2
#       0.2775  -0.2523  0.1174  -0.2820
# s.e.  0.0604   0.0470  0.0607   0.0577
#
# sigma^2 estimated as 0.9862:  log likelihood = -2118.3,  aic = 4246.6

arma_df = data.frame(y = arma_22)
mod1 = bf(y ~ 0 + arma(p = 2, q = 2))
# get_prior(mod1, arma_df)
prior1 = c(set_prior("normal(0, .5)", class = "ar"),
           set_prior("normal(0, .5)", class = "ma"),
           set_prior("normal(0, .5)", class = "sigma"))

# not setting init to 0 causes some bad R-hat values
fit1 = brm(mod1, data = arma_df, prior = prior1, iter = 2000, cores = 4, init = 0)
summary(fit1)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: y ~ 0 + arma(p = 2, q = 2)
# Data: arma_df (Number of observations: 1500)
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
#
# Correlation Structures:
#  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# ar[1]     0.27      0.06     0.14     0.38 1.00     2133     1962
# ar[2]    -0.26      0.05    -0.35    -0.17 1.00     2075     2360
# ma[1]     0.13      0.06     0.01     0.26 1.00     2082     1948
# ma[2]    -0.27      0.06    -0.38    -0.15 1.00     1782     2008
#
# Family Specific Parameters:
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.99      0.02     0.96     1.03 1.00     3337     2406
#
# Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
```

So that looks a bit nicer: the Stan results are at least close to `stats::arima`’s, there are no divergent-transition issues, etc.

But that seems like a lot of data points to recover the model? 1500 observations is about four years and some change of daily observations. Are my priors too wide? Do I need to think about better starting values for the parameters? One thing that seems to help is using `dense_e` as the metric instead of `diag_e`. For `N = 500`, using `dense_e` seems to recover parameters similar to what `stats::arima` gives back:

```r
# running the above with n = 500
arma_fit
# Call:
#   stats::arima(x = arma_22, order = c(2, 0, 2), include.mean = FALSE, transform.pars = TRUE,
#                method = "ML", optim.method = "BFGS", optim.control = list(maxit = 1000))
#
# Coefficients:
#          ar1      ar2      ma1      ma2
#       0.4161  -0.2276  -0.0593  -0.3134
# s.e.  0.0956   0.0851   0.0942   0.0918
#
# sigma^2 estimated as 1.138:  log likelihood = -742.06,  aic = 1494.13

fit1 = brm(mod1, data = arma_df, prior = prior1, iter = 2000, cores = 4, init = 0,
           control = list(metric = "dense_e"))
summary(fit1)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: y ~ 0 + arma(p = 2, q = 2)
# Data: arma_df (Number of observations: 500)
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
#
# Correlation Structures:
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# ar[1]     0.38      0.10     0.19     0.57 1.00     4398     2875
# ar[2]    -0.24      0.08    -0.40    -0.07 1.00     4720     2812
# ma[1]    -0.03      0.10    -0.21     0.17 1.00     4060     2855
# ma[2]    -0.28      0.09    -0.45    -0.09 1.00     4885     3112
#
# Family Specific Parameters:
#   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     1.07      0.03     1.01     1.14 1.00     6063     3127
#
# Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
```

Are there some other tricks to recovering parameters for ARMA models I should be aware of? The generated Stan code from `brms` is here:

```r
# gen code and data for brms model
make_stancode(mod1, data = arma_df, prior = prior1)
make_standata(mod1, data = arma_df)
```
```stan
// generated with brms 2.14.11
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  // data needed for ARMA correlations
  int<lower=0> Kar;  // AR order
  int<lower=0> Kma;  // MA order
  // number of lags per observation
  int<lower=0> J_lag[N];
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int max_lag = max(Kar, Kma);
}
parameters {
  vector[Kar] ar;  // autoregressive coefficients
  vector[Kma] ma;  // moving-average coefficients
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    // matrix storing lagged residuals
    matrix[N, max_lag] Err = rep_matrix(0, N, max_lag);
    vector[N] err;  // actual residuals
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    // include ARMA terms
    for (n in 1:N) {
      mu[n] += Err[n, 1:Kma] * ma;
      err[n] = Y[n] - mu[n];
      for (i in 1:J_lag[n]) {
        Err[n + 1, i] = err[n + 1 - i];
      }
      mu[n] += Err[n, 1:Kar] * ar;
    }
    target += normal_lpdf(Y | mu, sigma);
  }
  // priors including constants
  target += normal_lpdf(ar | 0, .5)
    - 1 * log_diff_exp(normal_lcdf(1 | 0, .5), normal_lcdf(-1 | 0, .5));
  target += normal_lpdf(ma | 0, .5)
    - 1 * log_diff_exp(normal_lcdf(1 | 0, .5), normal_lcdf(-1 | 0, .5));
  target += normal_lpdf(sigma | 0, .5)
    - 1 * normal_lccdf(0 | 0, .5);
}
generated quantities {
}
```

I’m guessing the parameters are just highly correlated with each other. You’re probably recovering the combined output of the parameters (i.e. the predictions) just fine.
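One quick way to check that (a sketch, assuming the `fit1` object from the N = 500 run above):

```r
# Posterior correlations among the ARMA coefficients; strongly
# correlated draws would explain the wide marginal intervals even
# when the joint posterior (and hence the predictions) is fine.
post <- posterior_samples(fit1, pars = c("^ar", "^ma"))
round(cor(post), 2)
```

Large off-diagonal entries (especially between `ar[i]` and `ma[i]`) would support the near-cancellation story.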


Yeah, the predictions look fine, so I think this could be the issue.

Yes, but if the parameters themselves are of interest, that’s still a problem. (Even if not, chances are you can do better.)

I’d recommend Minnesota priors, a class of priors designed to encode reasonable prior information about time series, e.g. that autoregression parameters on longer lags will usually be smaller than those on shorter lags, and that the sum of the parameters will usually be close to, but not quite equal to, 1. This handbook includes a good introduction.
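`brms` doesn’t expose lag-dependent prior scales for `ar`/`ma` directly, but as a sketch of the idea you could edit the prior block of the generated Stan code above to shrink longer lags harder (the `0.5 / i` decay here is illustrative, not the canonical Minnesota specification, which also involves centering and variance hyperparameters):

```stan
// Minnesota-style shrinkage: tighter priors on longer lags.
// The 0.5 / i scale is illustrative only.
for (i in 1:Kar) {
  target += normal_lpdf(ar[i] | 0, 0.5 / i);
}
for (i in 1:Kma) {
  target += normal_lpdf(ma[i] | 0, 0.5 / i);
}
```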


Are you using N(0, 0.5) for the ar parameters? Note that if the absolute value is close to 1, convergence is hard; you can use a uniform U(-1, 1) or a Beta() * 2 - 1 instead.
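In `brms` the uniform version might look something like this (a sketch; the rescaled-Beta variant would need an edit to the generated Stan code rather than just `set_prior`):

```r
# Uniform(-1, 1) priors on the ARMA coefficients instead of normal(0, 0.5),
# with explicit bounds so Stan samples on the constrained scale
prior_unif <- c(set_prior("uniform(-1, 1)", class = "ar", lb = -1, ub = 1),
                set_prior("uniform(-1, 1)", class = "ma", lb = -1, ub = 1))
```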

This isn’t directly related to prior choice, but my understanding of time series (and stochastic processes more generally) is that if the process has a long characteristic length scale (high autocorrelation), having to observe it for a while to infer the parameters accurately seems unavoidable.
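A quick way to see how the information accrues (a sketch reusing the simulation setup from the original post; `stats::arima` can occasionally fail to converge on a replication, which this ignores):

```r
# Spread of the ML estimate of ar1 across replicated simulations at a
# few sample sizes -- the sd should shrink roughly like 1 / sqrt(n)
set.seed(1234)
arma_mod <- list(ar = c(0.3, -0.3), ma = c(0.1, -0.2), order = c(2, 0, 2))
for (n in c(500, 1500, 5000)) {
  ar1_hat <- replicate(50, {
    y <- arima.sim(n = n, model = arma_mod, sd = 1)
    coef(stats::arima(y, order = c(2, 0, 2), include.mean = FALSE))["ar1"]
  })
  cat(sprintf("n = %4d: sd(ar1_hat) = %.3f\n", n, sd(ar1_hat)))
}
```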