Data and Models
I’m trying to run a series of multilevel time series cross-sectional (TSCS) models. I observe outcomes for political parties nested within countries over a decade. Depending on the variables I use, I have approximately 180 parties in 30 countries, measured over 10 years, yielding ~1,360 party-year observations (not all parties are observed for the entire period). I have variables measured at the party and country level. Party-level variables are a mix of time-varying and time-invariant, while country-level variables are all time-varying. The main independent variable of interest is time-invariant.
The outcomes are count variables with high levels of overdispersion. For example, one of the outcomes ranges from 0 to 13,404,375 (but with very few zeros), with a mean of 97,553.32 and a variance of 250,760,871,628 (about 2.5e11). The outcomes are long-tailed, with a handful of observations taking extreme values. I use negative binomial regression (Poisson fits extremely poorly, with low Bulk_ESS and Tail_ESS even with thousands of iterations).
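For context, here is a minimal sketch of the kind of dispersion check I mean (y and data stand in for the actual outcome and data frame; m1 is the first model defined in the Code section below):

library(brms)

# Crude overdispersion check: the variance is vastly larger than the mean,
# so a Poisson likelihood cannot fit these counts
mean(data$y)
var(data$y)
var(data$y) / mean(data$y)   # dispersion index >> 1

# Posterior predictive check of the standard deviation under the
# negative binomial fit
pp_check(m1, type = "stat", stat = "sd")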
Issues
I’ve managed to get the models to fit without any warnings, but I run into issues when I try to include an AR(1) process to account for temporal autocorrelation in the residuals. Specifically, including the ar(time = year, gr = party, p = 1) term results in most observations having extremely high Pareto k values when running loo(). Just including the random effects already produces some problematic Pareto k values, and almost all observations are bad or very bad once the AR(1) process is included. I’ve looked at the observations with high values but cannot see any clear pattern (e.g. extreme or unusual values).
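For what it’s worth, this is roughly how I inspected the flagged observations, using the helper functions in the loo package (a sketch; l.4 is the loo object for the full model shown further down):

library(loo)

# Indices of observations whose Pareto k exceeds the 0.7 threshold
bad_idx <- pareto_k_ids(l.4, threshold = 0.7)

# Inspect the flagged rows together with their Pareto k values
flagged <- data[bad_idx, ]
flagged$pareto_k <- pareto_k_values(l.4)[bad_idx]
summary(flagged)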
Moreover, the ar and shape parameters seem to interact in a strange way: both have estimation issues (high Rhat values), and the estimated shape parameter is much larger when the AR(1) process is included.
I’ve tested out different priors, including the defaults for the AR(1) process, but it doesn’t appear to make much difference. The issue also persists in more complex models with random coefficients at the country and year level for the main independent variable of interest.
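To give a sense of what I mean by testing priors, here is a condensed sketch of the sort of checks I tried (the formula is abbreviated as in the Code section below, and m2_prior is just a name for the prior-only fit):

library(brms)

# Which priors does the AR(1) specification introduce?
get_prior(
  y ~ log(x) + as.factor(year) + scale(x2) +
    ar(time = year, gr = party, p = 1),
  data = data, family = negbinomial
)

# Prior predictive check: sample_prior = "only" ignores the likelihood,
# so the implied outcome distribution reflects the priors alone
m2_prior <- brm(
  y ~ log(x) + as.factor(year) + scale(x2) +
    ar(time = year, gr = party, p = 1),
  prior = c(prior(normal(0, 1), class = "b"),
            prior(normal(0, 1), class = "Intercept"),
            prior(exponential(1), class = "shape"),
            prior(normal(0.5, 0.5), class = "ar"),
            prior(exponential(1), class = "sderr")),
  data = data, family = negbinomial, sample_prior = "only",
  iter = 2000, warmup = 1000, cores = 16, chains = 2)

pp_check(m2_prior, type = "stat", stat = "max")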
I have been unable to find any answers to this question on the forums. The literature on TSCS analysis strongly recommends that models include AR(1) terms, lagged dependent variables, or other terms to account for dynamics (e.g. Beck and Katz 2011), but unfortunately the key works using Monte Carlo simulations to evaluate TSCS models seem to simplify their analyses by leaving these factors out (e.g. Shor et al. 2007; Beck and Katz 2007), so they offer little guidance on this issue (references below). Much of the literature in this area focuses on panel data with repeated cross-sections, where it is unnecessary to account for temporal dependencies.
I’m a little unsure how to proceed here. I’ve read that these warnings can indicate misspecification or high model flexibility. Is loo even reliable for models with autocorrelation and/or multilevel structure? The model specification is theoretically motivated and consistent with prior work in the area, the models converge without issues (sometimes there are a handful of divergent transitions, but these go away when I increase adapt_delta), and the posterior predictive checks look good, so I’m not sure how seriously to take the warnings. I’m not too concerned about model comparison either: while it would be nice to make comparisons using loo, I plan to present a range of estimates across different model specifications.
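If PSIS-LOO itself is unreliable here, one option I’ve been considering is exact K-fold cross-validation with folds grouped by party, so that entire party time series are held out together (a sketch; slow, since each fold refits the model, but it sidesteps the importance-sampling failures):

# Exact 10-fold CV with folds defined by party: all observations from a
# given party end up in the same fold, respecting the AR(1) structure
kf.3 <- kfold(m3, K = 10, group = "party", chains = 2, cores = 16)
kf.4 <- kfold(m4, K = 10, group = "party", chains = 2, cores = 16)

loo_compare(kf.3, kf.4)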
Code
The code below shows condensed versions of four models. The first two include only population-level effects; the second two include all random intercepts. For each model I show a specification with and without the ar term. I have simplified the formulas a little, but each model includes a factor variable for time and 11 other independent variables, either binary indicators or scaled continuous variables. I then show the output of loo and loo_compare for all models.
# Simple model with population effects only
m1 <- brm(
  y ~ log(x) +
    as.factor(year) +
    scale(x2) +
    ...,
  prior = c(prior(normal(0, 1), class = "b"),
            prior(normal(0, 1), class = "Intercept"),
            prior(exponential(1), class = "shape")),
  data = data, family = negbinomial,
  iter = 2000, warmup = 1000, cores = 16, chains = 2)
# + AR(1)
m2 <- brm(
  y ~ log(x) +
    as.factor(year) +
    scale(x2) +
    ... +
    # autoregressive term
    ar(time = year, gr = party, p = 1),
  prior = c(prior(normal(0, 1), class = "b"),
            prior(normal(0, 1), class = "Intercept"),
            prior(exponential(1), class = "shape"),
            prior(normal(0.5, 0.5), class = "ar"),
            prior(exponential(1), class = "sderr")),
  data = data, family = negbinomial,
  iter = 2000, warmup = 1000, cores = 16, chains = 2)
# + random intercepts, no AR(1)
m3 <- brm(
  y ~ log(x) +
    as.factor(year) +
    scale(x2) +
    ... +
    # random part
    (1 | party) + (1 | country) + (1 | year) + (1 | country:year),
  prior = c(prior(normal(0, 1), class = "b"),
            prior(normal(0, 1), class = "Intercept"),
            prior(exponential(1), class = "shape"),
            prior(exponential(1), class = "sd")),
  data = data, family = negbinomial,
  iter = 2000, warmup = 1000, cores = 16, chains = 2)
# Random intercepts + AR(1)
m4 <- brm(
  y ~ log(x) +
    as.factor(year) +
    scale(x2) +
    ... +
    # autoregressive term
    ar(time = year, gr = party, p = 1) +
    # random part
    (1 | party) + (1 | country) + (1 | year) + (1 | country:year),
  prior = c(prior(normal(0, 1), class = "b"),
            prior(normal(0, 1), class = "Intercept"),
            prior(exponential(1), class = "shape"),
            prior(normal(0.5, 0.5), class = "ar"),
            prior(exponential(1), class = "sderr"),
            prior(exponential(1), class = "sd")),
  data = data, family = negbinomial,
  iter = 2000, warmup = 1000, cores = 16, chains = 2)
l.1 <- loo(m1)
l.2 <- loo(m2)
l.3 <- loo(m3)
l.4 <- loo(m4)
l.1
l.2
l.3
l.4
loo_compare(l.1, l.2, l.3, l.4)
Output
m1

Computed from 2000 by 1360 log-likelihood matrix

         Estimate    SE
elpd_loo -15231.0  84.4
p_loo        31.4   5.6
looic     30462.1 168.8
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.   Min. n_eff
(-Inf, 0.5]   (good)      1359 99.9%  164
 (0.5, 0.7]   (ok)           0  0.0%  <NA>
   (0.7, 1]   (bad)          1  0.1%  13
   (1, Inf)   (very bad)     0  0.0%  <NA>
See help('pareto-k-diagnostic') for details.
m2

Computed from 2000 by 1360 log-likelihood matrix

         Estimate     SE
elpd_loo -33073.8  931.0
p_loo     18783.9  855.4
looic     66147.6 1862.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.   Min. n_eff
(-Inf, 0.5]   (good)         1  0.1%  267
 (0.5, 0.7]   (ok)          33  2.4%  49
   (0.7, 1]   (bad)        166 12.2%  10
   (1, Inf)   (very bad)  1160 85.3%  0
See help('pareto-k-diagnostic') for details.
m3

Computed from 2000 by 1360 log-likelihood matrix

         Estimate    SE
elpd_loo -14239.7  86.0
p_loo       268.2  12.2
looic     28479.4 171.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.   Min. n_eff
(-Inf, 0.5]   (good)      1219 89.6%  295
 (0.5, 0.7]   (ok)         104  7.6%  67
   (0.7, 1]   (bad)         34  2.5%  22
   (1, Inf)   (very bad)     3  0.2%  10
See help('pareto-k-diagnostic') for details.
m4

Computed from 2000 by 1360 log-likelihood matrix

         Estimate    SE
elpd_loo -17632.2 198.3
p_loo      4403.3 154.1
looic     35264.4 396.6
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.   Min. n_eff
(-Inf, 0.5]   (good)         7  0.5%  340
 (0.5, 0.7]   (ok)          89  6.5%  48
   (0.7, 1]   (bad)        426 31.3%  10
   (1, Inf)   (very bad)   838 61.6%  0
See help('pareto-k-diagnostic') for details.
> loo_compare(l.1, l.2, l.3, l.4)
   elpd_diff se_diff
m3       0.0     0.0
m1    -991.3    37.8
m4   -3392.5   154.5
m2  -18834.1   905.0
Here are the ar and shape parameters for the four models (in order; other output omitted):
Correlation Structures:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Model
ar[1]     0.76      0.03     0.71     0.81 1.01      298      641 m2
sderr     0.92      0.02     0.88     0.96 1.02      178      718 m2
ar[1]     0.70      0.07     0.58     0.83 1.00      191      396 m4
sderr     0.44      0.02     0.40     0.49 1.01      144      335 m4

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Model AR(1)
shape     0.82      0.03     0.77     0.88 1.00     3010     1428 m1    FALSE
shape    13.36      2.16     9.57    18.06 1.06       74      168 m2    TRUE
shape     3.34      0.14     3.07     3.62 1.00     1366     1440 m3    FALSE
shape    12.98      1.90     9.88    17.22 1.01      101      301 m4    TRUE
Thanks for taking the time to read this. Any advice would be much appreciated!
References
Beck, Nathaniel, and Jonathan N. Katz. 2007. “Random Coefficient Models for Time-Series—Cross-Section Data: Monte Carlo Experiments.” Political Analysis 15 (2): 182–95.
———. 2011. “Modeling Dynamics in Time-Series–Cross-Section Political Economy Data.” Annual Review of Political Science 14 (1): 331–52. https://doi.org/10.1146/annurev-polisci-071510-103222.
Shor, Boris, Joseph Bafumi, Luke Keele, and David Park. 2007. “A Bayesian Multilevel Modeling Approach to Time-Series Cross-Sectional Data.” Political Analysis 15 (2): 165–81.