Dynamic panel data models with Stan?

It’s well known that in panel data models (that is, models with observations across individuals and time), if you have individual intercepts/slopes, then including a lagged dependent variable on the right-hand side is a big no-no (it can produce enormous biases).

To see this, imagine you have

y_{it} = \alpha + u_{i} + \delta y_{it-1} + X_{it}\beta + \epsilon_{it}

with u_{i} \sim \text{Normal}(0, \sigma_{u}) being the random effects. For the usual estimators to be consistent, these must be uncorrelated with X_{it} and y_{it-1} in our statistical model. But u_{i} applies to all observations of y_{i}, and so by construction E[u_{i}\, y_{it-1}] \neq 0: the lagged dependent variable is correlated with the error. This typically inflates our estimate of \delta and shrinks our estimate of \beta.
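
To see where that correlation comes from, unroll the recursion (a sketch, assuming |\delta| < 1, a stationary start, and ignoring X_{it}):

y_{it-1} = \frac{\alpha + u_{i}}{1 - \delta} + \sum_{s \geq 0} \delta^{s} \epsilon_{it-1-s}

so E[u_{i}\, y_{it-1}] = \sigma_{u}^{2} / (1 - \delta), which is nonzero whenever \sigma_{u} > 0.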

Non-Bayesians use the Arellano-Bond estimator, which is a GMM approach to this problem. The idea is to express the model in first differences (getting rid of the random effect), in which case you’re only dealing with the correlation between the change in the error and the change in the lagged dependent variable. Then use GMM, instrumenting the differenced lagged dependent variable with deeper lags of the levels.
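
In equations, the first-difference step looks like this (a sketch of the standard Arellano-Bond setup):

\Delta y_{it} = \delta \Delta y_{it-1} + \Delta X_{it}\beta + \Delta \epsilon_{it}

The u_{i} drops out, but \Delta y_{it-1} = y_{it-1} - y_{it-2} still contains \epsilon_{it-1}, which also shows up in \Delta \epsilon_{it} = \epsilon_{it} - \epsilon_{it-1}, so the differenced regressor remains endogenous. Arellano-Bond exploits the moment conditions E[y_{it-s}\, \Delta \epsilon_{it}] = 0 for s \geq 2 (valid if \epsilon_{it} is serially uncorrelated), i.e. levels dated t-2 and earlier serve as instruments for \Delta y_{it-1}.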

Does anyone here have experience fitting these models in Stan? Happy to just throw Bayesian IV at it, but the Frequentist counterpart to this is less efficient than Arellano Bond. Ideas?


Hi Jim,

Can you expand a bit on exactly what model you want to fit? Is it the one in the display?

Haven’t done these models personally, but I tried a little experiment since it was quick to do, and you were right: I can’t recover the parameters.

library(rstan); library(tidyverse)   # for stan(), tibble(), lag(), and %>% below

# Simulate one series: individual effect u, AR coefficient delta, slope beta
generate_panel_data <- function(delta, beta, x) {
  N <- length(x)
  u <- rnorm(1)                      # individual effect shared by all periods
  y <- rep(NA, N)
  y[1] <- u + beta*x[1] + rnorm(1)
  for(n in 2:N) {
    y[n] <- u + delta*y[n-1] + beta*x[n] + rnorm(1)
  }
  tibble(y = y, u = u, lag_y = lag(y), x = x)
}

data <- generate_panel_data(1.2, 3.0, rnorm(20))

fit <- stan(file = "panel.stan",
            data = list(N = nrow(data),
                        x_lead = data$x %>% head(nrow(data)-1),
                        y_lead = data$y %>% head(nrow(data)-1),
                        y_lag = data$y %>% tail(nrow(data)-1)),
            chains = 1)

> fit
Inference for Stan model: panel.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
u      -0.73    0.02 0.34  -1.38  -0.96  -0.74  -0.51  -0.08   431 1.00
delta   0.83    0.00 0.00   0.82   0.83   0.83   0.83   0.84   470 1.00
beta    0.45    0.01 0.26  -0.03   0.28   0.45   0.62   0.97   577 1.00
lp__  -48.11    0.07 1.30 -51.53 -48.70 -47.72 -47.18 -46.75   340 1.01
data {
  int N;
  vector[N-1] x_lead;
  vector[N-1] y_lead;
  vector[N-1] y_lag;
}
parameters {
  real u;
  real delta;
  real beta;
}
model {
  y_lead ~ normal(u + delta*y_lag + beta*x_lead, 1.0);
}

Yep @anon75146577 here’s some simulated data etc.


library(rstanarm); library(tidyverse)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)

test_fit <- stan_lmer(y ~ lagged_y + (1 | individual), data = simulated_panel, iter = 1000, cores = 4)

bayesplot::mcmc_dens(as.data.frame(test_fit), pars = "lagged_y") +
  geom_vline(aes(xintercept = delta)) +
  annotate("text", y = 15, x = delta + 0.05, label = "true coef") +
  xlim(0, max(as.data.frame(test_fit)$lagged_y))
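
A quick numerical check on top of the plot (a rough sketch; it just summarizes the same posterior draws used for the density above):

post <- as.data.frame(test_fit)           # posterior draws from the rstanarm fit
mean(post$lagged_y)                       # typically sits well above the true delta = 0.5 here
quantile(post$lagged_y, c(0.025, 0.975))  # and the interval tends to exclude the true value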

Man I wish we had R syntax highlighting on the forums!

Arya, this isn’t repeated observations across multiple individuals, is it?


The simulation is recursive but the Stan program isn’t. Is that a problem?

No, just a single individual, but without the \alpha term. I figured I’d start simple. The posterior pairs plots for that model don’t actually look bad or show any non-identifiabilities.

Just bumping this in case the right person missed it. Seems too common a problem to not have been discussed on the forum before.

You should look at the R package ctsem, which does continuous-time SEM. It lets you fit dynamic systems with either frequentist or Bayesian estimation; the Bayesian side is done with Stan.

The simplest version of the dynamic system is the panel model.


I never took the time to quite wrap my head around the Arellano-Bond thing, but the poor performance of the rstanarm spec was eye-opening. As Mauricio said, ctsem provides a front end for specifying hierarchical time series / state-space models in Stan. I took a quick look and it seems to behave as you want, though performance is much slower than it could be for a perfectly observed first-order model like yours (since there is no need to integrate out the latent states).

install.packages("devtools")
library(devtools)
install_github("cdriveraus/ctsem")

library(rstanarm); library(tidyverse); library(ctsem)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)


#ctsem model
ctm <- ctModel(n.manifest = 1, n.latent = 1, LAMBDA = diag(1), type = 'standt',
  MANIFESTMEANS = matrix(0, 1, 1), CINT = matrix('u', 1, 1),
  MANIFESTVAR = diag(1e-6, 1),
  T0VAR = matrix('stationary', 1, 1),
  manifestNames = 'y')

#set all parameters except the initial states and intercept fixed across subjects
ctm$pars$indvarying[!ctm$pars$matrix %in% c('T0MEANS','CINT')] <- FALSE

#adjust id column name
ctm$subjectIDname <- 'individual'

#fit using optimization
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm,
  chains = 6, optimize = TRUE, verbose = 0, iter = 300,
  deoptim = FALSE, isloops = 0, issamples = 100)
summary(ctf)

#fit with sampling
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm, chains = 6, iter = 300)

cat(ctf$stanmodeltext) #to view the stan model -- lots of unnecessary stuff for this problem though!

Ask @syclik—somehow he or someone else turned on Stan highlighting. So I imagine if there’s a Stan plugin, there’s an R plugin!

Here’s my solution, with some code.

library(tidyverse)   # data_frame(), %>%, group_by(), lag() used below

set.seed(123)
delta <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u <- rnorm(I, 0, sigma_u)
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  initial_y = u + sigma_e / 
                  sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag)
library(rstan)
mod <- stan_model('dynamic-panel.stan')
fit <- sampling(mod, data = stan_dat, cores = 4, chains = 4, iter = 2000)

Here’s dynamic-panel.stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;    // non-centered parameterization of the group effects
  real delta = delta_raw * 2 - 1;   // map delta_raw from (0,1) to (-1,1)
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // first observation: stationary distribution of the AR(1) centered at u[i]
    target += normal_lpdf(y[1,i] | u[i], 
                          sigma_e / sqrt(one_minus_delta_sq));
    // remaining observations: AR(1) recentered around the group mean u[i]
    target += normal_lpdf(y[2:T,i] | u_delta[i]
                                     + delta * y_lag[2:T,i], sigma_e);
  }
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
}

Summary of the model run over the simulated data from above:

> print(fit , pars = c('delta','sigma_u','sigma_e'))
Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.50    0.00 0.04 0.42 0.47 0.50 0.52  0.57   831 1.01
sigma_u 0.36    0.01 0.13 0.05 0.29 0.37 0.45  0.58   430 1.01
sigma_e 1.01    0.00 0.02 0.96 0.99 1.01 1.03  1.06  4000 1.00

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:03:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

When I boost the number of observations per individual to 50, the inference for sigma_u gets sharper and the estimate approaches the true value.

Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.48       0 0.01 0.46 0.47 0.48 0.49  0.51  4000    1
sigma_u 0.46       0 0.04 0.38 0.43 0.46 0.49  0.55  1400    1
sigma_e 0.99       0 0.01 0.98 0.99 0.99 1.00  1.01  4000    1

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:17:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
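
As an extra sanity check (a rough sketch using the objects created above; u_hat is just a name introduced here), you can compare the posterior means of the group effects against the simulated ones:

u_hat <- colMeans(as.matrix(fit, pars = "u"))  # posterior means of u[1], ..., u[I]
plot(initial_values$u, u_hat)                  # roughly linear, with some shrinkage toward zero
abline(0, 1)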

Could you point to the paper or model that your code is based on? I would like to use your model, but I can’t find the dynamic panel data model that you are using.
EDIT:
Just in case someone is interested in this approach but has no econometrics background, I found the following paper useful for understanding rtrangucci’s code.
https://www.researchgate.net/profile/Hashem_Pesaran/publication/226673422_Random_Coefficient_Models/links/0fcfd5089366b8067f000000/Random-Coefficient-Models.pdf

Could you maybe explain why you multiply by 1 - delta in vector[I] u_delta = u * (1 - delta);?
Why don’t you just use u[i] instead of u_delta[i] in target += normal_lpdf(y[2:T,i] | u_delta[i] + delta * y_lag[2:T,i], sigma_e);?

Hi @c5v, sorry for the delay. The use of u_delta[i] instead of u[i] is just an algebraic trick to cut down on computation. We could instead write the likelihood like:

target += normal_lpdf(y[2:T,i] | u[i] + delta * (y_lag[2:T,i] - u[i]), sigma_e)

which leads to

target += normal_lpdf(y[2:T,i] | u[i] + delta * y_lag[2:T,i] - delta * u[i], sigma_e)

simplifying to

target += normal_lpdf(y[2:T,i] | u[i] * (1 - delta) + delta * y_lag[2:T,i], sigma_e)


@rtrangucci Thanks for your reply, this clarifies a lot!

The only thing I don’t really get is why you subtract u[i] in delta*(y_lag[2:T,i] - u[i]), since we are not modelling first differences here, right? Could you maybe explain this to me? It would help me a lot!

It’s because I want the mean of each observation in group i to be u[i]. The model is

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}\\ \epsilon_{i,t} \sim \text{Normal}(0, \sigma_e)

Taking expectations of each side:

\mathbb{E}[Y_{i,t}] = \mu_i + \delta \mathbb{E}[(Y_{i,t-1} - \mu_i)] \\ \mathbb{E}[Y_{i,t}] = \mu_i + \delta (\mathbb{E}[Y_{i,t-1}] - \mu_i)

and, under stationarity (|\delta| < 1, so \mathbb{E}[Y_{i,t}] = \mathbb{E}[Y_{i,t-1}]), this gives \mathbb{E}[Y_{i,t}] = \mu_i.

@rtrangucci clear explanation thanks!

If I am right, in a multilevel setting (which I am building right now), which then looks like y_{ijt} = \beta_{0ij} + \beta_{1ij}x_{ijt} + \delta y_{ij,t-1} + \epsilon_{ijt}, I could also use your reasoning and model it like:

y_{ijt} = \mu_{ij} + \delta(y_{ij,t-1} - \mu_{ij}) + \epsilon_{ijt}, where \mu_{ij} = \beta_{0ij} + \beta_{1ij}x_{ijt}

and use your approach, right?

Sorry for all my questions, but I am really stuck with this problem.

No problem! You’re correct, you could model your process like that, and you’d get the nice property that \mathbb{E}[y_{ijt}] = \mu_{ij}.
