Dynamic panel data models with Stan?


#1

It’s well known that in panel data models (that is, models with observations across individuals and time), if you have individual intercepts/slopes, then including a lagged dependent variable on the right-hand side is a big no-no (potentially enormous biases).

To see this, imagine you have

y_{it} = \alpha + u_{i} + \delta y_{it-1} + X_{it}\beta + \epsilon_{it}

with u_{i} \sim \text{Normal}(0, \sigma_{u}) being the random effects. By Gauss-Markov, these must be uncorrelated with X_{it} and y_{it-1} in our statistical model. But u_{i} enters every observation of y_{i}, so by construction E[u_{i} y_{it}] \neq 0 for all t; in particular, the regressor y_{it-1} is correlated with u_{i}. This typically inflates our estimate of \delta and shrinks our estimate of \beta.
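
To see this concretely (a quick derivation, assuming |\delta| < 1 and a process running since the distant past), iterate the model backwards:

y_{it-1} = \frac{\alpha + u_{i}}{1-\delta} + \sum_{s \geq 0} \delta^{s} \left( X_{it-1-s}\beta + \epsilon_{it-1-s} \right)

so E[u_{i} y_{it-1}] = \sigma_{u}^{2} / (1-\delta), which is nonzero whenever \sigma_{u} > 0.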

Non-Bayesians use the Arellano-Bond estimator, which is a GMM approach to this problem. The idea is to express the model in first differences (getting rid of the random effect), so you’re only dealing with the correlation between the change in the error and the change in the lagged dependent variable. Then use GMM, instrumenting the differenced lagged dependent variable with its own lags.
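
Spelled out, with \Delta z_{it} = z_{it} - z_{it-1}, the differenced equation is

\Delta y_{it} = \delta \Delta y_{it-1} + \Delta X_{it}\beta + \Delta \epsilon_{it}

The u_{i} drops out, but \Delta y_{it-1} still shares \epsilon_{it-1} with \Delta \epsilon_{it}. Since E[y_{it-2} \Delta \epsilon_{it}] = 0 when the \epsilon_{it} are serially uncorrelated, y_{it-2} and deeper lags are valid instruments.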

Does anyone here have experience fitting these models in Stan? Happy to just throw Bayesian IV at it, but the Frequentist counterpart to this is less efficient than Arellano-Bond. Ideas?


#2

Hi Jim,

Can you expand a bit on exactly what model you want to fit? Is it the one in the display?


#3

Haven’t done these models personally, but I tried a little experiment since it was quick to do, and you were right: I can’t recover the parameters.

library(tidyverse)  # for tibble(), lag(), and %>%

generate_panel_data <- function(delta, beta, x) {
  # simulate one individual: y_t = u + delta * y_{t-1} + beta * x_t + e_t
  N <- length(x)
  u <- rnorm(1)                       # the individual effect
  y <- rep(NA, N)
  y[1] <- u + beta*x[1] + rnorm(1)    # first period has no lag term
  for(n in 2:N) {
    y[n] <- u + delta*y[n-1] + beta*x[n] + rnorm(1)
  }
  tibble(y = y, u = u, lag_y = lag(y), x = x)
}

data <- generate_panel_data(1.2, 3.0, rnorm(20))

fit <- stan(file = "panel.stan",
            data = list(N = nrow(data),
                        x_lead = data$x %>% head(nrow(data)-1),
                        y_lead = data$y %>% head(nrow(data)-1),
                        y_lag = data$y %>% tail(nrow(data)-1)),
            chains = 1)

> fit
Inference for Stan model: panel.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
u      -0.73    0.02 0.34  -1.38  -0.96  -0.74  -0.51  -0.08   431 1.00
delta   0.83    0.00 0.00   0.82   0.83   0.83   0.83   0.84   470 1.00
beta    0.45    0.01 0.26  -0.03   0.28   0.45   0.62   0.97   577 1.00
lp__  -48.11    0.07 1.30 -51.53 -48.70 -47.72 -47.18 -46.75   340 1.01
Here’s panel.stan:

data {
  int N;
  vector[N-1] x_lead;
  vector[N-1] y_lead;
  vector[N-1] y_lag;
}
parameters {
  real u;
  real delta;
  real beta;
}
model {
  // implicit flat priors on u, delta, and beta
  y_lead ~ normal(u + delta*y_lag + beta*x_lead, 1.0);
}


#4

Yep @Daniel_Simpson, here’s some simulated data etc.


library(rstanarm); library(tidyverse)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)

test_fit <- stan_lmer(y ~ lagged_y + (1 | individual), data = simulated_panel, iter = 1000, cores = 4)

bayesplot::mcmc_dens(as.data.frame(test_fit), pars = "lagged_y") +
  geom_vline(aes(xintercept = delta)) +
  annotate("text", y = 15, x = delta + 0.05, label = "true coef") +
  xlim(0, max(as.data.frame(test_fit)$lagged_y))
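
If it helps, here’s a quick numeric check of the bias to go with the plot (a sketch; rstanarm names the coefficient after the predictor, so it’s lagged_y, and the true delta is 0.5):

# posterior summary and 95% interval for the lagged-y coefficient
summary(test_fit, pars = "lagged_y")
posterior_interval(test_fit, pars = "lagged_y", prob = 0.95)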


#5

Man I wish we had R syntax highlighting on the forums!


#6

Arya – this isn’t repeated observations across multiple individuals, is it?


#7

The simulation is recursive but the Stan program isn’t. Is that a problem?


#8

No, just a single individual, but without the \alpha term. I figured I’d start simple. The posterior pairs plots don’t actually look bad for that model or show any non-identifiabilities.
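
For reference, the pairs plot was just rstan’s built-in (assuming the fit object from my experiment above):

pairs(fit, pars = c("u", "delta", "beta"))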


#9

Just bumping this in case the right person missed it. Seems too common a problem not to have been discussed on the forum before.


#10

You should look at the R package ctsem, which does continuous-time SEM. It lets you fit dynamic systems with either frequentist or Bayesian estimation; the Bayesian estimation is done with Stan.

The simplest version of the dynamic system is the panel model.


#11

I never took the time to quite wrap my head around the Arellano-Bond thing, but the poor performance of the rstanarm spec was eye-opening. As Mauricio said, ctsem provides a front end for specifying hierarchical time series / state space models in Stan. I took a quick look and it seems to behave as you want, though performance is much slower than it could be for a perfectly observed first-order model like you have (since there is no need to integrate out the latent states).

install.packages("devtools")
library(devtools)
install_github("cdriveraus/ctsem")

library(rstanarm); library(tidyverse); library(ctsem)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)


# ctsem model
ctm <- ctModel(n.manifest = 1, n.latent = 1, LAMBDA = diag(1), type = 'standt',
  MANIFESTMEANS = matrix(0, 1, 1), CINT = matrix('u', 1, 1),
  MANIFESTVAR = diag(1e-6, 1),
  T0VAR = matrix('stationary', 1, 1),
  manifestNames = 'y')

# fix all parameters except the initial states and the intercept across subjects
ctm$pars$indvarying[!ctm$pars$matrix %in% c('T0MEANS','CINT')] <- FALSE

# adjust the id column name
ctm$subjectIDname <- 'individual'

# fit using optimization
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm,
  chains = 6, optimize = TRUE, verbose = 0, iter = 300, deoptim = FALSE,
  isloops = 0, issamples = 100)
summary(ctf)

# fit with sampling
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm, chains = 6, iter = 300)

cat(ctf$stanmodeltext) # to view the Stan model -- lots of unnecessary stuff for this problem though!


#12

Ask @syclik—somehow he or someone else turned on Stan highlighting. So I imagine if there’s a Stan plugin, there’s an R plugin!


#13

Here’s my solution, with some code.

library(tidyverse)

set.seed(123)
delta <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u <- rnorm(I, 0, sigma_u)
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  initial_y = u + sigma_e / 
                  sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag)
library(rstan)
mod <- stan_model('dynamic-panel.stan')
fit <- sampling(mod, data = stan_dat, cores = 4, chains = 4, iter = 2000)
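
One thing worth flagging: each series is initialized from the stationary distribution of the AR(1) around the individual mean, which here is

y_{i1} \mid u_{i} \sim \text{Normal}\left(u_{i}, \sigma_{e} / \sqrt{1 - \delta^{2}}\right)

That’s what the sigma_e / sqrt(1 - delta^2) term in the simulation encodes, and the Stan program below assigns the same density to y[1,i].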

Here’s dynamic-panel.stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
}
parameters {
  real<lower=0,upper=1> delta_raw;  // mapped to delta in (-1, 1) below
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;                  // non-centered individual effects
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;    // implies u[i] ~ normal(0, sigma_u)
  real delta = delta_raw * 2 - 1;   // keeps the AR(1) stationary
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // first observation gets the stationary distribution around u[i]
    target += normal_lpdf(y[1,i] | u[i], 
                          sigma_e / sqrt(one_minus_delta_sq));
    // remaining observations: AR(1) around u[i], since
    // u[i]*(1 - delta) + delta*y[t-1] = u[i] + delta*(y[t-1] - u[i])
    target += normal_lpdf(y[2:T,i] | u_delta[i]
                                     + delta * y_lag[2:T,i], sigma_e);
  }
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
}

Summary of the model run over the simulated data from above:

> print(fit , pars = c('delta','sigma_u','sigma_e'))
Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.50    0.00 0.04 0.42 0.47 0.50 0.52  0.57   831 1.01
sigma_u 0.36    0.01 0.13 0.05 0.29 0.37 0.45  0.58   430 1.01
sigma_e 1.01    0.00 0.02 0.96 0.99 1.01 1.03  1.06  4000 1.00

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:03:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

When I boost the number of observations per individual to 50, the inference for sigma_u gets sharper and its posterior mean approaches the true value.

Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.48       0 0.01 0.46 0.47 0.48 0.49  0.51  4000    1
sigma_u 0.46       0 0.04 0.38 0.43 0.46 0.49  0.55  1400    1
sigma_e 0.99       0 0.01 0.98 0.99 0.99 1.00  1.01  4000    1

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:17:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).