Dynamic panel data models with Stan?

It’s well known that in panel data models (that is, models with observations across individuals and time), if you have individual intercepts/slopes, then including a lagged dependent variable on the right-hand side is a big no-no (it can produce enormous biases).

To see this, imagine you have

y_{it} = \alpha + u_{i} + \delta y_{it-1} + X_{it}\beta + \epsilon_{it}

with u_{i} \sim \text{Normal}(0, \sigma_{u}) being the random effects. For the usual estimators to be consistent, these must be uncorrelated with X_{it} and y_{it-1} in our statistical model. But u_{i} applies to all observations of y_{i}, and so by construction E[u_{i}\, y_{it-1}] \neq 0: the lagged dependent variable is correlated with the error. This typically inflates our estimate of \delta and shrinks our estimate of \beta.
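
To see where that correlation comes from, unroll the recursion (a sketch, assuming |\delta| < 1, a stationary start, and ignoring X_{it}):

y_{it-1} = \frac{\alpha + u_{i}}{1 - \delta} + \sum_{s \geq 0} \delta^{s} \epsilon_{it-1-s}

so E[u_{i}\, y_{it-1}] = \sigma_{u}^{2} / (1 - \delta), which is nonzero whenever \sigma_{u} > 0.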

Non-Bayesians use the Arellano-Bond estimator, which is a GMM approach to this problem. The idea is to express the model in first differences (getting rid of the random effect), in which case you’re only dealing with the correlation between the change in the error and the change in the lagged dependent variable. Then use GMM, instrumenting the differenced lagged dependent variable with deeper lags of the levels.
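
In equations, the first-difference step looks like this (a sketch of the standard Arellano-Bond setup):

\Delta y_{it} = \delta \Delta y_{it-1} + \Delta X_{it}\beta + \Delta \epsilon_{it}

The u_{i} drops out, but \Delta y_{it-1} = y_{it-1} - y_{it-2} still contains \epsilon_{it-1}, which also shows up in \Delta \epsilon_{it} = \epsilon_{it} - \epsilon_{it-1}, so the differenced regressor remains endogenous. Arellano-Bond exploits the moment conditions E[y_{it-s}\, \Delta \epsilon_{it}] = 0 for s \geq 2 (valid if \epsilon_{it} is serially uncorrelated), i.e. levels dated t-2 and earlier serve as instruments for \Delta y_{it-1}.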

Does anyone here have experience fitting these models in Stan? Happy to just throw Bayesian IV at it, but the Frequentist counterpart to this is less efficient than Arellano Bond. Ideas?


Hi Jim,

Can you expand a bit on exactly what model you want to fit? Is it the one in the display?

Haven’t done these models personally, but I tried a little experiment since it was quick to do, and you were right: I can’t recover the parameters.

library(rstan); library(tidyverse)   # for stan(), tibble(), lag(), and %>% below

# Simulate one series: individual effect u, AR coefficient delta, slope beta
generate_panel_data <- function(delta, beta, x) {
  N <- length(x)
  u <- rnorm(1)                      # individual effect shared by all periods
  y <- rep(NA, N)
  y[1] <- u + beta*x[1] + rnorm(1)
  for(n in 2:N) {
    y[n] <- u + delta*y[n-1] + beta*x[n] + rnorm(1)
  }
  tibble(y = y, u = u, lag_y = lag(y), x = x)
}

data <- generate_panel_data(1.2, 3.0, rnorm(20))

fit <- stan(file = "panel.stan",
            data = list(N = nrow(data),
                        x_lead = data$x %>% head(nrow(data)-1),
                        y_lead = data$y %>% head(nrow(data)-1),
                        y_lag = data$y %>% tail(nrow(data)-1)),
            chains = 1)

> fit
Inference for Stan model: panel.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
u      -0.73    0.02 0.34  -1.38  -0.96  -0.74  -0.51  -0.08   431 1.00
delta   0.83    0.00 0.00   0.82   0.83   0.83   0.83   0.84   470 1.00
beta    0.45    0.01 0.26  -0.03   0.28   0.45   0.62   0.97   577 1.00
lp__  -48.11    0.07 1.30 -51.53 -48.70 -47.72 -47.18 -46.75   340 1.01
data {
  int N;
  vector[N-1] x_lead;
  vector[N-1] y_lead;
  vector[N-1] y_lag;
}
parameters {
  real u;
  real delta;
  real beta;
}
model {
  y_lead ~ normal(u + delta*y_lag + beta*x_lead, 1.0);
}

Yep @anon75146577 here’s some simulated data etc.


library(rstanarm); library(tidyverse)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)

test_fit <- stan_lmer(y ~ lagged_y + (1 | individual), data = simulated_panel, iter = 1000, cores = 4)

bayesplot::mcmc_dens(as.data.frame(test_fit), pars = "lagged_y") +
  geom_vline(aes(xintercept = delta)) +
  annotate("text", y = 15, x = delta + 0.05, label = "true coef") +
  xlim(0, max(as.data.frame(test_fit)$lagged_y))
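
A quick numerical check on top of the plot (a rough sketch; it just summarizes the same posterior draws used for the density above):

post <- as.data.frame(test_fit)           # posterior draws from the rstanarm fit
mean(post$lagged_y)                       # typically sits well above the true delta = 0.5 here
quantile(post$lagged_y, c(0.025, 0.975))  # and the interval tends to exclude the true value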

Man I wish we had R syntax highlighting on the forums!

Arya, this isn’t repeated observations across multiple individuals, is it?


The simulation is recursive but the Stan program isn’t. Is that a problem?

No, just a single individual, but without the \alpha term. I figured I’d start simple. The posterior pairs plots for that model don’t actually look bad or show any non-identifiabilities.

Just bumping this in case the right person missed it. Seems too common a problem to not have been discussed on the forum before.

You should look at the R package ctsem, which does continuous-time SEM. It lets you fit dynamic systems with either frequentist or Bayesian estimation; the Bayesian side is done with Stan.

The simplest version of the dynamic system is the panel model.


I never took the time to quite wrap my head around the Arellano-Bond thing, but the poor performance of the rstanarm spec was eye-opening. As Mauricio said, ctsem provides a front end for specifying hierarchical time series / state-space models in Stan. I took a quick look and it seems to behave as you want, though performance is much slower than it could be for a perfectly observed first-order model like yours (since there is no need to integrate out the latent states).

install.packages("devtools")
library(devtools)
install_github("cdriveraus/ctsem")

library(rstanarm); library(tidyverse); library(ctsem)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
                             initial_y = rnorm(I),
                             u = rnorm(I, 0, sigma_u))


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e)
        } else {
          out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% 
  filter(!is.na(lagged_y))


cor(simulated_panel$lagged_y, simulated_panel$u)


#ctsem model
ctm <- ctModel(n.manifest = 1, n.latent = 1, LAMBDA = diag(1), type = 'standt',
  MANIFESTMEANS = matrix(0, 1, 1), CINT = matrix('u', 1, 1),
  MANIFESTVAR = diag(1e-6, 1),
  T0VAR = matrix('stationary', 1, 1),
  manifestNames = 'y')

#set all parameters except the initial states and intercept fixed across subjects
ctm$pars$indvarying[!ctm$pars$matrix %in% c('T0MEANS','CINT')] <- FALSE

#adjust id column name
ctm$subjectIDname <- 'individual'

#fit using optimization
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm,
  chains = 6, optimize = TRUE, verbose = 0, iter = 300,
  deoptim = FALSE, isloops = 0, issamples = 100)
summary(ctf)

#fit with sampling
ctf <- ctStanFit(as.data.frame(simulated_panel), ctstanmodel = ctm, chains = 6, iter = 300)

cat(ctf$stanmodeltext) #to view the stan model -- lots of unnecessary stuff for this problem though!

Ask @syclik—somehow he or someone else turned on Stan highlighting. So I imagine if there’s a Stan plugin, there’s an R plugin!

Here’s my solution, with some code.

library(tidyverse)   # data_frame(), %>%, group_by(), lag() used below

set.seed(123)
delta <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u <- rnorm(I, 0, sigma_u)
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  initial_y = u + sigma_e / 
                  sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag)
library(rstan)
mod <- stan_model('dynamic-panel.stan')
fit <- sampling(mod, data = stan_dat, cores = 4, chains = 4, iter = 2000)

Here’s dynamic-panel.stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;    // non-centered parameterization of the group effects
  real delta = delta_raw * 2 - 1;   // map delta_raw from (0,1) to (-1,1)
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // first observation: stationary distribution of the AR(1) centered at u[i]
    target += normal_lpdf(y[1,i] | u[i], 
                          sigma_e / sqrt(one_minus_delta_sq));
    // remaining observations: AR(1) recentered around the group mean u[i]
    target += normal_lpdf(y[2:T,i] | u_delta[i]
                                     + delta * y_lag[2:T,i], sigma_e);
  }
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
}

Summary of the model run over the simulated data from above:

> print(fit , pars = c('delta','sigma_u','sigma_e'))
Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.50    0.00 0.04 0.42 0.47 0.50 0.52  0.57   831 1.01
sigma_u 0.36    0.01 0.13 0.05 0.29 0.37 0.45  0.58   430 1.01
sigma_e 1.01    0.00 0.02 0.96 0.99 1.01 1.03  1.06  4000 1.00

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:03:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

When I boost the number of observations per individual to 50, the inference for sigma_u gets sharper and the estimate approaches the true value.

Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.48       0 0.01 0.46 0.47 0.48 0.49  0.51  4000    1
sigma_u 0.46       0 0.04 0.38 0.43 0.46 0.49  0.55  1400    1
sigma_e 0.99       0 0.01 0.98 0.99 0.99 1.00  1.01  4000    1

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:17:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
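
As an extra sanity check (a rough sketch using the objects created above; u_hat is just a name introduced here), you can compare the posterior means of the group effects against the simulated ones:

u_hat <- colMeans(as.matrix(fit, pars = "u"))  # posterior means of u[1], ..., u[I]
plot(initial_values$u, u_hat)                  # roughly linear, with some shrinkage toward zero
abline(0, 1)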

Could you point to the paper or model that your code is based on? I would like to use your model, but I can’t find the dynamic panel data model that you are using.
EDIT:
Just in case someone is interested in this approach but has no econometrics background, I found the following paper useful for understanding rtrangucci’s code.
https://www.researchgate.net/profile/Hashem_Pesaran/publication/226673422_Random_Coefficient_Models/links/0fcfd5089366b8067f000000/Random-Coefficient-Models.pdf

Could you maybe explain why you multiply by 1 - delta in vector[I] u_delta = u * (1 - delta);?
Why don’t you just use u[i] instead of u_delta[i] in target += normal_lpdf(y[2:T,i] | u_delta[i] + delta * y_lag[2:T,i], sigma_e);?

Hi @c5v, sorry for the delay. The use of u_delta[i] instead of u[i] is just an algebraic trick to cut down on computation. We could instead write the likelihood like:

target += normal_lpdf(y[2:T,i] | u[i] + delta * (y_lag[2:T,i] - u[i]), sigma_e)

which leads to

target += normal_lpdf(y[2:T,i] | u[i] + delta * y_lag[2:T,i] - delta * u[i], sigma_e)

simplifying to

target += normal_lpdf(y[2:T,i] | u[i] * (1 - delta) + delta * y_lag[2:T,i], sigma_e)


@rtrangucci Thanks for your reply, this clarifies a lot!

The only thing I don’t really get is why you subtract u[i] in delta*(y_lag[2:T,i] - u[i]), since we are not modelling first differences here, right? Could you maybe explain this to me? It would help me a lot!

It’s because I want the mean of each observation in group i to be u[i]. The model is

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}\\ \epsilon_{i,t} \sim \text{Normal}(0, \sigma_e)

Taking expectations of each side:

\mathbb{E}[Y_{i,t}] = \mu_i + \delta \mathbb{E}[(Y_{i,t-1} - \mu_i)] \\ \mathbb{E}[Y_{i,t}] = \mu_i + \delta (\mathbb{E}[Y_{i,t-1}] - \mu_i)

and, under stationarity (|\delta| < 1, so \mathbb{E}[Y_{i,t}] = \mathbb{E}[Y_{i,t-1}]), this gives \mathbb{E}[Y_{i,t}] = \mu_i.

@rtrangucci clear explanation thanks!

If I am right, in a multilevel setting (which I am building right now), which then looks like y_{ijt} = \beta_{0ij} + \beta_{1ij}x_{ijt} + \delta y_{ij,t-1} + \epsilon_{ijt}, I could also use your reasoning and model it like:

y_{ijt} = \mu_{ij} + \delta(y_{ij,t-1} - \mu_{ij}) + \epsilon_{ijt}, where \mu_{ij} = \beta_{0ij} + \beta_{1ij}x_{ijt}

and use your approach, right?

Sorry for all my questions, but I am really stuck with this problem.

No problem! You’re correct, you could model your process like that, and you’d get the nice property that \mathbb{E}[y_{ijt}] = \mu_{ij}.
