Dynamic panel data models with Stan?

Videos can be found here, I think. But iirc, Jonah didn’t have enough time to cover this part of the tutorial in class. I worked through it later, when he uploaded the full thing. The markdown file is really easy to follow. :)

3 Likes

I’m trying to generate fake data for the multilevel model described by @c5v :

y_{ijt} = \mu_{ij} + \delta (y_{i,j,t-1} - \mu_{ij}) + \epsilon_{ijt}

To keep things simple:

\mu_{ij} = a_i + b_{j[i]}

How should I generate the fake data? This is what I did but I’m not sure if it is okay:

library(dplyr)   # %>%, group_by(), do(), mutate(), lag(), filter(), inner_join()
library(tibble)  # tibble()

set.seed(123)
delta <- 0.5

sigma_a <- 0.5
sigma_b <- 0.25
sigma_e <- 1

I <- 100   # Number of individuals
T <- 50   # Number of time periods
J <- 10   # Number of groups

fake_data <- tibble(individual = 1:I,
                    group = sample(1:J, I, replace=TRUE))

a <- tibble(individual = 1:I,
            a = rnorm(I, 0, sigma_a))
b <- tibble(group = 1:J,
            b = rnorm(J, 0, sigma_b))

initial_values <- fake_data %>% 
  inner_join(a, by = "individual") %>% 
  inner_join(b, by = "group") %>% 
  mutate(u = a + b,
         initial_y = u + 
                     sigma_e / sqrt(1 - delta^2) * rnorm(I))  # Is this right???

simulated_panel <- initial_values %>%
  group_by(individual, group) %>%  # Is this right???
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual, group) %>%  # Is this right???
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)

for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

Thanks for the help!

@rtrangucci could you help me understand how the model you described would change if I added a linear trend? What I have in mind is something like this:

Y_{i,t} = \mu_i + \delta(Y_{i,t-1} - \mu_i - \beta (t-1)) + \beta t + \epsilon_{i,t}

Is this the right way of thinking about adding a linear trend or am I missing something?
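For concreteness, here's a rough sketch of how I'd simulate a single series under that equation (all values illustrative):

# Single series with an AR(1) around a linear trend (illustrative values)
set.seed(1)
TT <- 1000; mu_i <- 2; delta <- 0.5; beta <- 0.1; sigma_e <- 1

y <- numeric(TT)
y[1] <- mu_i + beta * 1 + sigma_e / sqrt(1 - delta^2) * rnorm(1)  # stationary start
for (t in 2:TT) {
  m <- mu_i + delta * (y[t - 1] - mu_i - beta * (t - 1)) + beta * t
  y[t] <- rnorm(1, m, sigma_e)
}
mean(y - beta * (1:TT))  # the detrended series should fluctuate around mu_i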

1 Like

Hi everyone,

I have a question regarding the data generating process used here. I am not an econometrician and am doing this mainly for my PhD thesis, where we plan to estimate a hierarchical error correction model in Stan. The code provided by @rtrangucci works nicely, but could anyone explain to me the reason for generating {y_{t > 1}} as:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)

instead of

out$y[t] <- rnorm(1, delta * out$y[t-1]  + values$u[i] + values$mu[[i]][t], sigma_e)

for @rtrangucci 's code?

At the beginning of the thread, the second approach was used, but it was later changed to the first one. Using the first alternative, I was not able to get meaningful estimates of the parameters in a single-group setting (i.e. estimating the model only for individual 1), although the estimates are fine using the code provided here for the panel setting. This is not surprising, since I regress {y_{t}} on {y_{t-1}} and not on, e.g., {y_{t-1}- \mu_{i}-\beta x_{t}} in the univariate case. If I change the data generating process to out$y[t] <- rnorm(1, delta * out$y[t-1] + values$u[i] + values$mu[[i]][t], sigma_e), i.e. what @James_Savage originally did, the provided code no longer recovers the true parameters. But in practice we would assume that {y_{t}} is generated as

{y_{t}= \mu_{i}+\beta x_{t}+\delta y_{t-1}+ \epsilon_{t}}

and not

{y_{t}= \mu_{i}+\beta x_{t}+\delta (y_{t-1}- \mu_{i}-\beta x_{t})+ \epsilon_{t}}

or am I totally wrong?

Thanks!

Hi @ignacio, sorry for the delay! Yes, I think you've got the right idea. The key is that you want E[Y_{i,t}] = \mu_i + \beta t, which you should have here. This is just a special case of the model shown here, which allows for covariates that vary by i,t.
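For instance, a rough sketch of how the trend can be encoded as a time-varying covariate (using the same list-of-matrices layout for X as in the simulation code further down the thread):

# Linear trend as a covariate: X_{i,t} = t, so beta plays the role of the trend slope
TT <- 50; I <- 10
X <- lapply(1:I, function(i) matrix(1:TT, nrow = TT, ncol = 1))  # p = 1 column: time index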

2 Likes

Hi @Dirk_B, the reason for generating the observations like

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t])
                     + values$u[i] + values$mu[[i]][t], sigma_e)

is to ensure that the conditional expectation for Y_{i,t} - X_{i,t} \beta | \mu_i is \mu_i for all t for values of \delta \in (0,1). If you generate data from the latter code:

out$y[t] <- rnorm(1, delta * out$y[t-1] 
                     + values$u[i] + values$mu[[i]][t], sigma_e)

then, as you say, you’re generating observations according to

Y_{i,t} = \mu_i + X_{i,t} \beta + \delta Y_{i,t-1} + \epsilon_{i,t}, so (in all the following E[ \cdot ] is E[ \cdot | \mu_i])

E[Y_{i,t} - X_{i,t} \beta] = E[\mu_i + \delta Y_{i,t-1} - \delta X_{i,t-1} \beta + \delta X_{i,t-1} \beta + \epsilon_{i,t}]

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (E[Y_{i,t-1} - X_{i,t-1} \beta]) + \delta X_{i,t-1} \beta

given that E[Y_{i,t} - X_{i,t} \beta] = E[Y_{i,t-1} - X_{i,t-1} \beta], then

E[Y_{i,t} - X_{i,t} \beta] - \delta (E[Y_{i,t} - X_{i,t} \beta]) = \mu_i + \delta X_{i,t-1} \beta so

E[Y_{i,t}] = X_{i,t} \beta + \tfrac{\mu_i}{1-\delta} + \tfrac{\delta}{1 - \delta} X_{i,t-1}\beta.

If you fit the model:

Y_{i,t} = \mu_i + \beta x_{i,t} + \delta Y_{i,t-1} + \epsilon_{i,t} but your data are generated according to:

Y_{i,t} = \tfrac{\mu_i}{1-\delta} + X_{i,t} \beta + \tfrac{\delta}{1 - \delta} X_{i,t-1}\beta + \epsilon_{i,t}, you’ll get misfit.

Instead, let’s generate data from:

Y_{i,t} = \mu_i + X_{i,t} \beta + \delta (Y_{i,t-1} - X_{i,t-1} \beta - \mu_i) + \epsilon_{i,t}.

If E[Y_{i,t} - X_{i,t} \beta] = \mu_i for all t, then

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (E[Y_{i,t-1} - X_{i,t-1} \beta] - \mu_i ) + E[\epsilon_{i,t}].

or

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (\mu_i - \mu_i ) + 0

and

E[Y_{i,t} - X_{i,t} \beta] = \mu_i
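As a quick numerical illustration of the difference between the two generating schemes (a rough sketch with made-up values and no covariates, so the X_{i,t} \beta terms drop out):

# "Lagged level" DGP vs. "AR(1) in errors" DGP, no covariates
set.seed(1)
TT <- 1e5; mu_i <- 2; delta <- 0.8; sigma_e <- 1

y_a <- y_b <- numeric(TT)
y_a[1] <- mu_i / (1 - delta)   # start each series near its long-run mean
y_b[1] <- mu_i
for (t in 2:TT) {
  y_a[t] <- rnorm(1, mu_i + delta * y_a[t - 1], sigma_e)           # y_t = mu + delta * y_{t-1} + eps
  y_b[t] <- rnorm(1, mu_i + delta * (y_b[t - 1] - mu_i), sigma_e)  # y_t = mu + delta * (y_{t-1} - mu) + eps
}
c(mean(y_a), mean(y_b))  # ~ mu_i / (1 - delta) = 10 vs. ~ mu_i = 2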

@c5v Apologies about the delay!

Yes, that’s right, though I might think about it using conditional expectation:

E[\mu_i (Y_{i,t-1} - \mu_i)] = E[\mu_i Y_{i,t-1}] - E[\mu_i^2] = E[E[\mu_i Y_{i,t-1} | \mu_i]] - \sigma_u^2 = E[\mu_i E[Y_{i,t-1} | \mu_i]] - \sigma_u^2
so
E[\mu_i \mu_i] - \sigma_u^2 = 0
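A quick Monte Carlo check of that identity (a rough sketch with arbitrary values; Y_{i,t-1} is drawn from its stationary distribution given \mu_i):

# Check E[mu_i * (Y_{i,t-1} - mu_i)] = 0 by simulation
set.seed(1)
n <- 1e6
sigma_u <- 0.5; delta <- 0.8; sigma_e <- 1

mu <- rnorm(n, 0, sigma_u)                           # mu_i ~ N(0, sigma_u^2)
y_lag <- rnorm(n, mu, sigma_e / sqrt(1 - delta^2))   # Y_{i,t-1} | mu_i at stationarity
mean(mu * (y_lag - mu))                              # ~ 0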

Thanks @Max_Mantei! That’s spot on.

@aornugent Probably you’ve moved on from this, but if not…

If everything is Gaussian, you should be able to integrate out the missing observations, but if you’ve got covariates associated with the missing values, you’ll probably need to explicitly model those missing covariates, which might get messy.

@joeHoover if you’ve already fixed this, great! If not, it looks like you’ve got a typo for the likelihood of the initial observation:

 y[j,s] ~ normal(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));

should be

 y[j,s] ~ normal(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));

that might be the fix. I’ll see if I can get something running with the full hierarchy over the AR1 parameter \delta_s and post back here.

1 Like

Hi @joeHoover,

I got the following model to run OK:

data {
  int<lower=1> I;
  int<lower=1> TT;
  int<lower=1> p;
  matrix[TT,I] y;
  matrix[TT,I] y_lag;
  matrix[TT,p] X[I];
}
parameters {
  real delta_mean;
  real<lower=0> sigma_delta;
  vector[I] delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  vector[I] delta = 2 * inv_logit(delta_mean + sigma_delta * delta_raw) - 1;
  matrix[TT, I] mu;
  for (i in 1:I)
    mu[,i] = X[i] * beta;
}
model {
  vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);
  vector[I] u_delta = u .* (1 - delta);
 
  for (i in 1:I) {
    for(t in 1:TT){
      if(t == 1){
        y[t,i] ~ normal(u[i] + mu[1,i], sigma_e / sqrt(one_minus_delta_sq[i]));
      }
      else{
        y[t,i] ~ normal(u_delta[i] + mu[t,i] + delta[i] * (y_lag[t,i] - mu[t-1,i]), sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  sigma_delta ~ normal(0, 1);
  delta_mean ~ normal(0, 2);
  delta_raw ~ normal(0, 1);
  beta ~ normal(0, 5);
}
generated quantities{
  matrix[TT,I] y_rep;
  vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);
  
  for (i in 1:I) {
    for(t in 1:TT){
      if(t == 1){
        y_rep[t,i] = normal_rng(u[i] + mu[1,i], sigma_e / sqrt(one_minus_delta_sq[i]));
      }
      else{
        y_rep[t,i] = normal_rng(u[i] * (1 - delta[i]) + mu[t,i] + delta[i] * (y_lag[t,i] - mu[t-1,i]), sigma_e);
      }
    }
  }
}

when using this data:

library(dplyr)
library(rstan)
set.seed(123)
delta_logit_mean <- qlogis((0.8 + 1)/2)
sigma_delta_logit <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 50
u = rnorm(I, 0, sigma_u)
deltas <- plogis(delta_logit_mean + sigma_delta_logit * rnorm(I)) * 2 - 1
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- tibble(
  u = u,
  initial_y = u + sapply(fixed_effects_by_T, function(x) x[1]) + sigma_e / 
    sqrt(1 - deltas^2) * rnorm(I), # y_1 ~ N(u + x_1 * beta, sigma_e^2 / (1 - delta^2))
  mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
  out <- data.frame(y = rep(NA, T),
                    time = rep(NA, T),
                    u = values$u[i],
                    individual = i)
  for(t in 1:T) {
    out$time[t] <- t
    if(t == 1) {
      out$y[t] <- values$initial_y[i]
    } else {
      out$y[t] <- rnorm(1, deltas[i] * (out$y[t-1] - values$u[i] - values$mu[[i]][t - 1]) + values$u[i] + values$mu[[i]][t], sigma_e)
    }
  }
  out <- out %>% mutate(
    lagged_y = lag(y)
  )
  dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel-hier-ar.stan')
stan_data <- list(I = I, TT = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'u'), true = u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_delta'), true = sigma_delta_logit)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta_mean'), true = delta_logit_mean)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = deltas)

Lemme know if this runs for you (if you already figured this out, feel free to ignore!)

1 Like

Hi @rtrangucci,

thanks for your explanation! As I said, I am not an econometrician or statistician and am also pretty new to Bayes, so please excuse me if my questions are nonsense :) I am almost ashamed to ask, but I am really trying to understand.

I think I can generally follow your explanations, but I do not quite understand why we want E[Y_{it}-X_{it}\beta] = \mu_i and not e.g. E[Y_{it}-X_{it}\beta-\delta Y_{it-1}] = \mu_i, which is how we define our model. What happens to \delta Y_{it-1}? Unfortunately, it is also still difficult for me to understand the intuition with respect to real observed data, which is what I would like to use the code for. Do I also assume that all real data were generated this way? What confuses me the most is that I can't reproduce the simulated values for y_{it}. If I have a real time series data set with only one cross-section, with my outcome and predictor variables, and want to run a regression with a lagged DV, I create the lagged DV and run my model Y_{t}=\alpha+X_{t}\beta+\delta Y_{t-1}+\epsilon_{t}. If we assume the error is very small, just calculating \hat\alpha+X_{t} \hat\beta+\hat\delta Y_{t-1} should give me values pretty close to Y_{t}. In my intuition, this should also work for the panel case, with the only difference that we have \mu_i instead of \alpha; this shouldn't differ from the previous situation if we look at, e.g., I=1 and ignore all the other cases. I know the true value for \mu_1 and I know the previous observation y_{1, t-1}. But if I do this after generating the data according to the code, it does not work for me:
I.e., if I want to calculate Y_{i=1,t=3} given \mu_1=3.88, \delta=0.8 and Y_{i=1,t=2}=3.87 with \sigma_e=0.1 this gives me 6.98 which is not even close to the generated Y_{i=1,t=3} which is 3.83. I don’t know if I am a blockhead right now, but this really confuses me.

And my last question: you say that we generate data from Y_{i,t}=\mu_i+X_{it}\beta+\delta (Y_{it-1}-X_{it-1}\beta-\mu_i)+\epsilon_{it}. Wouldn’t this imply that we have to generate the data as:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t-1])
                     + values$u[i] + values$mu[[i]][t], sigma_e)

?

Why do we use

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t])
                     + values$u[i] + values$mu[[i]][t], sigma_e)

here?

Thank you so much!

I’m also interested in dynamic hierarchical models. I’d just like to point out that @rtrangucci's suggestion is equivalent to estimating an AR(1) model in the errors, but this is different from a “dynamic” model where the lagged dependent variable is employed as a predictor.

Specifically, an AR(1) model in errors is given by:

E\left[y_{i,t} \mid \mu_{i,t}\right] = \mu_{i,t} + \epsilon_{i,t},

where \epsilon_{i,t} = \delta \epsilon_{i,t-1} + \eta_{i,t} and \eta_{i,t} is i.i.d. white noise. This can be rewritten as:

E\left[y_{i,t} \mid \mu_{i,t}\right] = \mu_{i,t} + \delta (y_{i,t-1} - \mu_{i,t-1}) + \eta_{i,t},

where \eta_{i,t} is i.i.d. white noise. This can be estimated using a non-linear least squares approach or via MLE. In contrast, a dynamic model is given by:

E\left[y_{i,t} | \mu_{i,t}\right] = \phi y_{i,t-1} + \mu_{i,t} + \nu_{i,t},

where \nu_{i,t} = u_{i} + \xi_{i,t}, u_{i} is a random or fixed effect, and \xi_{i,t} is i.i.d. white noise. This type of model assumes there is state-dependence of order 1 (i.e., last period’s level of the response variable affects this period’s level of the response).
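To make the contrast concrete, here's a rough simulation sketch of the two processes for a single unit (illustrative parameter values, no covariates, constant mean component):

# AR(1)-in-errors vs. dynamic (lagged-DV) process, single unit, no covariates
set.seed(1)
TT <- 1e4; mu <- 2; delta <- 0.7; phi <- 0.7; u <- 0.5; sigma <- 1

y_err <- y_dyn <- numeric(TT)
y_err[1] <- mu
y_dyn[1] <- (mu + u) / (1 - phi)
for (t in 2:TT) {
  y_err[t] <- mu + delta * (y_err[t - 1] - mu) + rnorm(1, 0, sigma)  # AR(1) in the errors
  y_dyn[t] <- phi * y_dyn[t - 1] + mu + u + rnorm(1, 0, sigma)       # lagged DV as predictor
}
c(mean(y_err), mean(y_dyn))  # ~ mu vs. ~ (mu + u) / (1 - phi)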

Due to the presence of unobserved heterogeneity u_{i}, which is correlated with y_{i,t-1} by construction, estimation of a dynamic multilevel model is not straightforward.

I’m new to the Bayesian world, so I could be wrong, but the endogeneity of the problem precludes consistent and unbiased estimates of the coefficients. Frequentist methods generally get around this problem by first-differencing or applying some other transformation to the estimating equation. But any such transformation requires instrumental variables (e.g., higher-order lags of the dependent variable) to purge the endogeneity between the transformed lagged dependent variable and the transformed \xi_{i,t}.

For a good primer, please see https://fmwww.bc.edu/EC-C/S2013/823/EC823.S2013.nn05.slides.pdf.

Personally I use PyMC3, but I would love to learn about a way to estimate dynamic models in a Bayesian framework. In economics, instrumental variables are often necessary due to endogeneity of some form, but the cure (instrumental variables) can sometimes be worse than the disease, particularly if the instrumental variables are weakly correlated with the endogenous variable(s).

You’re right, my solution is a bit different than the question that spurred the thread. Here’s another solution for a dynamic model that I think gets the modeling right:

data {
  int<lower=1> I;
  int<lower=1> T;
  int<lower=1> p;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  matrix[T,p] X[I];
}
parameters {
  real delta_mean;
  real<lower=0> sigma_delta;
  vector[I] delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
  real alpha;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  vector[I] delta = 2 * inv_logit(delta_mean + sigma_delta * delta_raw) - 1;
}
model {
  vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);
 
  for (i in 1:I) {
    vector[T] mu = X[i] * beta + alpha + u[i];
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(mu[1] / (1 - delta[i]), sigma_e / sqrt(one_minus_delta_sq[i]));
      }
      else{
        y[j,i] ~ normal(mu[j] + delta[i] * y_lag[j,i], sigma_e);
      }
    }
  }
  
  u_raw ~ std_normal();
  sigma_u ~ normal(0, 2);
  sigma_e ~ std_normal();
  sigma_delta ~ std_normal();
  delta_mean ~ normal(0, 2);
  delta_raw ~ std_normal();
  beta ~ normal(0, 5);
  alpha ~ std_normal();
}

Simulation code:

library(dplyr)
library(rstan)
set.seed(123)
delta_logit_mean <- qlogis((0.8 + 1)/2)
sigma_delta_logit <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 50
u = rnorm(I, 0, sigma_u)
alpha = rnorm(1)
deltas <- plogis(delta_logit_mean + sigma_delta_logit * rnorm(I)) * 2 - 1
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta + alpha)

values <- tibble(
  u = u,
  initial_y = (u + sapply(fixed_effects_by_T, function(x) x[1])) / (1 - deltas) + sigma_e / 
    sqrt(1 - deltas^2) * rnorm(I), # y_1 ~ N((u + x_1 * beta) / (1 - delta), sigma_e^2 / (1 - delta^2))
  mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
  out <- data.frame(y = rep(NA, T),
                    time = rep(NA, T),
                    u = values$u[i],
                    individual = i)
  for(t in 1:T) {
    out$time[t] <- t
    if(t == 1) {
      out$y[t] <- values$initial_y[i]
    } else {
      out$y[t] <- rnorm(1, deltas[i] * out$y[t-1] + values$u[i] + values$mu[[i]][t], sigma_e)
    }
  }
  out <- out %>% mutate(
    lagged_y = lag(y)
  )
  dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel-hier-ar-alt.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = c('beta','alpha')), true = c(beta,alpha))
mcmc_recover_intervals(as.matrix(fit, pars = 'u'), true = u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_delta'), true = sigma_delta_logit)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta_mean'), true = delta_logit_mean)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = deltas)

cover <- function(x, l, u) {
  if (x >= l && x <= u)
    return(TRUE)
  else
    return(FALSE)
}

delta_ints <- bayesplot::mcmc_intervals_data(as.matrix(fit, pars = 'delta'))
mean(sapply(1:100, function(i) cover(deltas[i], delta_ints$ll[i], delta_ints$hh[i])))
mean(sapply(1:100, function(i) cover(deltas[i], delta_ints$l[i],  delta_ints$h[i])))

u_ints <- bayesplot::mcmc_intervals_data(as.matrix(fit, pars = 'u'))
mean(sapply(1:100, function(i) cover(u[i], u_ints$ll[i], u_ints$hh[i])))
mean(sapply(1:100, function(i) cover(u[i], u_ints$l[i],  u_ints$h[i])))
1 Like

Hi @Dirk_B,

you’re right that there’s a typo in the data generating code! Thanks for pointing that out. It should be:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t-1]) + values$u[i] + values$mu[[i]][t], sigma_e)

After going back to check my code, I realized I had another error in my model code! I edited the original post (Dynamic panel data models with Stan?) so that the code is correct. I think it’s too late to edit the original post where I made the first mistake (Dynamic panel data models with Stan?)

As for your first question, I took another crack at putting together a model with a DGP of Y_t = \alpha_i + X_t \beta + \delta_i Y_{t-1} + \epsilon_{i,t}, see the solution Dynamic panel data models with Stan?

1 Like

Thanks @rtrangucci. I’m not confident that this new solution solves the problem, but perhaps I’m missing something. From what I can tell, you’re treating the unobserved fixed/random effect, u_i, as a parameter to be estimated, namely as a group-specific constant. More formally, assuming the model is

y_{it} = \beta_{0,i} + \phi_{i} y_{i, t-1} + \beta_{1,i} x_{it} + u_{i} + \epsilon_{it},

you’re in effect treating the u_{i} as \beta_{0,i}. That is, you’re really estimating

y_{it} = \beta_{0,i} + \phi_{i} y_{i, t-1} + \beta_{1,i} x_{it} + \epsilon_{it},

which assumes no unobserved heterogeneity through u_i. The distinction is that u_{i} is an unobserved set of group-specific traits (e.g., u_i could be brand value, operational efficiency or know-how, and y_{it} is firm i's profitability) correlated with y_{i, t-1}, whereas \beta_{0,i} is a parameter.

Thus, if we wanted to estimate the equation as-is (i.e., without transformation to eliminate the u_{i} and without instrumental variables), I think there needs to be some restriction or assumption that reflects the correlation between u_{i} and y_{i,t-1}.

I’m treating u_i as a random intercept at the group level i, or unobserved heterogeneity at the group level, so wouldn’t that imply a correlation structure between y_{i,t-1} and u_i?

So the full model would be:

\begin{align} y_{i,t} | u_i, y_{i,t-1}, X_i, \sigma, \beta, \phi_i & \sim N(u_i + \phi_i y_{i,t-1} + X_{it} \beta, \sigma^2) \\ u_i | \tau & \sim N(0, \tau^2) \\ \tau & \sim \text{N}^+(0, 2) \\ \sigma & \sim \text{N}^+(0, 1) \end{align}

Then \text{Cov}(y_{i,t-1},u_i) would be:

\begin{align} \text{Cov}(y_{i,t-1}, u_i) & = \text{Cov}(u_i + \phi_i y_{i,t-2} + X_{i,t-1} \beta + \epsilon_{i,t-1},u_i) \\ & = \text{Cov}(u_i,u_i) + \text{Cov}(\phi_i y_{i,t-2},u_i) + \text{Cov}(X_{i,t-1} \beta,u_i) + \text{Cov}(\epsilon_{i,t-1},u_i) \\ & = \tau^2 + \phi_i \text{Cov}(y_{i,t-2},u_i) + 0 + 0 \\ & = \tau^2 + \phi_i \text{Cov}(y_{i,t-1},u_i) \,\,\, \text{by stationarity} \\ \end{align}

giving \text{Cov}(y_{i,t-1},u_i) = \frac{\tau^2}{1 - \phi_i}
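A rough Monte Carlo check of that covariance (illustrative values; I drop the covariates and run the recursion long enough to be effectively stationary):

# Quick simulation check of Cov(y_{i,t-1}, u_i) = tau^2 / (1 - phi)
set.seed(1)
n_units <- 5e4; TT <- 200
tau <- 0.5; phi <- 0.6; sigma <- 1

u <- rnorm(n_units, 0, tau)
y <- u / (1 - phi)                       # start each unit at its long-run mean given u
for (t in 2:TT) y <- phi * y + u + rnorm(n_units, 0, sigma)

cov(y, u)                                # ~ tau^2 / (1 - phi) = 0.625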

But I also might be totally misunderstanding the model, in which case, I apologize!

I agree, I’m certainly imposing assumptions in order to fit this model, though I’m not eliminating the covariance between u_i and y_{i,t-1}. There’s a distributional assumption over the random intercepts, and I’m imposing stationarity as well through the assumption that \phi_i \in (-1, 1) for all i.

Hmm, I need to noodle on this. I really appreciate the thought you’ve put in!

I think what I possibly fail to understand is how a distributional assumption on u_i is any different from placing a distributional assumption on a “standard” intercept (see above where there is \beta_{0,i} and u_i in the model). Is it because we can just define u_i = \beta_{0,i} + v_i, where v_i is the unobserved heterogeneity and \beta_{0,i} is the constant? Maybe I’m overthinking things.

Update: I think what you’re doing is using \alpha in your code as a “global” constant (kind of capturing the global mean across groups of y), but augmenting the intercept by u_i, which captures the unobserved heterogeneity. A distributional assumption is made on \alpha, too, but again we treat this as common to all groups. This is starting to make more sense, but I need to think about it some more. It also occurs to me that if the u_i are fixed rather than random effects, then the u_i will be correlated with a subset, if not all, of the predictors. This requires a different solution, likely in the form of a transformation or IV or both.

Were you able to recover the true parameters in your simulation based on the DGP?

@rtrangucci Thank you so much! I just tried out your code and this is exactly what I was looking for!

I’m really not that deep into the statistical theory, but maybe this paper helps you, @ddgar and @rtrangucci: Fok et al. (2006), “A Hierarchical Bayes Error Correction Model to Explain Dynamic Effects of Price Changes,” Journal of Marketing Research, 43 (3), 443-46.

In their paper they also try to account for the dynamic panel bias by modeling the first observation, just like @rtrangucci does; this is described in Appendix A on page 456. My statistical skills aren’t good enough to really tell whether, and if so how, rtrangucci’s and Fok et al.'s approaches differ. Maybe someone can replicate Fok et al.'s treatment of the first observation and check whether it solves, or at least reduces, the bias problem.

Can we use the brms package with panel data? (Non-Gaussian likelihood, if that matters.)

Hi,

Sorry for bumping this, but this thread is one of the very few on the forum dealing with dynamic panel data, and I think my question is too general to deserve its own thread.

While searching on this very topic, I found the recent dynamite package, which allows fitting multivariate dynamic panel models with Stan as the backend:

I gave it a few tries and the package is super easy to use and seems to provide everything I need and even more.

However, coming from an econometrics background, I am usually more convinced by mathematical demonstrations than by simulations, and I feel the documentation and articles lack mathematical detail or discussion of what is going on and how the package circumvents the potential issues pinpointed in the original post of this thread.

I therefore have some doubts (which I would gladly let go of) about the package’s ability to achieve what it promises, and I would like to know whether anyone has had successful experience with this package and/or spent some time understanding or validating the Stan code and statistical approach. I’ve been looking for similar packages and commands in other software (e.g., Stata) without finding anything close: this type of model is surely not trivial to fit correctly, and it’s an awesome package if it does so.

1 Like

@helske is one of the authors. He’s typically responsive on GitHub, not sure how often he checks the forums.

3 Likes

Hi @Osupplisson, I rarely log in to the forums, so I only now noticed this thread. Thanks for your interest in dynamite!

In a sense, there is nothing special about the DMPMs fitted by dynamite vs. other Bayesian (or maximum likelihood based) models regarding parameter estimation: if the model matches the data generating process, then we should get consistent (though not necessarily unbiased) results. @Osupplisson, if you can pinpoint something specific regarding DMPMs that you would like to see treated “mathematically”, then please let me know. For example, the centering approach also suggested in this thread (i.e. the Mundlak/Chamberlain/Bafumi & Gelman solution to the fixed-vs-random-effects argument, which uses centering and/or group means as predictors to account for the correlation) can be used with dynamite similarly as with lmer/brms etc.
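For anyone unfamiliar with that centering trick, a rough sketch of the data preparation in R (assuming a long-format panel data frame d with columns y, x, time, and id; names illustrative):

library(dplyr)

d_centered <- d %>%
  group_by(id) %>%
  mutate(x_bar    = mean(x),        # group mean of the predictor
         x_within = x - x_bar) %>%  # within-group deviation
  ungroup()

# One would then include both x_within and x_bar as predictors alongside the
# random intercept, e.g. obs(y ~ lag(y) + x_within + x_bar + random(~1),
# family = "gaussian") in dynamite, or the analogous formula in lmer/brms.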

Regarding the unbiasedness, it is well known that the maximum likelihood and least squares estimates of AR coefficients are biased (with and without the intercept term), which naturally extends to the panel data setting and to Bayesian estimation as well (of course, priors could change this to some extent). However, the estimates are still consistent (the bias decreases as the number of time points increases), so with reasonably large data we (I) don’t usually care too much about small-sample bias (compared to many other things which are likely wrong in our data/model in practice…).

In the simple panel AR(1) case with random intercepts, we get pretty much unbiased estimates with n = 5 already (although this can depend on the parameters of the model):

library(dynamite)
set.seed(1)
p <- 100
n <- 5

y <- matrix(0, p, n)
x <- matrix(rnorm(p * n), p, n)
u <- rnorm(p)
y[, 1] <- u + rnorm(p)
for(t in 2:n) y[, t] <- rnorm(p, u + 0.8 * y[, t-1] + 2 * x[, t])

d <- data.frame(
  y = c(y), 
  x = c(x), 
  time = rep(1:n, each = p), 
  id = 1:p)
fit <- dynamite(
  obs(y ~ lag(y) + x + random(~1), family = "gaussian"), 
  time = "time", group = "id", data = d, iter = 5000, chains = 4, cores = 4)
fit

Multimodality

Notice that I changed the standard deviation of the random intercepts in the above simulation compared to some of the earlier simulations in this thread:

The issue with some of the simulations here (e.g. post 3 by @James_Savage and post 11 by @Charles_Driver) is that with small data and large sigma_u, there is sometimes another posterior mode at sigma_u = 0. This mode can also be found with lme4::lmer by changing the initial values (start = 0.01; a rough lme4 sketch follows the Stan code below), as well as with nimble:

library(dplyr)
library(cmdstanr)
set.seed(1)
p <- 100
n <- 10

y <- matrix(0, p, n)
u <- rnorm(p, 0, 2)
y[, 1] <- u + rnorm(p)
for(t in 2:n) y[, t] <- rnorm(p, u + 0.5 * y[, t-1], 1)
model <- cmdstan_model("model.stan")
fit <- model$sample(
  data = list(n = n, p = p, y = y),
  init = replicate(
    32, 
    list(alpha=0,u_raw = u/2, sigma_u = 2, sigma_y = 1, beta = 0.5), 
    simplify = FALSE
  ),
  parallel_chains = 1, chains = 32, iter_warmup = 5000, iter_sampling = 1000, refresh = 0)
out <- rstan::read_stan_csv(fit$output_files())

pairs(out, pars = c("lp__", "sigma_u", "u[1]", "beta"))
print(out, pars = c("u", "u_raw"), include = FALSE)


[pairs plot of lp__, sigma_u, u[1], and beta, showing a second posterior mode near sigma_u = 0]

Stan model:


data {
  int p;
  int n;
  matrix[p, n] y;
}
parameters {
  real<lower=0> sigma_u;
  real<lower=0> sigma_y;
  vector[p] u_raw;
  real beta;
  real alpha;
}
transformed parameters {
  vector[p] u = sigma_u * u_raw;
}
model {
  alpha ~ normal(0, 10);
  sigma_y ~ normal(0, 10);
  sigma_u ~ normal(0, 10);
  beta ~ normal(0, 10);
  
  u_raw ~ std_normal();
  for(i in 2:n) y[, i] ~ normal(alpha + u + beta * y[, i - 1], sigma_y);
}
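And, as mentioned above, a rough sketch of the lme4::lmer check (reusing the y matrix simulated just before the Stan model; my reading is that start sets the initial value of theta, the random-intercept SD relative to the residual SD, so treat this as illustrative):

library(lme4)

# long format with the lagged response precomputed, first period dropped
d_long <- data.frame(
  y     = c(y[, 2:n]),           # periods 2..n
  y_lag = c(y[, 1:(n - 1)]),     # matching lags
  id    = factor(rep(1:p, times = n - 1))
)

m_default   <- lmer(y ~ y_lag + (1 | id), data = d_long)
m_near_zero <- lmer(y ~ y_lag + (1 | id), data = d_long, start = 0.01)

# Compare the estimated random-intercept SDs; with small data and a large
# sigma_u the second fit can end up at (or near) the sigma_u = 0 mode
VarCorr(m_default)
VarCorr(m_near_zero)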
3 Likes