Dynamic panel data models with Stan?


#22

Thank you so much for your explanation! I am going to try this approach.


#23

Great! Happy to help. Let me know how it goes.


#24

@rtrangucci Sorry for disturbing you again. But I have been trying to get your approach working in my multilevel context.

My stan code now looks like the following:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  // Define variables in data test
  // Number of level-1 observations (an integer)
  int<lower=0> Ni;
  // Number of level-2 clusters
  int<lower=0> Nj;
  // Number of level-3 clusters
  int<lower=0> Nk;
  // Numer of models x countries
  int<lower = 0> Njk;
  // Number of fixed effect parameters
  int<lower=0> p;
  // Number of random effect parameters
  //int<lower=0> q;
  int<lower=0> Npars;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
  
  // Cluster IDs
  int<lower=1> modelid[Ni];
  int<lower=1> countryid[Ni];
  
  // Level 3 look up vector for level 2
  int<lower=1> countryLookup[Npars];
  int<lower=1> countryMonthLookup[Njk];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  //real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  //vector[I] u_raw;
  
    // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  
  // Fixed effects
  vector[p] beta;
  
  
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;
}
transformed parameters {
  //vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  
    // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  
  // Individual mean
  real mu[Ni];

  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  }
  
    //Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  int count = 1;
  //vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // target += normal_lpdf(y[1,i] | u[i],
    //                       sigma_e / sqrt(one_minus_delta_sq));
    // target += normal_lpdf(y[2:T,i] | u_delta[i]
    //                       + delta * y_lag[2:T,i], sigma_e);
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(mu[count], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(mu[count] + delta * y_lag[j,i] - delta*mu[count], sigma_e);
      }
      count += 1;
    }
  }
  //u_raw ~ normal(0, 1);
  //sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  beta[1] ~ normal(-0.25, 1);
}
generated quantities{
    matrix[T,I] y_rep;
    int count_rep = 1;
    real one_minus_delta_sq = (1 - delta) * (1 + delta);
    //vector[I] u_delta = u * (1 - delta);

    for (i in 1:I) {
      for(j in 1:T){
        if(j == 1){
          y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
        }
        else{
          y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
        }
        count_rep += 1;
      }
    }
}

Here mu[count] denotes the \mu_{ij} from my previous post.

When I estimate this model I get the warning that all my transitions after warm-up are divergent. However, all my parameters seem to have a reasonable value and a Rhat of 1, except from the delta parameter which is 0.00. This is very strange to me and I don’t really know what is going wrong here. Do you have any idea?


#25

This is a pretty complex model, so it’s hard for me to see what the problem is. Does the model fit OK without the lagged term? Have you done fake data checking to make sure you can pull out known parameters in a model that doesn’t include the lagged term? If yes, can you fit fake data that includes the AR1 term?


#26

Yes, without the lagged term the model works perfectly fine and predictive accuracy of the model is already pretty good. So the error only occurs when I add the lagged term in:

if(j == 1){
          y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
        }
        else{
          y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
}

The parameter estimates are still fine after this model and predictive accuracy is basically the same as without the lagged term. The weird thing is that the delta parameter equals 0 and I get all those divergent transitions.


#27

Have you tried non-centering your random effects u_0k and u_0jk?

Edit: noncentering would involve declaring a new parameter, u_0k_raw, e.g. and declaring u_0k as a transformed parameter like so:

parameters {
  ...
  vector[Nk] u_0k = sigma_u0k * u_0k_raw;
  ...
}
model {
...
u_0k_raw ~ normal(0, 1);
...
}


#28

Unfortunately, this gives me again only divergent transitions after warm-up. Next to that, the delta parameter stays 0


#29

Did you generate fake data from the AR1 model and fit the fake data, or is this fitted to real data? If not, I’d say you should generate fake data from the AR1 model and make sure you can pull out the right parameter values from the fake data before fitting it to the real data.


#30

@rtrangucci sorry for my late reply, but I have been trying to get the model working for a fake simulated dataset.

I simulated data according to y_{it} = (\alpha + u_{i}) + x_{i}\beta +\delta y_{i,t-1} +\epsilon_{it} using this code:

library(dplyr)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
beta = 10
X_init = rnorm(I, 0, 1)#matrix(rnorm(I*T), nrow  = T, ncol = I)
Xbeta_init = as.matrix(X_init)%*%beta
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  Xbeta = as.vector(Xbeta_init),
  initial_y = u + as.vector(Xbeta_init) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u - x$Xbeta) + x$u + x$Xbeta, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

Afterwards, I estimated my model which looks like this in stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  int<lower=0> Ni;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
 
  //Individual mean
  for (i in 1:Ni) {
    mu[i] = fixedEffectVars[i,]*beta;// + beta_0jk[countryMonthLookup[i]];
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
}

Using

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag, Ni = 100, p = 1, fixedEffectVars = Xbeta_init)
library(rstan)

fit <- stan(file = "dynamic_test.stan",
            data = stan_dat, iter = 5000, warmup = 2500, chains = 4)

In this case the delta, sigma_u and sigma_e parameters are estimated correctly. However, the estimated betaparameter equals 1, while it is equal to 10. Also, when changing the beta parameter to a different value in the DGP, the estimated value still equals about 1. Next to that I get the warning message that there were 2500 transitions after warmup that exceeded the maximum treedepth. Do you have any idea why this is the case and how to obtain the correct beta estimates?


#31

Sorry, I’m having trouble following your logic in the Stan program with Ni, but I put together an example with fixed effects.

library(dplyr)
library(rstan)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- data_frame(
  u = u,
  initial_y = u + sapply(fixed_effects_by_T,function(x) x[1]) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I), # p(y_1 | u, sigma_e, delta),
  mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
  out <- data.frame(y = rep(NA, T),
                    time = rep(NA, T),
                    u = values$u[i],
                    individual = i)
  for(t in 1:T) {
    out$time[t] <- t
    if(t == 1) {
      out$y[t] <- values$initial_y[i]
    } else {
      out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)
    }
  }
  out <- out %>% mutate(
    lagged_y = lag(y)
  )
  dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel_w_predictors.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = delta)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
data {
  int<lower=1> I;
  int<lower=1> T;
  int<lower=1> p;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  matrix[T,p] X[I];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  matrix[T, I] mu;
  for (i in 1:I)
    mu[,i] = u[i] + X[i] * beta;
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  beta ~ normal(0, 5);
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
}

The model seems to do OK at pulling out the parameters, though there are divergent transitions if you don’t dial up adapt_delta to 0.9. This example should be extensible to other models with group-level predictors.


#32

@rtrangucci Thanks you so much! I am going to make this work in my multilevel context. I will let you know if it works!

sidenote: Ni is just the total number of observations


#33

@rtrangucci I have one final question about your model. I want to make the lagged variable parameter \delta also dependent on i. So the model then becomes y_{it} = (\alpha+u_{i}) + x_{i}\beta + \delta_{i} + \varepsilon_{it}. In my multilevel setting I could again estimate \delta_i using non-centered parameterization and use:

parameters {
  ...
  real delta_raw;
  real u_delta_raw[Nk];
  real<lower=0> sigma_u_delta
  ...
}
transformed parameters{
  real u_delta[N];
  for(i in 1:Nk){//number of individuals i
  u_delta[i] = sigma_u_delta*u_delta_raw[i];
  }

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }
}
model {
...
u_delta_raw ~ normal(0, 1);
sigma_u_delta~normal(0,1);
...
}

However, this does not necessarily mean that my \delta_i is in the interval [-1,1]. Do you maybe know how to incorporate this?

Many thanks in advance!


#34

Hi @c5v,

This is almost all the way there. You can change this line:

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }

to

  for(k in 1:Nk){//number of individuals i
   delta[k] = inv_logit(delta_raw + u_delta[k]) * 2 - 1;
  }

which will get you back to the [-1,1] scale.