Dynamic panel data models with Stan?

Thank you so much for your explanation! I am going to try this approach.

Great! Happy to help. Let me know how it goes.

@rtrangucci Sorry to disturb you again, but I have been trying to get your approach working in my multilevel context.

My Stan code now looks like the following:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  // Define variables in data test
  // Number of level-1 observations (an integer)
  int<lower=0> Ni;
  // Number of level-2 clusters
  int<lower=0> Nj;
  // Number of level-3 clusters
  int<lower=0> Nk;
  // Number of models x countries
  int<lower=0> Njk;
  // Number of fixed effect parameters
  int<lower=0> p;
  // Number of random effect parameters
  //int<lower=0> q;
  int<lower=0> Npars;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
  
  // Cluster IDs
  int<lower=1> modelid[Ni];
  int<lower=1> countryid[Ni];
  
  // Level 3 look up vector for level 2
  int<lower=1> countryLookup[Npars];
  int<lower=1> countryMonthLookup[Njk];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  //real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  //vector[I] u_raw;
  
  // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  
  // Fixed effects
  vector[p] beta;
  
  
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;
}
transformed parameters {
  //vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  
  // Individual mean
  real mu[Ni];

  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  }
  
  // Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  int count = 1;
  //vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // target += normal_lpdf(y[1,i] | u[i],
    //                       sigma_e / sqrt(one_minus_delta_sq));
    // target += normal_lpdf(y[2:T,i] | u_delta[i]
    //                       + delta * y_lag[2:T,i], sigma_e);
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(mu[count], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(mu[count] + delta * y_lag[j,i] - delta*mu[count], sigma_e);
      }
      count += 1;
    }
  }
  //u_raw ~ normal(0, 1);
  //sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  beta[1] ~ normal(-0.25, 1);
}
generated quantities{
    matrix[T,I] y_rep;
    int count_rep = 1;
    real one_minus_delta_sq = (1 - delta) * (1 + delta);
    //vector[I] u_delta = u * (1 - delta);

    for (i in 1:I) {
      for(j in 1:T){
        if(j == 1){
          y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
        }
        else{
          y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
        }
        count_rep += 1;
      }
    }
}

Here mu[count] denotes the \mu_{ij} from my previous post.

When I estimate this model I get the warning that all my transitions after warm-up are divergent. However, all my parameters seem to have reasonable values and an Rhat of 1, except for the delta parameter, which is 0.00. This is very strange to me and I don’t really know what is going wrong here. Do you have any idea?

This is a pretty complex model, so it’s hard for me to see what the problem is. Does the model fit OK without the lagged term? Have you done fake data checking to make sure you can pull out known parameters in a model that doesn’t include the lagged term? If yes, can you fit fake data that includes the AR1 term?

Yes, without the lagged term the model works perfectly fine and its predictive accuracy is already pretty good. The problem only occurs when I add the lagged term:

if(j == 1){
  y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
  y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta * mu[count_rep], sigma_e);
}

The parameter estimates from this model are still fine, and predictive accuracy is basically the same as without the lagged term. The weird thing is that the delta parameter equals 0 and I get all those divergent transitions.

Have you tried non-centering your random effects u_0k and u_0jk?

Edit: non-centering would involve declaring a new parameter, u_0k_raw, and defining u_0k as a transformed parameter, like so:

parameters {
  ...
  vector[Nk] u_0k_raw;
  ...
}
transformed parameters {
  ...
  vector[Nk] u_0k = sigma_u0k * u_0k_raw;
  ...
}
model {
...
u_0k_raw ~ normal(0, 1);
...
}
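
The analogous change for u_0jk follows the same pattern (a sketch using the declarations from the model above; u_0jk_raw is the new raw parameter):

parameters {
  ...
  vector[Npars] u_0jk_raw;
  ...
}
transformed parameters {
  ...
  vector[Npars] u_0jk = sigma_u0jk * u_0jk_raw;
  ...
}
model {
...
u_0jk_raw ~ normal(0, 1);
...
}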

Unfortunately, this again gives me only divergent transitions after warm-up. In addition, the delta parameter stays at 0.

Did you generate fake data from the AR1 model and fit the fake data, or is this fitted to real data? If not, I’d say you should generate fake data from the AR1 model and make sure you can pull out the right parameter values from the fake data before fitting it to the real data.

@rtrangucci sorry for my late reply, but I have been trying to get the model working for a fake simulated dataset.

I simulated data according to y_{it} = \mu_i + \delta (y_{i,t-1} - \mu_i) + \epsilon_{it}, where \mu_i = u_{i} + x_{i}\beta, using this code:

library(dplyr)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u <- rnorm(I, 0, sigma_u)
beta <- 10
X_init <- rnorm(I, 0, 1)
Xbeta_init <- as.matrix(X_init) %*% beta
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  Xbeta = as.vector(Xbeta_init),
  initial_y = u + as.vector(Xbeta_init) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u - x$Xbeta) + x$u + x$Xbeta, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

Afterwards, I estimated my model, which looks like this in Stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  int<lower=0> Ni;
  // Number of fixed effect parameters
  int<lower=1> p;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  vector[Ni] mu;
 
  //Individual mean
  for (i in 1:Ni) {
    mu[i] = fixedEffectVars[i,]*beta;// + beta_0jk[countryMonthLookup[i]];
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
}

Using

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag, Ni = 100, p = 1, fixedEffectVars = Xbeta_init)
library(rstan)

fit <- stan(file = "dynamic_test.stan",
            data = stan_dat, iter = 5000, warmup = 2500, chains = 4)

In this case the delta, sigma_u and sigma_e parameters are estimated correctly. However, the estimated beta parameter equals 1, while its true value is 10. Also, when changing the beta parameter to a different value in the DGP, the estimated value still equals about 1. In addition, I get the warning message that there were 2500 transitions after warmup that exceeded the maximum treedepth. Do you have any idea why this is the case and how to obtain the correct beta estimates?

Sorry, I’m having trouble following your logic in the Stan program with Ni, but I put together an example with fixed effects.

library(dplyr)
library(rstan)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- data_frame(
  u = u,
  initial_y = u + sapply(fixed_effects_by_T,function(x) x[1]) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I), # p(y_1 | u, sigma_e, delta),
  mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
  out <- data.frame(y = rep(NA, T),
                    time = rep(NA, T),
                    u = values$u[i],
                    individual = i)
  for(t in 1:T) {
    out$time[t] <- t
    if(t == 1) {
      out$y[t] <- values$initial_y[i]
    } else {
      out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)
    }
  }
  out <- out %>% mutate(
    lagged_y = lag(y)
  )
  dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel_w_predictors.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = delta)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
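
And the Stan program, dynamic-panel_w_predictors.stan:
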
data {
  int<lower=1> I;
  int<lower=1> T;
  int<lower=1> p;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  matrix[T,p] X[I];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  matrix[T, I] mu;
  for (i in 1:I)
    mu[,i] = u[i] + X[i] * beta;
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
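        // t = 1: draw from the stationary distribution of the AR(1),
        // with sd = sigma_e / sqrt(1 - delta^2)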
        y[j,i] ~ normal(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
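        // t > 1: conditional mean mu[j,i] + delta * (y_lag[j,i] - mu[j,i]),
        // rewritten as mu[j,i] * (1 - delta) + delta * y_lag[j,i]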
        y[j,i] ~ normal(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  beta ~ normal(0, 5);
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
}

The model seems to do OK at pulling out the parameters, though there are divergent transitions if you don’t dial up adapt_delta to 0.9. This example should be extensible to other models with group-level predictors.


@rtrangucci Thank you so much! I am going to make this work in my multilevel context. I will let you know if it works!

Sidenote: Ni is just the total number of observations.


@rtrangucci I have one final question about your model. I want to make the lagged-variable parameter \delta also dependent on i. So the model then becomes y_{it} = (\alpha+u_{i}) + x_{i}\beta + \delta_{i} y_{i,t-1} + \varepsilon_{it}. In my multilevel setting I could again estimate \delta_i using a non-centered parameterization and use:

parameters {
  ...
  real delta_raw;
  real u_delta_raw[Nk];
  real<lower=0> sigma_u_delta;
  ...
}
transformed parameters{
  real delta[Nk];
  real u_delta[Nk];
  for(i in 1:Nk){//number of individuals i
  u_delta[i] = sigma_u_delta*u_delta_raw[i];
  }

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }
}
model {
...
u_delta_raw ~ normal(0, 1);
sigma_u_delta~normal(0,1);
...
}

However, this does not guarantee that my \delta_i lies in the interval (-1, 1). Do you know how I could incorporate this constraint?

Many thanks in advance!

Hi @c5v,

This is almost all the way there. You can change this line:

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }

to

  for(k in 1:Nk){//number of individuals i
   delta[k] = inv_logit(delta_raw + u_delta[k]) * 2 - 1;
  }

which will get you back to the (-1, 1) scale.
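
A quick sanity check in R (plogis() is base R's inverse logit):

x <- seq(-20, 20, by = 0.1)
range(plogis(x) * 2 - 1)  # stays strictly inside (-1, 1)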


@rtrangucci first of all, thanks for all the help!

As a final (really!) question, I would like to ask whether you know of a paper or some formal description of the model:

I would like to use it in a paper for university, but I cannot find any citable paper that uses this type of model.

This one is not really what you are doing

I’m digging this approach:

@rtrangucci - do you know of any smart ways to deal with irregularly spaced observations?

Imputing missing data gets me most of the way there, but my estimates of \sigma_u and \sigma_e become non-identifiable as the proportion of missing values increases.

Appreciated,
Andrew

Just to make sure: this approach works because now the correlation between \mu_{i} and \delta(y_{i,t-1}-\mu_i) equals 0, right? Since \mathbb{E}\left[\mu_i \delta(y_{i,t-1} -\mu_{i}) \right] = \mu_{i}\delta \left(\mathbb{E}\left[ y_{i,t-1} \right] - \mu_i\right) = 0, using \mathbb{E}\left[ y_{i,t-1} \right] = \mu_i.


Hi @rtrangucci,

Like others, I’ve found this code really useful. So, thank you!

However, I’m having trouble estimating group-specific \delta_i on some simulated data.

I’ve reduced my model to pretty much exactly what you suggested for the random intercept model with group-specific \delta_i. However, the estimated \delta_i seem to be stuck around .08-.09, even when the true mean is .4.

Interestingly, when I remove the random intercept from the model, the range restriction on \delta_i goes away and the estimation is decent.

So, I’m wondering whether I might have implemented something wrong or whether you might have any insight into this behavior.

@c5v, it looks like you worked quite a bit with these models. Were you able to estimate random \delta_i?

If anyone has any insight into this, I’d love to hear it!

I’ve attached the simulated data (231.7 KB) I’m working with as well as the code (798 Bytes) for running and checking the model.

The Stan model I'm using is:

data {
  int<lower=1> S; // Number of states
  int<lower=1> T; // Number of time points
  int<lower=1> P; // Number of predictors
  matrix[T,S] y; // Outcome variable
  matrix[T,S] y_lag; // Lagged outcome
  matrix[T,P] X[S]; // List of [T, P] matrices containing intercept and predictors
}
parameters {

  real<lower=0,upper=1> delta_raw; // Raw AR(1) coefficient
  real u_delta_raw[S];
  real<lower=0> sigma_u_delta;
  
  
  vector[S] u_raw;
  real<lower=0> sigma_u;
  
  real<lower=0> sigma_e; // Residual error

  vector[P] beta;
}
transformed parameters {
  
  real delta[S];
  real u_delta[S];
  vector[S] u;
  matrix[T, S] mu;
  
  u = sigma_u * u_raw;

  for(s in 1:S){//number of states s
    u_delta[s] = sigma_u_delta*u_delta_raw[s];
  }
  
  for(s in 1:S){//number of states s
   delta[s] = inv_logit(delta_raw + u_delta[s]) * 2 - 1;
  }
  
  for (s in 1:S)
    mu[,s] = u[s] + X[s] * beta;
    
}
model {
  

  for (s in 1:S) {
    for(j in 1:T){
      if(j == 1){
        y[j,s] ~ normal(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));
      }
      else{
        y[j,s] ~ normal(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
  
  sigma_u ~ normal(0,2);
  u_raw ~ normal(0,1);
  sigma_e ~ normal(0, 1);
  beta[1] ~ normal(0, 2);
  
  delta_raw ~ beta(6, 6);
  u_delta_raw ~ normal(0, 1);
  sigma_u_delta ~ normal(0, 2);

}
generated quantities{
  matrix[T,S] y_rep;

  for (s in 1:S) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,s] = normal_rng(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));
      }
      else{
        y_rep[j,s] = normal_rng(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
}

@rtrangucci thanks a lot for your post. I'm having trouble understanding one part:

How do you get that target for t=1 from this equation?

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}

I understand the target for 2:T but the first one is confusing me.

Hi @ignacio!

I’m not the great @rtrangucci, but let me give it a try.

So, unfortunately the notation changed a bit over the course of this thread, but basically what's u[i] in the code is \mu_i in the equation, right? So far so good.

In the answers above you can find that \mathbb{E}[Y_{i,t}]=\mu_i, so in the code that is u[i], the mean of y[1,i].
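
A short derivation of that fact: take expectations of the model equation, Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}, and use stationarity (\mathbb{E}[Y_{i,t}] = \mathbb{E}[Y_{i,t-1}], \mathbb{E}[\epsilon_{i,t}] = 0), which gives

\begin{align} \mathbb{E}[Y_{i,t}] &= \mu_i + \delta(\mathbb{E}[Y_{i,t}] - \mu_i) \\ \mathbb{E}[Y_{i,t}](1 - \delta) &= \mu_i(1 - \delta) \\ \mathbb{E}[Y_{i,t}] &= \mu_i. \end{align}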

For the variance we have

\begin{align} \text{Var}[Y_{i,t}]&=\text{Var}[\delta Y_{i,t-1}]+\text{Var}[\epsilon_{i,t}] \\ \text{Var}[Y_{i,t}]&=\delta^2\text{Var}[Y_{i,t-1}] +\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t-1}]&=\sigma^2_\epsilon. \end{align}

Now, I think we need to assume \text{Var}[Y_{i,t}] = \text{Var}[Y_{i,t-1}], which is reasonable for a stationary process with iid residuals. Then,

\begin{align} \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t}]&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}](1 - \delta^2)&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}]&=\frac{\sigma^2_\epsilon}{(1 - \delta^2)}\\ \text{Sd}[Y_{i,t}]&=\frac{\sigma_\epsilon}{\sqrt{1 - \delta^2}}, \end{align}

which is the sigma_e / sqrt(one_minus_delta_sq) part of the code.
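
If you want to check that numerically, here is a quick R simulation (just a sketch, with arbitrarily chosen parameter values):

set.seed(1)
delta <- 0.8; sigma_e <- 1; mu <- 0
n <- 1e5
y <- numeric(n)
# draw y[1] from the stationary distribution, then iterate the AR(1)
y[1] <- rnorm(1, mu, sigma_e / sqrt(1 - delta^2))
for (t in 2:n) y[t] <- rnorm(1, mu + delta * (y[t - 1] - mu), sigma_e)
sd(y)                        # empirical marginal sd, close to 1.67
sigma_e / sqrt(1 - delta^2)  # theoretical value, about 1.667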

I learned this at the Helsinki StanCon Tutorial with Jonah. Have a look—he also discusses a GP formulation of this AR(1) process.
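
As a side note, in that GP view the stationary AR(1) has the exponential covariance function \text{Cov}[Y_{i,t}, Y_{i,t'}] = \frac{\sigma^2_\epsilon}{1 - \delta^2}\,\delta^{|t - t'|}, which recovers the marginal variance derived above at t = t'.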

Cheers! :)


Thanks a lot @Max_Mantei! Do you know by any chance if there is a video for @jonah’s tutorial?