Dynamic panel data models with Stan?

Thank you so much for your explanation! I am going to try this approach.

Great! Happy to help. Let me know how it goes.

@rtrangucci Sorry to disturb you again, but I have been trying to get your approach working in my multilevel context.

My Stan code now looks like the following:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  // Define variables in data test
  // Number of level-1 observations (an integer)
  int<lower=0> Ni;
  // Number of level-2 clusters
  int<lower=0> Nj;
  // Number of level-3 clusters
  int<lower=0> Nk;
  // Number of models x countries
  int<lower=0> Njk;
  // Number of fixed effect parameters
  int<lower=0> p;
  // Number of random effect parameters
  //int<lower=0> q;
  int<lower=0> Npars;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
  
  // Cluster IDs
  int<lower=1> modelid[Ni];
  int<lower=1> countryid[Ni];
  
  // Level 3 look up vector for level 2
  int<lower=1> countryLookup[Npars];
  int<lower=1> countryMonthLookup[Njk];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  //real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  //vector[I] u_raw;
  
  // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  
  // Fixed effects
  vector[p] beta;
  
  
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;
}
transformed parameters {
  //vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  
  // Individual mean
  real mu[Ni];

  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  }
  
  // Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  int count = 1;
  //vector[I] u_delta = u * (1 - delta);
  for (i in 1:I) {
    // target += normal_lpdf(y[1,i] | u[i],
    //                       sigma_e / sqrt(one_minus_delta_sq));
    // target += normal_lpdf(y[2:T,i] | u_delta[i]
    //                       + delta * y_lag[2:T,i], sigma_e);
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(mu[count], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(mu[count] + delta * y_lag[j,i] - delta*mu[count], sigma_e);
      }
      count += 1;
    }
  }
  //u_raw ~ normal(0, 1);
  //sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  beta[1] ~ normal(-0.25, 1);
}
generated quantities{
    matrix[T,I] y_rep;
    int count_rep = 1;
    real one_minus_delta_sq = (1 - delta) * (1 + delta);
    //vector[I] u_delta = u * (1 - delta);

    for (i in 1:I) {
      for(j in 1:T){
        if(j == 1){
          y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
        }
        else{
          y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
        }
        count_rep += 1;
      }
    }
}

Here mu[count] denotes the \mu_{ij} from my previous post.

When I estimate this model I get the warning that all my transitions after warm-up are divergent. However, all my parameters seem to have reasonable values and an Rhat of 1, except for the delta parameter, which is 0.00. This is very strange to me and I don’t really know what is going wrong here. Do you have any idea?

This is a pretty complex model, so it’s hard for me to see what the problem is. Does the model fit OK without the lagged term? Have you done fake data checking to make sure you can pull out known parameters in a model that doesn’t include the lagged term? If yes, can you fit fake data that includes the AR1 term?

Yes, without the lagged term the model works perfectly fine and its predictive accuracy is already pretty good. The problem only occurs when I add the lagged term:

if(j == 1){
  y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
  y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta * mu[count_rep], sigma_e);
}

The parameter estimates from this model are still fine, and predictive accuracy is basically the same as without the lagged term. The weird thing is that the delta parameter equals 0 and I get all those divergent transitions.

Have you tried non-centering your random effects u_0k and u_0jk?

Edit: non-centering would involve declaring a new parameter, u_0k_raw, and defining u_0k as a transformed parameter, like so:

parameters {
  ...
  vector[Nk] u_0k_raw;
  ...
}
transformed parameters {
  ...
  vector[Nk] u_0k = sigma_u0k * u_0k_raw;
  ...
}
model {
...
u_0k_raw ~ normal(0, 1);
...
}
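
The analogous change for u_0jk follows the same pattern (a sketch using the declarations from the model above; u_0jk_raw is the new raw parameter):

parameters {
  ...
  vector[Npars] u_0jk_raw;
  ...
}
transformed parameters {
  ...
  vector[Npars] u_0jk = sigma_u0jk * u_0jk_raw;
  ...
}
model {
...
u_0jk_raw ~ normal(0, 1);
...
}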

Unfortunately, this again gives me only divergent transitions after warm-up. In addition, the delta parameter stays at 0.

Did you generate fake data from the AR1 model and fit the fake data, or is this fitted to real data? If not, I’d say you should generate fake data from the AR1 model and make sure you can pull out the right parameter values from the fake data before fitting it to the real data.

@rtrangucci sorry for my late reply, but I have been trying to get the model working for a fake simulated dataset.

I simulated data according to y_{it} = \mu_i + \delta (y_{i,t-1} - \mu_i) + \epsilon_{it}, where \mu_i = u_{i} + x_{i}\beta, using this code:

library(dplyr)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u <- rnorm(I, 0, sigma_u)
beta <- 10
X_init <- rnorm(I, 0, 1)
Xbeta_init <- as.matrix(X_init) %*% beta
initial_values <- data_frame(
  individual = 1:I,
  u = u,
  Xbeta = as.vector(Xbeta_init),
  initial_y = u + as.vector(Xbeta_init) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)


simulated_panel <- initial_values %>%
  group_by(individual) %>% 
  do({
    simulate <- function(x) {
      out <- data.frame(y = rep(NA, T),
                        time = rep(NA, T),
                        u = x$u)
      for(t in 1:T) {
        out$time[t] <- t
        if(t == 1) {
          out$y[t] <- x$initial_y
        } else {
          out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u - x$Xbeta) + x$u + x$Xbeta, sigma_e)
        }
      }
      return(out)
    }
    as.data.frame(simulate(.))
  }) %>% 
  group_by(individual) %>% 
  mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

Afterwards, I estimated my model, which looks like this in Stan:

data {
  int<lower=1> I;
  int<lower=1> T;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  
  int<lower=0> Ni;
  // Number of fixed effect parameters
  int<lower=1> p;
  
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  vector[Ni] mu;
 
  //Individual mean
  for (i in 1:Ni) {
    mu[i] = fixedEffectVars[i,]*beta;// + beta_0jk[countryMonthLookup[i]];
  }
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y[j,i] ~ normal(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y[j,i] ~ normal(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
      }
    }
  }
}

Using

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag, Ni = 100, p = 1, fixedEffectVars = Xbeta_init)
library(rstan)

fit <- stan(file = "dynamic_test.stan",
            data = stan_dat, iter = 5000, warmup = 2500, chains = 4)

In this case the delta, sigma_u and sigma_e parameters are estimated correctly. However, the estimated beta parameter equals 1, while its true value is 10. Also, when changing the beta parameter to a different value in the DGP, the estimated value still equals about 1. In addition, I get the warning message that there were 2500 transitions after warmup that exceeded the maximum treedepth. Do you have any idea why this is the case and how to obtain the correct beta estimates?

Sorry, I’m having trouble following your logic in the Stan program with Ni, but I put together an example with fixed effects.

library(dplyr)
library(rstan)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- data_frame(
  u = u,
  initial_y = u + sapply(fixed_effects_by_T,function(x) x[1]) + sigma_e / 
    sqrt(1 - delta^2) * rnorm(I), # p(y_1 | u, sigma_e, delta),
  mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
  out <- data.frame(y = rep(NA, T),
                    time = rep(NA, T),
                    u = values$u[i],
                    individual = i)
  for(t in 1:T) {
    out$time[t] <- t
    if(t == 1) {
      out$y[t] <- values$initial_y[i]
    } else {
      out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)
    }
  }
  out <- out %>% mutate(
    lagged_y = lag(y)
  )
  dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
  y[,i] <- filter(simulated_panel, individual == i)$y
  y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel_w_predictors.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = delta)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
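
And the Stan program, dynamic-panel_w_predictors.stan:
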
data {
  int<lower=1> I;
  int<lower=1> T;
  int<lower=1> p;
  matrix[T,I] y;
  matrix[T,I] y_lag;
  matrix[T,p] X[I];
}
parameters {
  real<lower=0,upper=1> delta_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;
  vector[I] u_raw;
  
  // Fixed effects
  vector[p] beta;
}
transformed parameters {
  vector[I] u = sigma_u * u_raw;
  real delta = delta_raw * 2 - 1;
  matrix[T, I] mu;
  for (i in 1:I)
    mu[,i] = u[i] + X[i] * beta;
}
model {
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
 
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
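        // t = 1: draw from the stationary distribution of the AR(1),
        // with sd = sigma_e / sqrt(1 - delta^2)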
        y[j,i] ~ normal(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
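        // t > 1: conditional mean mu[j,i] + delta * (y_lag[j,i] - mu[j,i]),
        // rewritten as mu[j,i] * (1 - delta) + delta * y_lag[j,i]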
        y[j,i] ~ normal(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
  
  u_raw ~ normal(0, 1);
  sigma_u ~ normal(0, 2);
  sigma_e ~ normal(0, 1);
  delta_raw ~ beta(4, 4);
  beta ~ normal(0, 5);
}
generated quantities{
  matrix[T,I] y_rep;
  real one_minus_delta_sq = (1 - delta) * (1 + delta);
  
  for (i in 1:I) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,i] = normal_rng(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
      }
      else{
        y_rep[j,i] = normal_rng(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
      }
    }
  }
}

The model seems to do OK at pulling out the parameters, though there are divergent transitions if you don’t dial up adapt_delta to 0.9. This example should be extensible to other models with group-level predictors.


@rtrangucci Thank you so much! I am going to make this work in my multilevel context. I will let you know if it works!

Sidenote: Ni is just the total number of observations.


@rtrangucci I have one final question about your model. I want to make the lagged-variable parameter \delta also dependent on i. So the model then becomes y_{it} = (\alpha+u_{i}) + x_{i}\beta + \delta_{i} y_{i,t-1} + \varepsilon_{it}. In my multilevel setting I could again estimate \delta_i using a non-centered parameterization and use:

parameters {
  ...
  real delta_raw;
  real u_delta_raw[Nk];
  real<lower=0> sigma_u_delta;
  ...
}
transformed parameters{
  real delta[Nk];
  real u_delta[Nk];
  for(i in 1:Nk){//number of individuals i
  u_delta[i] = sigma_u_delta*u_delta_raw[i];
  }

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }
}
model {
...
u_delta_raw ~ normal(0, 1);
sigma_u_delta~normal(0,1);
...
}

However, this does not guarantee that my \delta_i lies in the interval (-1, 1). Do you know how I could incorporate this constraint?

Many thanks in advance!

Hi @c5v,

This is almost all the way there. You can change this line:

  for(k in 1:Nk){//number of individuals i
   delta[k] = delta_raw + u_delta[k];
  }

to

  for(k in 1:Nk){//number of individuals i
   delta[k] = inv_logit(delta_raw + u_delta[k]) * 2 - 1;
  }

which will get you back to the (-1, 1) scale.
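
A quick sanity check in R (plogis() is base R's inverse logit):

x <- seq(-20, 20, by = 0.1)
range(plogis(x) * 2 - 1)  # stays strictly inside (-1, 1)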


@rtrangucci first of all, thanks for all the help!

As a final (really!) question, I would like to ask whether you know of a paper or some formal description of the model:

I would like to use it in a paper for university, but I cannot find any citable paper that uses this type of model.

This one is not really what you are doing

I’m digging this approach:

@rtrangucci - do you know of any smart ways to deal with irregularly spaced observations?

Imputing missing data gets me most of the way there, but my estimates of \sigma_u and \sigma_e become non-identifiable as the proportion of missing values increases.

Appreciated,
Andrew

Just to make sure: this approach works because now the correlation between \mu_{i} and \delta(y_{i,t-1}-\mu_i) equals 0, right? Since \mathbb{E}\left[\mu_i \delta(y_{i,t-1} -\mu_{i}) \right] = \mu_{i}\delta \left(\mathbb{E}\left[ y_{i,t-1} \right] - \mu_i\right) = 0, using \mathbb{E}\left[ y_{i,t-1} \right] = \mu_i.


Hi @rtrangucci,

Like others, I’ve found this code really useful. So, thank you!

However, I’m having trouble estimating group-specific \delta_i on some simulated data.

I’ve reduced my model to pretty much exactly what you suggested for the random intercept model with group-specific \delta_i. However, the estimated \delta_i seem to be stuck around .08-.09, even when the true mean is .4.

Interestingly, when I remove the random intercept from the model, the range restriction on \delta_i goes away and the estimation is decent.

So, I’m wondering whether I might have implemented something wrong or whether you might have any insight into this behavior.

@c5v, it looks like you worked quite a bit with these models. Were you able to estimate random \delta_i?

If anyone has any insight into this, I’d love to hear it!

I’ve attached the simulated data (231.7 KB) I’m working with as well as the code (798 Bytes) for running and checking the model.

The Stan model I'm using is:

data {
  int<lower=1> S; // Number of states
  int<lower=1> T; // Number of time points
  int<lower=1> P; // Number of predictors
  matrix[T,S] y; // Outcome variable
  matrix[T,S] y_lag; // Lagged outcome
  matrix[T,P] X[S]; // List of [T, P] matrices containing intercept and predictors
}
parameters {

  real<lower=0,upper=1> delta_raw; // Raw AR(1) coefficient
  real u_delta_raw[S];
  real<lower=0> sigma_u_delta;
  
  
  vector[S] u_raw;
  real<lower=0> sigma_u;
  
  real<lower=0> sigma_e; // Residual error

  vector[P] beta;
}
transformed parameters {
  
  real delta[S];
  real u_delta[S];
  vector[S] u;
  matrix[T, S] mu;
  
  u = sigma_u * u_raw;

  for(s in 1:S){//number of states s
    u_delta[s] = sigma_u_delta*u_delta_raw[s];
  }
  
  for(s in 1:S){//number of states s
   delta[s] = inv_logit(delta_raw + u_delta[s]) * 2 - 1;
  }
  
  for (s in 1:S)
    mu[,s] = u[s] + X[s] * beta;
    
}
model {
  

  for (s in 1:S) {
    for(j in 1:T){
      if(j == 1){
        y[j,s] ~ normal(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));
      }
      else{
        y[j,s] ~ normal(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
  
  sigma_u ~ normal(0,2);
  u_raw ~ normal(0,1);
  sigma_e ~ normal(0, 1);
  beta[1] ~ normal(0, 2);
  
  delta_raw ~ beta(6, 6);
  u_delta_raw ~ normal(0, 1);
  sigma_u_delta ~ normal(0, 2);

}
generated quantities{
  matrix[T,S] y_rep;

  for (s in 1:S) {
    for(j in 1:T){
      if(j == 1){
        y_rep[j,s] = normal_rng(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));
      }
      else{
        y_rep[j,s] = normal_rng(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
}

@rtrangucci thanks a lot for your post. I'm having trouble understanding one part:

How do you get that target for t=1 from this equation?

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}

I understand the target for 2:T but the first one is confusing me.

Hi @ignacio!

I’m not the great @rtrangucci, but let me give it a try.

So, unfortunately the notation changed a bit over the course of this thread, but basically what's u[i] in the code is \mu_i in the equation, right? So far so good.

In the answers above you can find that \mathbb{E}[Y_{i,t}]=\mu_i, so in the code that is u[i], the mean of y[1,i].
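
A short derivation of that fact: take expectations of the model equation, Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}, and use stationarity (\mathbb{E}[Y_{i,t}] = \mathbb{E}[Y_{i,t-1}], \mathbb{E}[\epsilon_{i,t}] = 0), which gives

\begin{align} \mathbb{E}[Y_{i,t}] &= \mu_i + \delta(\mathbb{E}[Y_{i,t}] - \mu_i) \\ \mathbb{E}[Y_{i,t}](1 - \delta) &= \mu_i(1 - \delta) \\ \mathbb{E}[Y_{i,t}] &= \mu_i. \end{align}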

For the variance we have

\begin{align} \text{Var}[Y_{i,t}]&=\text{Var}[\delta Y_{i,t-1}]+\text{Var}[\epsilon_{i,t}] \\ \text{Var}[Y_{i,t}]&=\delta^2\text{Var}[Y_{i,t-1}] +\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t-1}]&=\sigma^2_\epsilon. \end{align}

Now, I think we need to assume \text{Var}[Y_{i,t}] = \text{Var}[Y_{i,t-1}], which is reasonable for a stationary process with iid residuals. Then,

\begin{align} \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t}]&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}](1 - \delta^2)&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}]&=\frac{\sigma^2_\epsilon}{(1 - \delta^2)}\\ \text{Sd}[Y_{i,t}]&=\frac{\sigma_\epsilon}{\sqrt{1 - \delta^2}}, \end{align}

which is the sigma_e / sqrt(one_minus_delta_sq) part of the code.
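
If you want to check that numerically, here is a quick R simulation (just a sketch, with arbitrarily chosen parameter values):

set.seed(1)
delta <- 0.8; sigma_e <- 1; mu <- 0
n <- 1e5
y <- numeric(n)
# draw y[1] from the stationary distribution, then iterate the AR(1)
y[1] <- rnorm(1, mu, sigma_e / sqrt(1 - delta^2))
for (t in 2:n) y[t] <- rnorm(1, mu + delta * (y[t - 1] - mu), sigma_e)
sd(y)                        # empirical marginal sd, close to 1.67
sigma_e / sqrt(1 - delta^2)  # theoretical value, about 1.667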

I learned this at the Helsinki StanCon Tutorial with Jonah. Have a look—he also discusses a GP formulation of this AR(1) process.
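
As a side note, in that GP view the stationary AR(1) has the exponential covariance function \text{Cov}[Y_{i,t}, Y_{i,t'}] = \frac{\sigma^2_\epsilon}{1 - \delta^2}\,\delta^{|t - t'|}, which recovers the marginal variance derived above at t = t'.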

Cheers! :)


Thanks a lot @Max_Mantei! Do you know by any chance if there is a video for @jonah’s tutorial?