Dynamic panel data models with Stan?

Here’s my solution, with some code.

set.seed(123)
delta <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
initial_values <- data_frame(
individual = 1:I,
u = u,
initial_y = u + sigma_e /
sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)

simulated_panel <- initial_values %>%
group_by(individual) %>%
do({
simulate <- function(x) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = x$u) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- x$initial_y
} else {
out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
}
}
return(out)
}
as.data.frame(simulate(.))
}) %>%
group_by(individual) %>%
mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag)
library(rstan)
mod <- stan_model('dynamic-panel.stan')
fit <- sampling(mod, data = stan_dat, cores = 4, chains = 4, iter = 2000)


Here’s dynamic-panel.stan:

data {
int<lower=1> I;
int<lower=1> T;
matrix[T,I] y;
matrix[T,I] y_lag;
}
parameters {
real<lower=0,upper=1> delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);
vector[I] u_delta = u * (1 - delta);
for (i in 1:I) {
target += normal_lpdf(y[1,i] | u[i],
sigma_e / sqrt(one_minus_delta_sq));
target += normal_lpdf(y[2:T,i] | u_delta[i]
+ delta * y_lag[2:T,i], sigma_e);
}
u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);
}


Summary of the model run over the simulated data from above:

> print(fit , pars = c('delta','sigma_u','sigma_e'))
Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.50    0.00 0.04 0.42 0.47 0.50 0.52  0.57   831 1.01
sigma_u 0.36    0.01 0.13 0.05 0.29 0.37 0.45  0.58   430 1.01
sigma_e 1.01    0.00 0.02 0.96 0.99 1.01 1.03  1.06  4000 1.00

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:03:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


When I boost the number of observations per individual to 50 the inference gets sharper for sigma_u and approaches the true value.

Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.48       0 0.01 0.46 0.47 0.48 0.49  0.51  4000    1
sigma_u 0.46       0 0.04 0.38 0.43 0.46 0.49  0.55  1400    1
sigma_e 0.99       0 0.01 0.98 0.99 0.99 1.00  1.01  4000    1

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:17:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

1 Like

Could you refer the paper or model that you based on to write your code? I would like to use your model but I can’t find the dynamic panel data model that you are using in your model.
EDIT:
Just in case if someone is interested in this approach who has no econometric background, I found the following paper useful to understand rtrangucci’s code.

Could you maybe explain why you multiplicate by 1-delta in vector[I] u_delta = u * (1 - delta);?
Why don’t you just use u[i] instead of u_delta[i] in target += normal_lpdf(y[2:T,i] | u_delta[i] + delta * y_lag[2:T,i], sigma_e);?

Hi @c5v, sorry for the delay. The use of u_delta[i] instead of u[i] is just an algebraic trick to cut down on computation. We could instead write the likelihood like:

target += normal_lpdf(y[2:T,i] | u[i] + delta * (y_lag[2:T,i] - u[i]), sigma_e)

target += normal_lpdf(y[2:T,i] | u[i] + delta * y_lag[2:T,i] - delta * u[i], sigma_e)

simplifying to

target += normal_lpdf(y[2:T,i] | u[i] * (1 - delta) + delta * y_lag[2:T,i], sigma_e)

2 Likes

The only thing I don’t really get is why you subtract u[i] in delta*(y_lag[2:T,i] - u[i]) since we are not modelling first differences here, right? Could you maybe explain this to me, this will help me a lot!

It’s because I want the mean of each observation in group i to be u[i]. The model is

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}\\ \epsilon_{i,t} \sim \text{Normal}(0,1)

Taking expectations of each side:

\mathbb{E}[Y_{i,t}] = \mu_i + \delta \mathbb{E}[(Y_{i,t-1} - \mu_i)] \\ \mathbb{E}[Y_{i,t}] = \mu_i + \delta (\mathbb{E}[Y_{i,t-1}] - \mu_i) \\ \mathbb{E}[Y_{i,t}] = \mu_i
1 Like

@rtrangucci clear explanation thanks!

If I am right, in a multilevel setting (which I am building right now), which then looks like y_{ijt} = \beta_{0ij} + \beta_{1ij}x_{ijt} + \delta y_{ij,t-1} + \epsilon_{ijt}, I could also use your reasoning and model it like:

y_{ijt} = \mu_{ij} + \delta(y_{ij,t-1} - \mu_{ij}) + \epsilon_{ijt}, where \mu_{ij} = \beta_{0ij} + \beta_{1ij}x_{ijt}

Sorry for all my questions, but I am really stuck with this problem.

No problem! You’re correct, you could model your process like that, and you’d get the nice property that \mathbb{E}[y_{ijt}] = \mu_{ij}

1 Like

Thank you so much for your explanation! I am going to try this approach.

Great! Happy to help. Let me know how it goes.

@rtrangucci Sorry for disturbing you again. But I have been trying to get your approach working in my multilevel context.

My stan code now looks like the following:

data {
int<lower=1> I;
int<lower=1> T;
matrix[T,I] y;
matrix[T,I] y_lag;

// Define variables in data test
// Number of level-1 observations (an integer)
int<lower=0> Ni;
// Number of level-2 clusters
int<lower=0> Nj;
// Number of level-3 clusters
int<lower=0> Nk;
// Numer of models x countries
int<lower = 0> Njk;
// Number of fixed effect parameters
int<lower=0> p;
// Number of random effect parameters
//int<lower=0> q;
int<lower=0> Npars;

// Variables with fixed coefficient
matrix[Ni,p] fixedEffectVars;

// Cluster IDs
int<lower=1> modelid[Ni];
int<lower=1> countryid[Ni];

// Level 3 look up vector for level 2
int<lower=1> countryLookup[Npars];
int<lower=1> countryMonthLookup[Njk];
}
parameters {
real<lower=0,upper=1> delta_raw;
//real<lower=0> sigma_u;
real<lower=0> sigma_e;
//vector[I] u_raw;

// Define parameters to estimate
// Population intercept (a real number)
real beta_0;

// Fixed effects
vector[p] beta;

// Level-2 random effect
real u_0jk[Npars];
real<lower=0> sigma_u0jk;

// Level-3 random effect
real u_0k[Nk];
real<lower=0> sigma_u0k;
}
transformed parameters {
//vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;

// Varying intercepts
real beta_0jk[Npars];
real beta_0k[Nk];

// Individual mean
real mu[Ni];

// Varying intercepts definition
// Level-3 (10 level-3 random intercepts)
for (k in 1:Nk) {
beta_0k[k] = beta_0 + u_0k[k];
}
// Level-2 (100 level-2 random intercepts)
for (j in 1:Npars) {
beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
}

//Individual mean
for (i in 1:Ni) {
mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
}
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);
int count = 1;
//vector[I] u_delta = u * (1 - delta);
for (i in 1:I) {
// target += normal_lpdf(y[1,i] | u[i],
//                       sigma_e / sqrt(one_minus_delta_sq));
// target += normal_lpdf(y[2:T,i] | u_delta[i]
//                       + delta * y_lag[2:T,i], sigma_e);
for(j in 1:T){
if(j == 1){
y[j,i] ~ normal(mu[count], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y[j,i] ~ normal(mu[count] + delta * y_lag[j,i] - delta*mu[count], sigma_e);
}
count += 1;
}
}
//u_raw ~ normal(0, 1);
//sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);

// Random effects distribution
u_0k  ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);
beta[1] ~ normal(-0.25, 1);
}
generated quantities{
matrix[T,I] y_rep;
int count_rep = 1;
real one_minus_delta_sq = (1 - delta) * (1 + delta);
//vector[I] u_delta = u * (1 - delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
}
count_rep += 1;
}
}
}


Here mu[count] denotes the \mu_{ij} from my previous post.

When I estimate this model I get the warning that all my transitions after warm-up are divergent. However, all my parameters seem to have a reasonable value and a Rhat of 1, except from the delta parameter which is 0.00. This is very strange to me and I don’t really know what is going wrong here. Do you have any idea?

This is a pretty complex model, so it’s hard for me to see what the problem is. Does the model fit OK without the lagged term? Have you done fake data checking to make sure you can pull out known parameters in a model that doesn’t include the lagged term? If yes, can you fit fake data that includes the AR1 term?

Yes, without the lagged term the model works perfectly fine and predictive accuracy of the model is already pretty good. So the error only occurs when I add the lagged term in:

if(j == 1){
y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
}


The parameter estimates are still fine after this model and predictive accuracy is basically the same as without the lagged term. The weird thing is that the delta parameter equals 0 and I get all those divergent transitions.

Have you tried non-centering your random effects u_0k and u_0jk?

Edit: noncentering would involve declaring a new parameter, u_0k_raw, e.g. and declaring u_0k as a transformed parameter like so:

parameters {
...
vector[Nk] u_0k = sigma_u0k * u_0k_raw;
...
}
model {
...
u_0k_raw ~ normal(0, 1);
...
}



Unfortunately, this gives me again only divergent transitions after warm-up. Next to that, the delta parameter stays 0

Did you generate fake data from the AR1 model and fit the fake data, or is this fitted to real data? If not, I’d say you should generate fake data from the AR1 model and make sure you can pull out the right parameter values from the fake data before fitting it to the real data.

@rtrangucci sorry for my late reply, but I have been trying to get the model working for a fake simulated dataset.

I simulated data according to y_{it} = (\alpha + u_{i}) + x_{i}\beta +\delta y_{i,t-1} +\epsilon_{it} using this code:

library(dplyr)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
beta = 10
X_init = rnorm(I, 0, 1)#matrix(rnorm(I*T), nrow  = T, ncol = I)
Xbeta_init = as.matrix(X_init)%*%beta
initial_values <- data_frame(
individual = 1:I,
u = u,
Xbeta = as.vector(Xbeta_init),
initial_y = u + as.vector(Xbeta_init) + sigma_e /
sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)

simulated_panel <- initial_values %>%
group_by(individual) %>%
do({
simulate <- function(x) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = x$u) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- x$initial_y
} else {
out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u - x$Xbeta) + x$u + x$Xbeta, sigma_e)
}
}
return(out)
}
as.data.frame(simulate(.))
}) %>%
group_by(individual) %>%
mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}


Afterwards, I estimated my model which looks like this in stan:

data {
int<lower=1> I;
int<lower=1> T;
matrix[T,I] y;
matrix[T,I] y_lag;

int<lower=0> Ni;

// Variables with fixed coefficient
matrix[Ni,p] fixedEffectVars;
}
parameters {
real<lower=0,upper=1> delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;

// Fixed effects
vector[p] beta;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;

//Individual mean
for (i in 1:Ni) {
mu[i] = fixedEffectVars[i,]*beta;// + beta_0jk[countryMonthLookup[i]];
}
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y[j,i] ~ normal(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y[j,i] ~ normal(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
}
}
}

u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);

}
generated quantities{
matrix[T,I] y_rep;
real one_minus_delta_sq = (1 - delta) * (1 + delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y_rep[j,i] = normal_rng(u[i] + mu[i], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(u[i] + mu[i] + delta * y_lag[j,i] - delta*(u[i]+mu[i]), sigma_e);
}
}
}
}


Using

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag, Ni = 100, p = 1, fixedEffectVars = Xbeta_init)
library(rstan)

fit <- stan(file = "dynamic_test.stan",
data = stan_dat, iter = 5000, warmup = 2500, chains = 4)


In this case the delta, sigma_u and sigma_e parameters are estimated correctly. However, the estimated betaparameter equals 1, while it is equal to 10. Also, when changing the beta parameter to a different value in the DGP, the estimated value still equals about 1. Next to that I get the warning message that there were 2500 transitions after warmup that exceeded the maximum treedepth. Do you have any idea why this is the case and how to obtain the correct beta estimates?

Sorry, I’m having trouble following your logic in the Stan program with Ni, but I put together an example with fixed effects.

library(dplyr)
library(rstan)
set.seed(123)
delta <- 0.8
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- data_frame(
u = u,
initial_y = u + sapply(fixed_effects_by_T,function(x) x[1]) + sigma_e /
sqrt(1 - delta^2) * rnorm(I), # p(y_1 | u, sigma_e, delta),
mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = values$u[i], individual = i) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- values$initial_y[i]
} else {
out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)
}
}
out <- out %>% mutate(
lagged_y = lag(y)
)
dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel_w_predictors.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = delta)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)

data {
int<lower=1> I;
int<lower=1> T;
int<lower=1> p;
matrix[T,I] y;
matrix[T,I] y_lag;
matrix[T,p] X[I];
}
parameters {
real<lower=0,upper=1> delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;

// Fixed effects
vector[p] beta;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;
matrix[T, I] mu;
for (i in 1:I)
mu[,i] = u[i] + X[i] * beta;
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y[j,i] ~ normal(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y[j,i] ~ normal(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
}
}
}

u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);
beta ~ normal(0, 5);
}
generated quantities{
matrix[T,I] y_rep;
real one_minus_delta_sq = (1 - delta) * (1 + delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y_rep[j,i] = normal_rng(mu[1,i], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(mu[j,i] * (1 - delta) + delta * y_lag[j,i], sigma_e);
}
}
}
}


The model seems to do OK at pulling out the parameters, though there are divergent transitions if you don’t dial up adapt_delta to 0.9. This example should be extensible to other models with group-level predictors.

1 Like

@rtrangucci Thanks you so much! I am going to make this work in my multilevel context. I will let you know if it works!

sidenote: Ni is just the total number of observations

1 Like

@rtrangucci I have one final question about your model. I want to make the lagged variable parameter \delta also dependent on i. So the model then becomes y_{it} = (\alpha+u_{i}) + x_{i}\beta + \delta_{i} + \varepsilon_{it}. In my multilevel setting I could again estimate \delta_i using non-centered parameterization and use:

parameters {
...
real delta_raw;
real u_delta_raw[Nk];
real<lower=0> sigma_u_delta
...
}
transformed parameters{
real u_delta[N];
for(i in 1:Nk){//number of individuals i
u_delta[i] = sigma_u_delta*u_delta_raw[i];
}

for(k in 1:Nk){//number of individuals i
delta[k] = delta_raw + u_delta[k];
}
}
model {
...
u_delta_raw ~ normal(0, 1);
sigma_u_delta~normal(0,1);
...
}


However, this does not necessarily mean that my \delta_i is in the interval [-1,1]. Do you maybe know how to incorporate this?