# Dynamic panel data models with Stan?

No just a single individual, but without the \alpha term. I figured I’d start simple. The posterior pairs plots don’t actually look bad for that model or show any non-identifiabilities.

Just bumping this in case the right person missed it. Seems too common a problem to not have been discussed on the forum before.

You should look at the R package ctsem, got continuous time SEM. This allows you to do dynamic systems with either frequentist or Bayesian, Bayesian is done with Stan.

The simplest version of the dinámica system is the panel model.

1 Like

I never took the time to quite wrap my head around the Arellano Bond thing, but the poor performance of the rstanarm spec was eye opening. As Mauricio said, ctsem provides a front end for specifying hierarchical time series / state space models in Stan. I took a quick look and it seems to behave as you want, though performance is much slower than it could be for a perfectly observed first order model like you have (since there is no need to integrate out the latent states).

install.packages("devtools")
library(devtools)
install_github("cdriveraus/ctsem")

library(rstanarm); library(tidyverse); library(ctsem)

# Simulate some fake data
delta <- 0.5
sigma_u <- 2
sigma_e <- 1
I <- 100
T <- 10

initial_values <- data_frame(individual = 1:I,
initial_y = rnorm(I),
u = rnorm(I, 0, sigma_u))

simulated_panel <- initial_values %>%
group_by(individual) %>%
do({
simulate <- function(x) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = x$u) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- rnorm(1, delta * x$initial_y + x$u, sigma_e) } else { out$y[t] <- rnorm(1, delta * out$y[t-1] + x$u, sigma_e)
}
}
return(out)
}
as.data.frame(simulate(.))
}) %>%
group_by(individual) %>%
mutate(lagged_y = lag(y)) %>%
filter(!is.na(lagged_y))

cor(simulated_panel$lagged_y, simulated_panel$u)

#ctsem model
ctm <- ctModel(n.manifest = 1,n.latent = 1,LAMBDA = diag(1), type = 'standt',
MANIFESTMEANS=matrix(0,1,1),CINT=matrix('u',1,1),
MANIFESTVAR=diag(1e-6,1),
T0VAR=matrix('stationary',1,1),
manifestNames = 'y')

#set all parameters except the initial states and intercept fixed across subjects
ctm$pars$indvarying[!ctm$pars$matrix %in% c('T0MEANS','CINT')] <- FALSE

ctm$subjectIDname <- 'individual' #fit using optimization ctf <- ctStanFit(as.data.frame(simulated_panel),ctstanmodel = ctm, chains=6,optimize=TRUE,verbose=0,iter=300,deoptim = F,isloops = 0,issamples = 100) summary(ctf) #fit with sampling ctf <- ctStanFit(as.data.frame(simulated_panel),ctstanmodel = ctm,chains=6,iter=300) cat(ctf$stanmodeltext) #to view the stan model -- lots of unnecessary stuff for this problem though!

Ask @syclik—somehow he or someone else turned on Stan highlighting. So I imagine if there’s a Stan plugin, there’s an R plugin!

Here’s my solution, with some code.

set.seed(123)
delta <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 30
u = rnorm(I, 0, sigma_u)
initial_values <- data_frame(
individual = 1:I,
u = u,
initial_y = u + sigma_e /
sqrt(1 - delta^2) * rnorm(I) # p(y_1 | u, sigma_e, delta)
)

simulated_panel <- initial_values %>%
group_by(individual) %>%
do({
simulate <- function(x) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = x$u) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- x$initial_y
} else {
out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
}
}
return(out)
}
as.data.frame(simulate(.))
}) %>%
group_by(individual) %>%
mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

stan_dat <- list(I = I, T = T, y = y, y_lag = y_lag)
library(rstan)
mod <- stan_model('dynamic-panel.stan')
fit <- sampling(mod, data = stan_dat, cores = 4, chains = 4, iter = 2000)


Here’s dynamic-panel.stan:

data {
int<lower=1> I;
int<lower=1> T;
matrix[T,I] y;
matrix[T,I] y_lag;
}
parameters {
real<lower=0,upper=1> delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);
vector[I] u_delta = u * (1 - delta);
for (i in 1:I) {
target += normal_lpdf(y[1,i] | u[i],
sigma_e / sqrt(one_minus_delta_sq));
target += normal_lpdf(y[2:T,i] | u_delta[i]
+ delta * y_lag[2:T,i], sigma_e);
}
u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);
}


Summary of the model run over the simulated data from above:

> print(fit , pars = c('delta','sigma_u','sigma_e'))
Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.50    0.00 0.04 0.42 0.47 0.50 0.52  0.57   831 1.01
sigma_u 0.36    0.01 0.13 0.05 0.29 0.37 0.45  0.58   430 1.01
sigma_e 1.01    0.00 0.02 0.96 0.99 1.01 1.03  1.06  4000 1.00

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:03:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


When I boost the number of observations per individual to 50 the inference gets sharper for sigma_u and approaches the true value.

Inference for Stan model: dynamic-panel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
delta   0.48       0 0.01 0.46 0.47 0.48 0.49  0.51  4000    1
sigma_u 0.46       0 0.04 0.38 0.43 0.46 0.49  0.55  1400    1
sigma_e 0.99       0 0.01 0.98 0.99 0.99 1.00  1.01  4000    1

Samples were drawn using NUTS(diag_e) at Fri Sep  7 08:17:52 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

1 Like

Could you refer the paper or model that you based on to write your code? I would like to use your model but I can’t find the dynamic panel data model that you are using in your model.
EDIT:
Just in case if someone is interested in this approach who has no econometric background, I found the following paper useful to understand rtrangucci’s code.

Could you maybe explain why you multiplicate by 1-delta in vector[I] u_delta = u * (1 - delta);?
Why don’t you just use u[i] instead of u_delta[i] in target += normal_lpdf(y[2:T,i] | u_delta[i] + delta * y_lag[2:T,i], sigma_e);?

Hi @c5v, sorry for the delay. The use of u_delta[i] instead of u[i] is just an algebraic trick to cut down on computation. We could instead write the likelihood like:

target += normal_lpdf(y[2:T,i] | u[i] + delta * (y_lag[2:T,i] - u[i]), sigma_e)

target += normal_lpdf(y[2:T,i] | u[i] + delta * y_lag[2:T,i] - delta * u[i], sigma_e)

simplifying to

target += normal_lpdf(y[2:T,i] | u[i] * (1 - delta) + delta * y_lag[2:T,i], sigma_e)

2 Likes

The only thing I don’t really get is why you subtract u[i] in delta*(y_lag[2:T,i] - u[i]) since we are not modelling first differences here, right? Could you maybe explain this to me, this will help me a lot!

It’s because I want the mean of each observation in group i to be u[i]. The model is

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}\\ \epsilon_{i,t} \sim \text{Normal}(0,1)

Taking expectations of each side:

\mathbb{E}[Y_{i,t}] = \mu_i + \delta \mathbb{E}[(Y_{i,t-1} - \mu_i)] \\ \mathbb{E}[Y_{i,t}] = \mu_i + \delta (\mathbb{E}[Y_{i,t-1}] - \mu_i) \\ \mathbb{E}[Y_{i,t}] = \mu_i
1 Like

@rtrangucci clear explanation thanks!

If I am right, in a multilevel setting (which I am building right now), which then looks like y_{ijt} = \beta_{0ij} + \beta_{1ij}x_{ijt} + \delta y_{ij,t-1} + \epsilon_{ijt}, I could also use your reasoning and model it like:

y_{ijt} = \mu_{ij} + \delta(y_{ij,t-1} - \mu_{ij}) + \epsilon_{ijt}, where \mu_{ij} = \beta_{0ij} + \beta_{1ij}x_{ijt}

Sorry for all my questions, but I am really stuck with this problem.

No problem! You’re correct, you could model your process like that, and you’d get the nice property that \mathbb{E}[y_{ijt}] = \mu_{ij}

1 Like

Thank you so much for your explanation! I am going to try this approach.

Great! Happy to help. Let me know how it goes.

@rtrangucci Sorry for disturbing you again. But I have been trying to get your approach working in my multilevel context.

My stan code now looks like the following:

data {
int<lower=1> I;
int<lower=1> T;
matrix[T,I] y;
matrix[T,I] y_lag;

// Define variables in data test
// Number of level-1 observations (an integer)
int<lower=0> Ni;
// Number of level-2 clusters
int<lower=0> Nj;
// Number of level-3 clusters
int<lower=0> Nk;
// Numer of models x countries
int<lower = 0> Njk;
// Number of fixed effect parameters
int<lower=0> p;
// Number of random effect parameters
//int<lower=0> q;
int<lower=0> Npars;

// Variables with fixed coefficient
matrix[Ni,p] fixedEffectVars;

// Cluster IDs
int<lower=1> modelid[Ni];
int<lower=1> countryid[Ni];

// Level 3 look up vector for level 2
int<lower=1> countryLookup[Npars];
int<lower=1> countryMonthLookup[Njk];
}
parameters {
real<lower=0,upper=1> delta_raw;
//real<lower=0> sigma_u;
real<lower=0> sigma_e;
//vector[I] u_raw;

// Define parameters to estimate
// Population intercept (a real number)
real beta_0;

// Fixed effects
vector[p] beta;

// Level-2 random effect
real u_0jk[Npars];
real<lower=0> sigma_u0jk;

// Level-3 random effect
real u_0k[Nk];
real<lower=0> sigma_u0k;
}
transformed parameters {
//vector[I] u = sigma_u * u_raw;
real delta = delta_raw * 2 - 1;

// Varying intercepts
real beta_0jk[Npars];
real beta_0k[Nk];

// Individual mean
real mu[Ni];

// Varying intercepts definition
// Level-3 (10 level-3 random intercepts)
for (k in 1:Nk) {
beta_0k[k] = beta_0 + u_0k[k];
}
// Level-2 (100 level-2 random intercepts)
for (j in 1:Npars) {
beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
}

//Individual mean
for (i in 1:Ni) {
mu[i] = beta_0jk[countryMonthLookup[i]] + fixedEffectVars[i,]*beta;
}
}
model {
real one_minus_delta_sq = (1 - delta) * (1 + delta);
int count = 1;
//vector[I] u_delta = u * (1 - delta);
for (i in 1:I) {
// target += normal_lpdf(y[1,i] | u[i],
//                       sigma_e / sqrt(one_minus_delta_sq));
// target += normal_lpdf(y[2:T,i] | u_delta[i]
//                       + delta * y_lag[2:T,i], sigma_e);
for(j in 1:T){
if(j == 1){
y[j,i] ~ normal(mu[count], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y[j,i] ~ normal(mu[count] + delta * y_lag[j,i] - delta*mu[count], sigma_e);
}
count += 1;
}
}
//u_raw ~ normal(0, 1);
//sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
delta_raw ~ beta(4, 4);

// Random effects distribution
u_0k  ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);
beta[1] ~ normal(-0.25, 1);
}
generated quantities{
matrix[T,I] y_rep;
int count_rep = 1;
real one_minus_delta_sq = (1 - delta) * (1 + delta);
//vector[I] u_delta = u * (1 - delta);

for (i in 1:I) {
for(j in 1:T){
if(j == 1){
y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
}
count_rep += 1;
}
}
}


Here mu[count] denotes the \mu_{ij} from my previous post.

When I estimate this model I get the warning that all my transitions after warm-up are divergent. However, all my parameters seem to have a reasonable value and a Rhat of 1, except from the delta parameter which is 0.00. This is very strange to me and I don’t really know what is going wrong here. Do you have any idea?

This is a pretty complex model, so it’s hard for me to see what the problem is. Does the model fit OK without the lagged term? Have you done fake data checking to make sure you can pull out known parameters in a model that doesn’t include the lagged term? If yes, can you fit fake data that includes the AR1 term?

Yes, without the lagged term the model works perfectly fine and predictive accuracy of the model is already pretty good. So the error only occurs when I add the lagged term in:

if(j == 1){
y_rep[j,i] = normal_rng(mu[count_rep], sigma_e / sqrt(one_minus_delta_sq));
}
else{
y_rep[j,i] = normal_rng(mu[count_rep] + delta * y_lag[j,i] - delta*mu[count_rep], sigma_e);
}


The parameter estimates are still fine after this model and predictive accuracy is basically the same as without the lagged term. The weird thing is that the delta parameter equals 0 and I get all those divergent transitions.

Have you tried non-centering your random effects u_0k and u_0jk?

Edit: noncentering would involve declaring a new parameter, u_0k_raw, e.g. and declaring u_0k as a transformed parameter like so:

parameters {
...
vector[Nk] u_0k = sigma_u0k * u_0k_raw;
...
}
model {
...
u_0k_raw ~ normal(0, 1);
...
}



Unfortunately, this gives me again only divergent transitions after warm-up. Next to that, the delta parameter stays 0