# Dynamic panel data models with Stan?

@rtrangucci first of all thanks for all the help!

As final (really) question I would like to ask whether you know a paper or some formal description of the model:

I would like to use it in a paper for university but I cannot find any paper which uses this type of model which I can cite.

This one is not really what you are doing

I’m digging this approach:

@rtrangucci - do you know of any smart ways to deal with irregularly spaced observations?

Imputing missing data gets me most of the way there, but my estimates of \sigma_u and \sigma_e become non-identifiable as the proportion of missing values increases.

Appreciated,
Andrew

Just to make sure, this approach works because now the correlation between \mu_{i} and \delta(y_{i,t-1}-\mu_i) equals 0, right? Since E \left[\mu_i (\delta(y_{i,t-1} -\mu_{i})) \right] = \mu_{i}\delta (E\left[ y_{i,t-1} \right] - \mu_i) = 0, using E\left[ y_{i,t-1} \right] = \mu_i

1 Like

Hi @rtrangucci,

Like others, I’ve found this code really useful. So, thank you!

However, I’m having trouble estimating group-specific \delta_i on some simulated data.

I’ve reduced my model to pretty much exactly what you suggested for the random intercept model with group-specific \delta_i. However, the estimated \delta_i seem to be stuck around .08-.09, even when the true mean is .4.

Interestingly, when I remove the random intercept from the model, the range restriction on \delta_i goes away and the estimation is decent.

So, I’m wondering whether I might have implemented something wrong or whether you might have any insight into this behavior.

@c5v, it looks like you worked quite a bit with these models. Were you able to estimate random \delta_i?

If anyone has any insight into this, I’d love to hear it!

I’ve attached the simulated data (231.7 KB) I’m working with as well as the code (798 Bytes) for running and checking the model.

The stan model I’m using is:

data {
int<lower=1> S; // Number of states
int<lower=1> T; // Number of time points
int<lower=1> P; // Number of predictors
matrix[T,S] y; // Outcome variable
matrix[T,S] y_lag; // Lagged outcome
matrix[T,P] X[S]; // List of [T, P] matrices containing intercept and predictors
}
parameters {

real<lower=0,upper=1> delta_raw; // Raw AR(1) coefficient
real u_delta_raw[S];
real<lower=0> sigma_u_delta;

vector[S] u_raw;
real<lower=0> sigma_u;

real<lower=0> sigma_e; // Residual error

vector[P] beta;
}
transformed parameters {

real delta[S];
real u_delta[S];
vector[S] u;
matrix[T, S] mu;

u = sigma_u * u_raw;

for(s in 1:S){//number of states s
u_delta[s] = sigma_u_delta*u_delta_raw[s];
}

for(s in 1:S){//number of states s
delta[s] = inv_logit(delta_raw + u_delta[s]) * 2 - 1;
}

for (s in 1:S)
//mu[,s] = u[s] + X[s] * beta;
mu[,s] = u[s] + X[s] * beta;

}
model {

for (s in 1:S) {
for(j in 1:T){
if(j == 1){
y[j,s] ~ normal(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));
}
else{
y[j,s] ~ normal(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
}
}
}

sigma_u ~ normal(0,2);
u_raw ~ normal(0,1);
sigma_e ~ normal(0, 1);
beta[1] ~ normal(0, 2);

delta_raw ~ beta(6, 6);
u_delta_raw ~ normal(0, 1);
sigma_u_delta ~ normal(0, 2);

}
generated quantities{
matrix[T,S] y_rep;

for (s in 1:S) {
for(j in 1:T){
if(j == 1){
y_rep[j,s] = normal_rng(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));
}
else{
y_rep[j,s] = normal_rng(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
}
}
}
}


@rtrangucci thanks a lot for your post. I’m having troubles understanding

How do you get that target for t=1 from this equation?

Y_{i,t} = \mu_i + \delta (Y_{i,t-1} - \mu_i) + \epsilon_{i,t}

I understand the target for 2:T but the first one is confusing me.

Hi @ignacio!

I’m not the great @rtrangucci, but let me give it a try.

So, unfortunately the notation changed a bit over the course of this thread—but, basically what’s u[i] in the code is \mu_i in equation, right? So far so good.

In the answers above you can find that \mathbb{E}[Y_{i,t}]=\mu_i, so in the code that is the u[i]—the mean of y[1,i].

For the variance we have

\begin{align} \text{Var}[Y_{i,t}]&=\text{Var}[\delta Y_{i,t-1}]+\text{Var}[\epsilon_{i,t}] \\ \text{Var}[Y_{i,t}]&=\delta^2\text{Var}[Y_{i,t-1}] +\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t-1}]&=\sigma^2_\epsilon. \end{align}

Now, I think we need to assume \text{Var}[Y_{i,t}] = \text{Var}[Y_{i,t-1}], which is reasonable (assume iid residuals / a stationary process). Then,

\begin{align} \text{Var}[Y_{i,t}] - \delta^2\text{Var}[Y_{i,t}]&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}](1 - \delta^2)&=\sigma^2_\epsilon\\ \text{Var}[Y_{i,t}]&=\frac{\sigma^2_\epsilon}{(1 - \delta^2)}\\ \text{Sd}[Y_{i,t}]&=\frac{\sigma_\epsilon}{\sqrt{1 - \delta^2}}, \end{align}

which is the sigma_e / sqrt(one_minus_delta_sq) part of the code.

I learned this at the Helsinki StanCon Tutorial with Jonah. Have a look—he also discusses a GP formulation of this AR(1) process.

Cheers! :)

4 Likes

Thanks a lot @Max_Mantei! Do you know by any chance if there is a video for @jonah’s tutorial?

Videos can be found here, I think. But iirc, Jonah didn’t have enough time to cover this part of the tutorial in class. I worked through it later, when he uploaded the full thing. The markdown file is really easy to follow. :)

3 Likes

I’m trying to generate fake data for the multilevel model described by @c5v :

y_{ijt} = \mu_{ij} + \delta (y_{i,j,t-1} - \mu_{ij}) + \epsilon_{ijt}

To keep things simple:

\mu_{ij} = a_i + b_{j[i]}

How should I generate the fake data? This is what I did but I’m not sure if it is okay:

set.seed(123)
delta <- 0.5

sigma_a <- 0.5
sigma_b <- 0.25
sigma_e <- 1

I <- 100   # Number of individuals
T <- 50   # Number of time periods
J <- 10   # Number of groups

fake_data <- tibble(individual = 1:I,
group = sample(1:J, I, replace=TRUE))

a <- tibble(individual = 1:I,
a = rnorm(I, 0, sigma_a))
b <- tibble(group = 1:J,
b = rnorm(J, 0, sigma_b))

initial_values <- fake_data %>%
inner_join(a, by = "individual") %>%
inner_join(b, by = "group") %>%
mutate(u = a + b,
initial_y = u +
sigma_e / sqrt(1 - delta^2) * rnorm(I))  # Is this right???

simulated_panel <- initial_values %>%
group_by(individual, group) %>%  # Is this right???
do({
simulate <- function(x) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = x$u) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- x$initial_y
} else {
out$y[t] <- rnorm(1, delta * (out$y[t-1] - x$u) + x$u, sigma_e)
}
}
return(out)
}
as.data.frame(simulate(.))
}) %>%
group_by(individual, group) %>%  # Is this right???
mutate(lagged_y = lag(y)) %>% ungroup()

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)

for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}


Thanks for the help!

@rtrangucci could you help me understand how the model you described would change if I added linear trend?. What I have in mind is something like this:

Y_{i,t} = \mu_i + \delta(Y_{i,t-1} - \mu_i - \beta (t-1)) + \beta t + \epsilon_{i,t}

Is this the right way of thinking about adding a linear trend or am I missing something?

1 Like

Hi everyone,

I have a question regarding the data generating process used here. I am not an econometrician and doing this stuff mainly for my PhD thesis where we plan to estimate a hierarchical error correction model in Stan. The code provided by @rtrangucci works nice but could anyone explain me, what is the reason behind creating {y_{t > 1}} as:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t]) + values$u[i] + values$mu[[i]][t], sigma_e)


out$y[t] <- rnorm(1, delta * out$y[t-1]  + values$u[i] + values$mu[[i]][t], sigma_e)


for @rtrangucci 's code?

In the beginning of the thread, the second approach was used. But then it was changed to the first one. By using the first alternative, I was not able to get meaningful estimates of the parameters in a group-level setting (i.e. estimating the model e.g. only for individual 1), although the estimates are fine using the code provided here for the panel setting. This is not surprising since I regress {y_{t}} on {y_{t-1}} and not on e.g., {y_{t-1}- \mu_{i}-\beta x_{t}} for the univariate case. If I change the data generating process to out$y[t] <- rnorm(1, delta * out$y[t-1] + values$u[i] + values$mu[[i]][t], sigma_e), so what @James_Savage originally did, the provided code doesn’t discover the true parameters anymore. But in practice we would assume that {y_{t}} is generated as

{y_{t}= \mu_{i}+\beta x_{t}+\delta y_{t-1}+ \epsilon_{t}}

and not

{y_{t}= \mu_{i}+\beta x_{t}+\delta (y_{t-1}- \mu_{i}-\beta x_{t})+ \epsilon_{t}}

or am I totally wrong?

Thanks!

Hi @ignacio, sorry for the delay! Yes, I think you’ve got the right idea. The key is that you want E[Y_{i,t}] = \mu_i + \beta t which you should have here. This is just a special case of the model shown here, which allows for covariates that vary by i,t

2 Likes

Hi @Dirk_B, the reason for generating the observations like

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t])
+ values$u[i] + values$mu[[i]][t], sigma_e)


is to ensure that the conditional expectation for Y_{i,t} - X_{i,t} \beta | \mu_i is \mu_i for all t for values of \delta \in (0,1). If you generate data from the latter code:

out$y[t] <- rnorm(1, delta * out$y[t-1]
+ values$u[i] + values$mu[[i]][t], sigma_e)


then, as you say, you’re generating observations according to

Y_{i,t} = \mu_i + X_{i,t} \beta + \delta Y_{i,t-1} + \epsilon_{i,t}, so (in all the following E[ \cdot ] is E[ \cdot | \mu_i])

E[Y_{i,t} - X_{i,t} \beta] = E[\mu_i + \delta Y_{i,t-1} - \delta X_{i,t-1} \beta + \delta X_{i,t-1} \beta + \epsilon_{i,t}]

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (E[Y_{i,t-1} - X_{i,t-1} \beta]) + \delta X_{i,t-1} \beta

given that E[Y_{i,t} - X_{i,t} \beta] = E[Y_{i,t-1} - X_{i,t-1} \beta], then

E[Y_{i,t} - X_{i,t} \beta] - \delta (E[Y_{i,t} - X_{i,t} \beta]) = \mu_i + \delta X_{i,t-1} \beta so

E[Y_{i,t}] = X_{i,t} \beta + \tfrac{\mu_i}{1-\delta} + \tfrac{\delta}{1 - \delta} X_{i,t-1}\beta.

If you fit the model:

Y_{i,t} = \mu_i + \beta x_{i,t} + \delta Y_{i,t-1} + \epsilon_{i,t} but your data are generated according to:

Y_{i,t} = \tfrac{\mu_i}{1-\delta} + X_{i,t} \beta + \tfrac{\delta}{1 - \delta} X_{i,t-1}\beta + \epsilon_{i,t}, you’ll get misfit.

Y_{i,t} = \mu_i + X_{i,t} \beta + \delta (Y_{i,t-1} - X_{i,t-1} \beta - \mu_i) + \epsilon_{i,t}.

If E[Y_{i,t} - X_{i,t} \beta] = \mu_i for all t, then

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (E[Y_{i,t-1} - X_{i,t-1} \beta] - \mu_i ) + E[\epsilon_{i,t}].

or

E[Y_{i,t} - X_{i,t} \beta] = \mu_i + \delta (\mu_i - \mu_i ) + 0

and

E[Y_{i,t} - X_{i,t} \beta] = \mu_i

Yes, that’s right, though I might think about it using conditional expectation:

E[\mu_i (Y_{i,t-1} - \mu_i)] = E[\mu_i Y_{i,t-1}] - E[\mu_i^2] = E[E[\mu_i Y_{i,t-1} | \mu_i]] - \sigma_u^2 = E[\mu_i E[Y_{i,t-1} | \mu_i]] - \sigma_u^2
so
E[\mu_i \mu_i] - \sigma_u^2 = 0

Thanks @Max_Mantei! That’s spot on.

@aornugent Probably you’ve moved on from this, but if not…

If everything is Gaussian, you should be able to integrate out the missing observations, but if you’ve got covariates associated with the missing values, you’ll probably need to explicitly model those missing covariates, which might get messy.

@joeHoover if you’ve already fixed this, great! If not, it looks like you’ve got a typo for the likelihood of the initial observation:

 y[j,s] ~ normal(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));


should be

 y[j,s] ~ normal(mu[1,s], sigma_e / sqrt((1 - delta[s]) * (1 + delta[s])));


that might be the fix. I’ll see if I can get something running with the full hierarchy over the AR1 parameter \delta_s and post back here.

1 Like

Hi @joeHoover,

I got the following model to run OK:

data {
int<lower=1> I;
int<lower=1> TT;
int<lower=1> p;
matrix[TT,I] y;
matrix[TT,I] y_lag;
matrix[TT,p] X[I];
}
parameters {
real delta_mean;
real<lower=0> sigma_delta;
vector[I] delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;

// Fixed effects
vector[p] beta;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
vector[I] delta = 2 * inv_logit(delta_mean + sigma_delta * delta_raw) - 1;
matrix[TT, I] mu;
for (i in 1:I)
mu[,i] = X[i] * beta;
}
model {
vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);
vector[I] u_delta = u .* (1 - delta);

for (i in 1:I) {
for(t in 1:TT){
if(t == 1){
y[t,i] ~ normal(u[i] + mu[1,i], sigma_e / sqrt(one_minus_delta_sq[i]));
}
else{
y[t,i] ~ normal(u_delta[i] + mu[t,i] + delta[i] * (y_lag[t,i] - mu[t-1,i]), sigma_e);
}
}
}

u_raw ~ normal(0, 1);
sigma_u ~ normal(0, 2);
sigma_e ~ normal(0, 1);
sigma_delta ~ normal(0, 1);
delta_mean ~ normal(0, 2);
delta_raw ~ normal(0, 1);
beta ~ normal(0, 5);
}
generated quantities{
matrix[TT,I] y_rep;
vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);

for (i in 1:I) {
for(t in 1:TT){
if(t == 1){
y_rep[t,i] = normal_rng(u[i] + mu[1,i], sigma_e / sqrt(one_minus_delta_sq[i]));
}
else{
y_rep[t,i] = normal_rng(u[i] * (1 - delta[i]) + mu[t,i] + delta[i] * (y_lag[t,i] - mu[t-1,i]), sigma_e);
}
}
}
}


when using this data:

library(dplyr)
library(rstan)
set.seed(123)
delta_logit_mean <- qlogis((0.8 + 1)/2)
sigma_delta_logit <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 50
u = rnorm(I, 0, sigma_u)
deltas <- plogis(delta_logit_mean + sigma_delta_logit * rnorm(I)) * 2 - 1
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta)

values <- data_frame(
u = u,
initial_y = u + sapply(fixed_effects_by_T,function(x) x[1]) + sigma_e /
sqrt(1 - deltas^2) * rnorm(I), # p(y_1 | u, sigma_e, delta[i]),
mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = values$u[i], individual = i) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- values$initial_y[i]
} else {
out$y[t] <- rnorm(1, deltas[i] * (out$y[t-1] - values$u[i] - values$mu[[i]][t - 1]) + values$u[i] + values$mu[[i]][t], sigma_e)
}
}
out <- out %>% mutate(
lagged_y = lag(y)
)
dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel-hier-ar.stan')
stan_data <- list(I = I, TT = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = 'beta'), true = beta)
mcmc_recover_intervals(as.matrix(fit, pars = 'u'), true = u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_delta'), true = sigma_delta_logit)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta_mean'), true = delta_logit_mean)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = deltas)


Lemme know if this runs for you (if you already figured this out, feel free to ignore!)

1 Like

Hi @rtrangucci,

thanks for your explanation! As I said, I am not an econometrician or statistician and also pretty new to Bayes, so please excuse me if my questions are very nonsense :) I am already ashamed to ask this, but I really try to understand.

I think I can generally follow your explanations, but I do not quite understand why we want E[Y_{it}-X_{it}\beta] = \mu_i and not e.g. E[Y_{it}-X_{it}\beta-\delta Y_{it-1}] = \mu_i which is how we define our model. What happens to \delta Y_{it-1}? Unfortunately it is also still difficult for me to understand the intuition regarding real observation data, which I would like to use the code for. Do I also assume that all real data was generated this way? What confuses me the most is that I can’t reproduce the simulated values for y_{it}. If I get a real time series data set with only one cross-section and my outcome and predictor variables and want to run a regression with lagged DV, I create the lagged DV and run my model Y_{t}=\alpha+X_{t}\beta+\delta Y_{t-1}+\epsilon_{t}. If we assume the error is very small, just calculating \hat\alpha+X_{t} \hat\beta+\hat\delta Y_{t-1} should give me values pretty close to Y_{t}. In my intuition, this should also work for the panel case with the only difference that we have \mu_i instead of \alpha which shouldn’t differ from the previous situation if we look at e.g. I=1 and ignore all the other cases. I know the true value for \mu_1 and I know the previous observation y_{1, t-1}. But if I do this after generating the data according to the code this does not work for me:
I.e., if I want to calculate Y_{i=1,t=3} given \mu_1=3.88, \delta=0.8 and Y_{i=1,t=2}=3.87 with \sigma_e=0.1 this gives me 6.98 which is not even close to the generated Y_{i=1,t=3} which is 3.83. I don’t know if I am a blockhead right now, but this really confuses me.

And my last question: you say that we generate data from Y_{i,t}=\mu_i+X_{it}\beta+\delta (Y_{it-1}-X_{it-1}\beta-\mu_i)+\epsilon_{it}. Wouldn’t this imply that we have to generate the data as:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t-1])
+ values$u[i] + values$mu[[i]][t], sigma_e)


?

Why do we use

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t])
+ values$u[i] + values$mu[[i]][t], sigma_e)


here?

Thank you so much!

I’m also interested in dynamic hierarchical models. I’d just like to point out that @rtrangucci 's suggestion is equivalent to estimating an AR(1) model in errors, but this is different than a “dynamic” model where the lagged dependent variable is employed as a predictor.

Specifically, an AR(1) model in errors is given by:

E\left[y_{i,t}\right | \mu_{i,t}] = \mu_{i,t} + \epsilon_{i,t},

where \epsilon_{i,t} = \delta \epsilon_{i,t-1} + \eta_{i,t} and \eta_{i,t} is i.i.d. white noise. This can be rewritten as:

E\left[y_{i,t}\right | \mu_{i,t}] = \mu_{i,t} + \delta (y_{i,t-1} - \mu_{i,t-1}) + \eta_{i,t},

where \eta_{i,t} is i.i.d. white noise. This can be estimated using a non-linear least squares approach or via MLE. In contrast, a dynamic model is given by:

E\left[y_{i,t} | \mu_{i,t}\right] = \phi y_{i,t-1} + \mu_{i,t} + \nu_{i,t},

where \nu_{i,t} = u_{i} + \xi_{i,t}, u_{i} is a random or fixed effect, and \xi_{i,t} is i.i.d. white noise. This type of model assumes there is state-dependence of order 1 (i.e., last period’s level of the response variable affects this period’s level of the response).

Due to the presence of unobserved heterogeneity u_{i}, which is correlated with y_{i,t-1} by construction, estimation of a dynamic multilevel model is not straightforward.

I’m new to the Bayesian world, so I could be wrong, but the endogeneity of the problem precludes consistent and unbiased estimates of the coefficients. Frequentist methods generally get around this problem by first-differencing or applying some other transformation to the estimating equation. But any such transformation requires instrumental variables (i.e., higher order lags of the dependent variable) to be used to purge endogeneity between the transformed lagged dependent variable and the transformed \xi_{i,t}.

For a good primer, please see https://fmwww.bc.edu/EC-C/S2013/823/EC823.S2013.nn05.slides.pdf.

Personally I use PyMC3, but I would love to learn about a way to estimate dynamic models in a Bayesian framework. In economics, instrumental variables are often necessary due to endogeneity of some form, but the cure (instrumental variables) can sometime be worse than the disease, particularly if the instrumental variables are weakly correlated with the endogenous variable(s).

You’re right, my solution is a bit different than the question that spurred the thread. Here’s another solution for a dynamic model that I think gets the modeling right:

data {
int<lower=1> I;
int<lower=1> T;
int<lower=1> p;
matrix[T,I] y;
matrix[T,I] y_lag;
matrix[T,p] X[I];
}
parameters {
real delta_mean;
real<lower=0> sigma_delta;
vector[I] delta_raw;
real<lower=0> sigma_u;
real<lower=0> sigma_e;
vector[I] u_raw;

// Fixed effects
vector[p] beta;
real alpha;
}
transformed parameters {
vector[I] u = sigma_u * u_raw;
vector[I] delta = 2 * inv_logit(delta_mean + sigma_delta * delta_raw) - 1;
}
model {
vector[I] one_minus_delta_sq = (1 - delta) .* (1 + delta);

for (i in 1:I) {
vector[T] mu = X[i] * beta + alpha + u[i];
for(j in 1:T){
if(j == 1){
y[j,i] ~ normal(mu[1] / (1 - delta[i]), sigma_e / sqrt(one_minus_delta_sq[i]));
}
else{
y[j,i] ~ normal(mu[j] + delta[i] * y_lag[j,i], sigma_e);
}
}
}

u_raw ~ std_normal();
sigma_u ~ normal(0, 2);
sigma_e ~ std_normal();
sigma_delta ~ std_normal();
delta_mean ~ normal(0, 2);
delta_raw ~ std_normal();
beta ~ normal(0, 5);
alpha ~ std_normal();
}


Simulation code:

library(dplyr)
library(rstan)
set.seed(123)
delta_logit_mean <- qlogis((0.8 + 1)/2)
sigma_delta_logit <- 0.5
sigma_u <- 0.5
sigma_e <- 1
I <- 100
T <- 50
u = rnorm(I, 0, sigma_u)
alpha = rnorm(1)
deltas <- plogis(delta_logit_mean + sigma_delta_logit * rnorm(I)) * 2 - 1
p <- 2
beta = rnorm(p)
X <- lapply(1:I, function(x) matrix(rnorm(p * T), nrow = T, ncol = p))
fixed_effects_by_T <- lapply(X, function(x) x %*% beta + alpha)

values <- data_frame(
u = u,
initial_y = (u + sapply(fixed_effects_by_T,function(x) x[1]))/(1 - deltas) + sigma_e /
sqrt(1 - deltas^2) * rnorm(I), # N(y_1 | u / (1 - delta[i]), sigma_e / (1 - delta[i]^2)),
mu = fixed_effects_by_T
)

dfs <- list()
for (i in 1:I) {
out <- data.frame(y = rep(NA, T),
time = rep(NA, T),
u = values$u[i], individual = i) for(t in 1:T) { out$time[t] <- t
if(t == 1) {
out$y[t] <- values$initial_y[i]
} else {
out$y[t] <- rnorm(1, deltas[i] * out$y[t-1] + values$u[i] + values$mu[[i]][t], sigma_e)
}
}
out <- out %>% mutate(
lagged_y = lag(y)
)
dfs[[i]] <- out
}
simulated_panel <- do.call(rbind,dfs)

y <- matrix(0, T, I)
y_lag <- matrix(0, T, I)
for (i in 1:I) {
y[,i] <- filter(simulated_panel, individual == i)$y y_lag[2:T,i] <- filter(simulated_panel, individual == i)$lagged_y[2:T]
}

mod <- stan_model('dynamic-panel-hier-ar-alt.stan')
stan_data <- list(I = I, T = T, p = p, y = y, y_lag = y_lag, X = X)

fit <- sampling(mod, data = stan_data, iter = 2000, cores = 4, chains = 4, control = list(adapt_delta = 0.9))
library(bayesplot)
mcmc_recover_intervals(as.matrix(fit, pars = c('beta','alpha')), true = c(beta,alpha))
mcmc_recover_intervals(as.matrix(fit, pars = 'u'), true = u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_u'), true = sigma_u)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_e'), true = sigma_e)
mcmc_recover_intervals(as.matrix(fit, pars = 'sigma_delta'), true = sigma_delta_logit)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta_mean'), true = delta_logit_mean)
mcmc_recover_intervals(as.matrix(fit, pars = 'delta'), true = deltas)

cover <- function(x, l, u) {
if (x >= l && x <= u)
return(TRUE)
else
return(FALSE)
}

delta_ints <- bayesplot::mcmc_intervals_data(as.matrix(fit, pars = 'delta'))
mean(sapply(1:100,function(i) cover(deltas[i], delta_ints[i,'ll'], delta_ints[i,'hh'])))
mean(sapply(1:100,function(i) cover(deltas[i], delta_ints[i,'l'], delta_ints[i,'h'])))

u_ints <- bayesplot::mcmc_intervals_data(as.matrix(fit, pars = 'u'))
mean(sapply(1:100,function(i) cover(u[i], u_ints[i,'ll'], u_ints[i,'hh'])))
mean(sapply(1:100,function(i) cover(u[i], u_ints[i,'l'], u_ints[i,'h'])))

1 Like

Hi @Dirk_B,

you’re right that there’s a typo in the data generating code! Thanks for pointing that out. It should be:

out$y[t] <- rnorm(1, delta * (out$y[t-1] - values$u[i] - values$mu[[i]][t-1]) + values$u[i] + values$mu[[i]][t], sigma_e)

After going back to check my code, I realized I had another error in my model code! I edited the original post (Dynamic panel data models with Stan?) so that the code is correct. I think it’s too late to edit the original post where I made the first mistake (Dynamic panel data models with Stan?)

As for your first question, I took another crack at putting together a model with a DGP of Y_t = \alpha_i + X_t \beta + \delta_i Y_{t-1} + \epsilon_{i,t}, see the solution Dynamic panel data models with Stan?

1 Like

Thanks @rtrangucci. I’m not confident that this new solution solves the problem, but perhaps I’m missing something. From what I can tell, you’re treating the unobserved fixed/random effect, u_i, as a parameter to be estimated, namely as a group-specific constant. More formally, assuming the model is

y_{it} = \beta_{0,i} + \phi_{i} y_{i, t-1} + \beta_{1,i} x_{it} + u_{i} + \epsilon_{it},

you’re in effect treating the u_{i} as \beta_{0,i}. That is, you’re really estimating

y_{it} = \beta_{0,i} + \phi_{i} y_{i, t-1} + \beta_{1,i} x_{it} + \epsilon_{it},

which assumes no unobserved heterogeneity through u_i. The distinction is that u_{i} is an unobserved set of group-specific traits (e.g., u_i could be brand value, operational efficiency or know-how, and y_{it} is firm i's profitability) correlated with y_{i, t-1}, whereas \beta_{0,i} is a parameter.

Thus, if we wanted to estimate the equation as-is (i.e., without transformation to eliminate the u_{i} and without instrumental variables), I think there needs to be some restriction or assumption that reflects the correlation between u_{i} and y_{i,t-1}.

I’m treating u_i as a random intercept at the group level i, or unobserved heterogeneity at the group level, so wouldn’t that imply a correlation structure between y_{i,t-1} and u_i?

So the full model would be:

\begin{align} y_{i,t} | u_i, y_{i,t-1}, X_i, \sigma, \beta, \phi_i & \sim N(u_i + \phi_i y_{i,t-1} + X_{it} \beta, \sigma^2) \\ u_i | \tau & \sim N(0, \tau^2) \\ \tau & \sim \text{N}^+(0, 2) \\ \sigma & \sim \text{N}^+(0, 1) \end{align}

Then \text{Cov}(y_{i,t-1},u_i) would be:

\begin{align} \text{Cov}(y_{i,t-1}, u_i) & = \text{Cov}(u_i + \phi_i y_{i,t-2} + X_{it} \beta + \epsilon_{i,t},u_i) \\ & = \text{Cov}(u_i,u_i) + \text{Cov}(\phi_i y_{i,t-2},u_i) + \text{Cov}(X_{it} \beta,u_i) + \text{Cov}(\epsilon_{i,t},u_i) \\ & = \tau^2 + \phi_i \text{Cov}(y_{i,t-2},u_i) + 0 + 0 \\ & = \tau^2 + \phi_i \text{Cov}(y_{i,t-1},u_i) \,\,\, \text{By stationarity} \\ \end{align}

giving \text{Cov}(y_{i,t-1},u_i) = \frac{\tau^2}{1 - \phi_i}

But I also might be totally misunderstanding the model, in which case, I apologize!

I agree, I’m certainly imposing assumptions in order to fit this model, though I’m not eliminating the covariance between u_i and y_{i,t-1}. There’s a distributional assumption over the random intercepts, and I’m imposing stationarity as well through the assumption that \phi_i \in (-1, 1) for all i.