@rtrangucci - do you know of any smart ways to deal with irregularly spaced observations?
Imputing missing data gets me most of the way there, but my estimates of \sigma_u and \sigma_e become non-identifiable as the proportion of missing values increases.
Just to make sure, this approach works because now the correlation between \mu_{i} and \delta(y_{i,t-1}-\mu_i) equals 0, right? Since E \left[\mu_i (\delta(y_{i,t-1} -\mu_{i})) \right] = \mu_{i}\delta (E\left[ y_{i,t-1} \right] - \mu_i) = 0, using E\left[ y_{i,t-1} \right] = \mu_i
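One way to spell the argument out, as a sketch: it assumes stationarity, so that E[y_{i,t-1} | \mu_i] = \mu_i. Conditioning on \mu_i is an addition here; the post writes the expectation unconditionally.

```latex
\begin{aligned}
\operatorname{Cov}\bigl(\mu_i,\ \delta(y_{i,t-1}-\mu_i)\bigr)
 &= \delta\,\mathbb{E}\bigl[\mu_i\,(y_{i,t-1}-\mu_i)\bigr]
    - \delta\,\mathbb{E}[\mu_i]\,\mathbb{E}\bigl[y_{i,t-1}-\mu_i\bigr] \\
 &= \delta\,\mathbb{E}\bigl[\mu_i\,\mathbb{E}[\,y_{i,t-1}-\mu_i \mid \mu_i\,]\bigr]
    - \delta\,\mathbb{E}[\mu_i]\,\mathbb{E}\bigl[\mathbb{E}[\,y_{i,t-1}-\mu_i \mid \mu_i\,]\bigr] \\
 &= \delta\,\mathbb{E}[\mu_i \cdot 0] - \delta\,\mathbb{E}[\mu_i]\cdot 0 \;=\; 0 .
\end{aligned}
```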
Like others, I’ve found this code really useful. So, thank you!
However, I’m having trouble estimating group-specific \delta_i on some simulated data.
I’ve reduced my model to pretty much exactly what you suggested for the random intercept model with group-specific \delta_i. However, the estimated \delta_i seem to be stuck around .08-.09, even when the true mean is .4.
Interestingly, when I remove the random intercept from the model, the range restriction on \delta_i goes away and the estimation is decent.
So, I’m wondering whether I might have implemented something wrong or whether you might have any insight into this behavior.
@c5v, it looks like you worked quite a bit with these models. Were you able to estimate random \delta_i?
If anyone has any insight into this, I’d love to hear it!
I’ve attached the simulated data (231.7 KB) I’m working with as well as the code (798 Bytes) for running and checking the model.
The Stan model I’m using is:

```stan
data {
  int<lower=1> S;       // Number of states
  int<lower=1> T;       // Number of time points
  int<lower=1> P;       // Number of predictors
  matrix[T,S] y;        // Outcome variable
  matrix[T,S] y_lag;    // Lagged outcome
  matrix[T,P] X[S];     // List of [T, P] matrices containing intercept and predictors
}
parameters {
  real<lower=0,upper=1> delta_raw;  // Raw AR(1) coefficient
  real u_delta_raw[S];
  real<lower=0> sigma_u_delta;
  vector[S] u_raw;
  real<lower=0> sigma_u;
  real<lower=0> sigma_e;            // Residual error
  vector[P] beta;
}
transformed parameters {
  real delta[S];
  real u_delta[S];
  vector[S] u;
  matrix[T, S] mu;
  u = sigma_u * u_raw;
  for (s in 1:S) {  // number of states s
    u_delta[s] = sigma_u_delta * u_delta_raw[s];
  }
  for (s in 1:S) {  // number of states s
    delta[s] = inv_logit(delta_raw + u_delta[s]) * 2 - 1;
  }
  for (s in 1:S)
    mu[,s] = u[s] + X[s] * beta;
}
model {
  for (s in 1:S) {
    for (j in 1:T) {
      if (j == 1) {
        y[j,s] ~ normal(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));
      }
      else {
        y[j,s] ~ normal(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
  sigma_u ~ normal(0, 2);
  u_raw ~ normal(0, 1);
  sigma_e ~ normal(0, 1);
  beta[1] ~ normal(0, 2);
  delta_raw ~ beta(6, 6);
  u_delta_raw ~ normal(0, 1);
  sigma_u_delta ~ normal(0, 2);
}
generated quantities {
  matrix[T,S] y_rep;
  for (s in 1:S) {
    for (j in 1:T) {
      if (j == 1) {
        y_rep[j,s] = normal_rng(mu[1,s], sigma_e / sqrt(1 - delta[s]) * (1 + delta[s]));
      }
      else {
        y_rep[j,s] = normal_rng(mu[j,s] * (1 - delta[s]) + delta[s] * y_lag[j,s], sigma_e);
      }
    }
  }
}
```
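To see what the delta transform in the model above does, here is a small Python sketch. Python is used only for illustration, and `to_delta` is a hypothetical helper name, not part of the Stan code. It mirrors `inv_logit(x) * 2 - 1`, which maps any real number into (-1, 1), and evaluates the endpoints reachable by the population-level term when `delta_raw` is constrained to (0, 1) as declared.

```python
import math

def inv_logit(x: float) -> float:
    """Logistic function, the same map as Stan's inv_logit."""
    return 1.0 / (1.0 + math.exp(-x))

def to_delta(x: float) -> float:
    """Mirror of `inv_logit(x) * 2 - 1` from the model: real line -> (-1, 1)."""
    return 2.0 * inv_logit(x) - 1.0

# With the group deviation u_delta[s] set to 0, delta_raw in (0, 1)
# gives a delta between these two endpoints.
delta_at_lower = to_delta(0.0)  # limit as delta_raw -> 0
delta_at_upper = to_delta(1.0)  # limit as delta_raw -> 1
print(delta_at_lower, delta_at_upper)
```

The transform is algebraically equal to tanh(x / 2), so with `delta_raw` in (0, 1) and no group deviation the implied delta lies between 0 and tanh(0.5), roughly 0.46. Whether that constraint has anything to do with the estimates sticking near .08-.09 is not something this sketch establishes.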
I’m not the great @rtrangucci, but let me give it a try.
So, unfortunately the notation changed a bit over the course of this thread, but basically what’s u[i] in the code is \mu_i in the equations, right? So far so good.
In the answers above you can find that \mathbb{E}[Y_{i,t}]=\mu_i, so in the code that is u[i], the mean of y[1,i].
Videos can be found here, I think. But iirc, Jonah didn’t have enough time to cover this part of the tutorial in class. I worked through it later, when he uploaded the full thing. The markdown file is really easy to follow. :)
@rtrangucci could you help me understand how the model you described would change if I added a linear trend? What I have in mind is something like this:
I have a question regarding the data generating process used here. I am not an econometrician and am doing this mainly for my PhD thesis, where we plan to estimate a hierarchical error correction model in Stan. The code provided by @rtrangucci works nicely, but could anyone explain to me the reason for creating {y_{t > 1}} as:
At the beginning of the thread, the second approach was used, but it was later changed to the first one. Using the first alternative, I was not able to get meaningful estimates of the parameters in a group-level setting (i.e., estimating the model only for individual 1, for example), although the estimates are fine when I use the code provided here in the panel setting. This is not surprising, since I regress {y_{t}} on {y_{t-1}} and not on, e.g., {y_{t-1}- \mu_{i}-\beta x_{t}} in the univariate case. If I change the data generating process to out$y[t] <- rnorm(1, delta * out$y[t-1] + values$u[i] + values$mu[[i]][t], sigma_e), which is what @James_Savage originally did, the provided code no longer recovers the true parameters. But in practice we would assume that {y_{t}} is generated as
Hi @ignacio, sorry for the delay! Yes, I think you’ve got the right idea. The key is that you want E[Y_{i,t}] = \mu_i + \beta t which you should have here. This is just a special case of the model shown here, which allows for covariates that vary by i,t
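As a rough illustration of the trend case, the Python sketch below simulates one series with E[Y_t] = \mu + \beta t by applying the AR(1) term to the deviation from the trend line. It follows the form of the data generating process quoted elsewhere in this thread with X_t = t. The function name, argument names, and numeric values are all hypothetical, chosen only for the example.

```python
import random

def simulate_trend_ar1(mu, beta, delta, sigma_e, n_steps, seed=0):
    """Simulate y_t = mu + beta*t + delta*(y_{t-1} - mu - beta*(t-1)) + e_t
    for t = 1..n_steps.

    The first observation is drawn around mu + beta*1 with the stationary
    standard deviation sigma_e / sqrt(1 - delta^2).
    """
    rng = random.Random(seed)
    y = []
    for t in range(1, n_steps + 1):
        mean_t = mu + beta * t
        if t == 1:
            sd = sigma_e / (1.0 - delta**2) ** 0.5
            y.append(rng.gauss(mean_t, sd))
        else:
            mean_prev = mu + beta * (t - 1)
            # AR(1) acts on last period's deviation from the trend line.
            y.append(rng.gauss(mean_t + delta * (y[-1] - mean_prev), sigma_e))
    return y

# With no noise the series sits on the trend line mu + beta * t.
noiseless = simulate_trend_ar1(mu=3.0, beta=0.5, delta=0.8, sigma_e=0.0, n_steps=5)
print(noiseless)
```

With sigma_e > 0 the draws scatter around the same line, and deviations decay at rate delta instead of accumulating into the level.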
is to ensure that the conditional expectation for Y_{i,t} - X_{i,t} \beta | \mu_i is \mu_i for all t for values of \delta \in (0,1). If you generate data from the latter code:
@aornugent Probably you’ve moved on from this, but if not…
If everything is Gaussian, you should be able to integrate out the missing observations, but if you’ve got covariates associated with the missing values, you’ll probably need to explicitly model those missing covariates, which might get messy.
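A minimal sketch of what integrating out the gaps looks like in the simplest case, assuming a Gaussian AR(1) in the mean-reverting form y_t = \mu + \delta (y_{t-1} - \mu) + e_t with |\delta| < 1, a common \sigma_e, and no covariates. Python is used for illustration only; the function name and the example values are hypothetical.

```python
import math

def ar1_gap_conditional(y_prev, mu, delta, sigma_e, gap):
    """Mean and sd of y_{t+gap} given y_t = y_prev for a Gaussian AR(1)
    y_t = mu + delta * (y_{t-1} - mu) + e_t, with the gap - 1 unobserved
    intermediate values integrated out.
    """
    mean = mu + delta**gap * (y_prev - mu)
    var = sigma_e**2 * (1.0 - delta**(2 * gap)) / (1.0 - delta**2)
    return mean, math.sqrt(var)

# Adjacent observations (gap = 1) recover the usual one-step distribution.
mean_1, sd_1 = ar1_gap_conditional(1.5, mu=1.0, delta=0.8, sigma_e=0.1, gap=1)
# Two missing values between observations (gap = 3).
mean_3, sd_3 = ar1_gap_conditional(1.5, mu=1.0, delta=0.8, sigma_e=0.1, gap=3)
print(mean_1, sd_1, mean_3, sd_3)
```

Under these assumptions, each observed value can be given a normal likelihood conditional on the previous observed value, using the gap-specific mean and sd, so no missing values need to be imputed as parameters.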
@joeHoover if you’ve already fixed this, great! If not, it looks like you’ve got a typo for the likelihood of the initial observation:
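For reference, the stationary standard deviation of an AR(1) process with innovation sd \sigma_e is \sigma_e / \sqrt{1 - \delta^2}. The Python sketch below compares it numerically with the expression as written in the posted model, `sigma_e / sqrt(1 - delta[s]) * (1 + delta[s])`, which by left-to-right evaluation of `/` and `*` computes (sigma_e / sqrt(1 - delta)) * (1 + delta). That the stationary form is the intended one is an assumption made here; the function names are hypothetical.

```python
import math

def stationary_sd(sigma_e, delta):
    """Stationary sd of an AR(1) with innovation sd sigma_e and |delta| < 1."""
    return sigma_e / math.sqrt((1.0 - delta) * (1.0 + delta))

def sd_as_posted(sigma_e, delta):
    """The expression as written in the posted model, evaluated left to right:
    (sigma_e / sqrt(1 - delta)) * (1 + delta)."""
    return sigma_e / math.sqrt(1.0 - delta) * (1.0 + delta)

for delta in (0.0, 0.4, 0.8):
    print(delta, stationary_sd(1.0, delta), sd_as_posted(1.0, delta))
```

The two agree only at delta = 0. For positive delta the posted expression is larger by a factor of (1 + delta)^{3/2}, so the first observation would be treated as noisier than stationarity implies.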
Thanks for your explanation! As I said, I am not an econometrician or statistician and am also pretty new to Bayes, so please excuse me if my questions are nonsense :) I am almost ashamed to ask this, but I am really trying to understand.
I think I can generally follow your explanations, but I do not quite understand why we want E[Y_{it}-X_{it}\beta] = \mu_i and not, e.g., E[Y_{it}-X_{it}\beta-\delta Y_{it-1}] = \mu_i, which is how we define our model. What happens to \delta Y_{it-1}?

Unfortunately, it is also still difficult for me to understand the intuition for real observational data, which is what I would like to use the code for. Do I also assume that all real data were generated this way?

What confuses me the most is that I can’t reproduce the simulated values for y_{it}. If I get a real time series data set with only one cross-section, containing my outcome and predictor variables, and want to run a regression with a lagged DV, I create the lagged DV and run my model Y_{t}=\alpha+X_{t}\beta+\delta Y_{t-1}+\epsilon_{t}. If we assume the error is very small, just calculating \hat\alpha+X_{t} \hat\beta+\hat\delta Y_{t-1} should give me values pretty close to Y_{t}. My intuition is that this should also work for the panel case, the only difference being that we have \mu_i instead of \alpha, which shouldn’t differ from the previous situation if we look at, e.g., I=1 and ignore all the other cases. I know the true value of \mu_1 and I know the previous observation y_{1, t-1}. But if I do this after generating the data according to the code, it does not work for me:
For example, if I want to calculate Y_{i=1,t=3} given \mu_1=3.88, \delta=0.8, and Y_{i=1,t=2}=3.87 with \sigma_e=0.1, this gives me 6.98, which is not even close to the generated Y_{i=1,t=3}, which is 3.83. I don’t know if I am just being a blockhead right now, but this really confuses me.
And my last question: You say that we generate data from Y_{i,t}=\mu_i+X_{it}\beta+\delta (Y_{it-1}-X_{it-1}\beta-\mu_i)+\epsilon_{it}. Wouldn’t this imply that we have to generate the data as:
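The arithmetic can be checked directly. The Python sketch below uses the numbers from the post and computes the one-step prediction two ways: with \mu_1 acting as an intercept next to \delta Y_{t-1}, and with \delta multiplying the deviation Y_{t-1} - \mu_1. It assumes no covariates, and the variable names are chosen here for illustration.

```python
mu_1 = 3.88    # true group mean from the post
delta = 0.8    # AR(1) coefficient from the post
y_prev = 3.87  # Y_{1,2} from the post

# Lagged-dependent-variable form: mu acts as an intercept next to delta * y_prev.
pred_ldv = mu_1 + delta * y_prev

# Mean-reverting form: delta multiplies the deviation from mu.
pred_mean_reverting = mu_1 + delta * (y_prev - mu_1)

print(round(pred_ldv, 3), round(pred_mean_reverting, 3))
```

The first form reproduces the 6.98 from the post. The second gives about 3.87, which is within half of \sigma_e = 0.1 of the generated value 3.83. This is consistent with the data having been simulated from the mean-reverting form, where \mu_1 is the long-run mean of the series and not an intercept.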
I’m also interested in dynamic hierarchical models. I’d just like to point out that @rtrangucci’s suggestion is equivalent to estimating an AR(1) model in errors, but this is different from a “dynamic” model where the lagged dependent variable is employed as a predictor.
Specifically, an AR(1) model in errors is given by:
where \eta_{i,t} is i.i.d. white noise. This can be estimated using a non-linear least squares approach or via MLE. In contrast, a dynamic model is given by:
where \nu_{i,t} = u_{i} + \xi_{i,t}, u_{i} is a random or fixed effect, and \xi_{i,t} is i.i.d. white noise. This type of model assumes there is state-dependence of order 1 (i.e., last period’s level of the response variable affects this period’s level of the response).
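One practical difference between the two specifications is what the constant term means. The Python sketch below iterates the noise-free mean recursion of each model. The equations from the post are not reproduced here, so the two recursions are textbook forms assumed for illustration, without covariates and with hypothetical values: for the AR(1)-in-errors model the constant is the long-run mean, while for the dynamic model the long-run mean is the constant divided by 1 - \delta.

```python
def iterate_mean(step, start, n_steps=200):
    """Apply a noise-free one-step recursion repeatedly and return the final value."""
    value = start
    for _ in range(n_steps):
        value = step(value)
    return value

intercept, delta = 2.0, 0.5

# AR(1) in errors (mean-reverting form): E[y_t] = c + delta * (E[y_{t-1}] - c).
mean_ar1_errors = iterate_mean(
    lambda m_prev: intercept + delta * (m_prev - intercept), start=0.0
)

# Dynamic model: E[y_t] = c + delta * E[y_{t-1}].
mean_dynamic = iterate_mean(
    lambda m_prev: intercept + delta * m_prev, start=0.0
)

print(mean_ar1_errors, mean_dynamic)
```

With these values the first recursion settles at 2.0 and the second at 2.0 / (1 - 0.5) = 4.0, so the same constant implies different levels for y depending on which model generated the data.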
Due to the presence of unobserved heterogeneity u_{i}, which is correlated with y_{i,t-1} by construction, estimation of a dynamic multilevel model is not straightforward.
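The correlation can be made explicit in a stripped-down case. This sketch assumes no covariates, |\delta| < 1, a stationary process, \operatorname{Var}(u_i) = \sigma_u^2, and \xi_{i,t} independent of u_i; \sigma_u^2 is notation introduced here.

```latex
\begin{aligned}
y_{i,t} &= \delta\, y_{i,t-1} + u_i + \xi_{i,t} \\
\Longrightarrow\quad
y_{i,t-1} &= \frac{u_i}{1-\delta} + \sum_{k=0}^{\infty} \delta^{k}\, \xi_{i,t-1-k} \\
\Longrightarrow\quad
\operatorname{Cov}(u_i,\ y_{i,t-1}) &= \frac{\sigma_u^2}{1-\delta} \;\neq\; 0 .
\end{aligned}
```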
I’m new to the Bayesian world, so I could be wrong, but the endogeneity of the problem precludes consistent and unbiased estimates of the coefficients. Frequentist methods generally get around this problem by first-differencing or applying some other transformation to the estimating equation. But any such transformation requires instrumental variables (i.e., higher-order lags of the dependent variable) to be used to purge endogeneity between the transformed lagged dependent variable and the transformed \xi_{i,t}.
Personally I use PyMC3, but I would love to learn about a way to estimate dynamic models in a Bayesian framework. In economics, instrumental variables are often necessary due to endogeneity of some form, but the cure (instrumental variables) can sometimes be worse than the disease, particularly if the instrumental variables are weakly correlated with the endogenous variable(s).
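A sketch of why differencing alone is not enough, in the same stripped-down setting: no covariates, and \xi_{i,t} serially uncorrelated with variance \sigma_\xi^2 and independent of u_i. The symbol \sigma_\xi^2 is introduced here, and using y_{i,t-2} as the instrument is the Anderson-Hsiao-style choice, given as one example.

```latex
\begin{aligned}
y_{i,t} &= \delta\, y_{i,t-1} + u_i + \xi_{i,t} \\
\Delta y_{i,t} &= \delta\, \Delta y_{i,t-1} + \Delta \xi_{i,t},
 \qquad \Delta \xi_{i,t} = \xi_{i,t} - \xi_{i,t-1} \\
\operatorname{Cov}(\Delta y_{i,t-1},\ \Delta \xi_{i,t})
 &= -\operatorname{Var}(\xi_{i,t-1}) = -\sigma_\xi^2 \;\neq\; 0 \\
\operatorname{Cov}(y_{i,t-2},\ \Delta \xi_{i,t}) &= 0 .
\end{aligned}
```

Differencing removes u_i, but the differenced lag still shares \xi_{i,t-1} with the differenced error, which is why a deeper lag such as y_{i,t-2} is brought in as an instrument.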
You’re right, my solution is a bit different than the question that spurred the thread. Here’s another solution for a dynamic model that I think gets the modeling right:
After going back to check my code, I realized I had another error in my model code! I edited the original post (Dynamic panel data models with Stan?) so that the code is correct. I think it’s too late to edit the original post where I made the first mistake (Dynamic panel data models with Stan?)
As for your first question, I took another crack at putting together a model with a DGP of Y_t = \alpha_i + X_t \beta + \delta_i Y_{t-1} + \epsilon_{i,t}, see the solution Dynamic panel data models with Stan?
Thanks @rtrangucci. I’m not confident that this new solution solves the problem, but perhaps I’m missing something. From what I can tell, you’re treating the unobserved fixed/random effect, u_i, as a parameter to be estimated, namely as a group-specific constant. More formally, assuming the model is
which assumes no unobserved heterogeneity through u_i. The distinction is that u_{i} is an unobserved set of group-specific traits (e.g., u_i could be brand value, operational efficiency or know-how, and y_{it} is firm i's profitability) correlated with y_{i, t-1}, whereas \beta_{0,i} is a parameter.
Thus, if we wanted to estimate the equation as-is (i.e., without transformation to eliminate the u_{i} and without instrumental variables), I think there needs to be some restriction or assumption that reflects the correlation between u_{i} and y_{i,t-1}.
I’m treating u_i as a random intercept at the group level i, or unobserved heterogeneity at the group level, so wouldn’t that imply a correlation structure between y_{i,t-1} and u_i?
But I also might be totally misunderstanding the model, in which case, I apologize!
I agree, I’m certainly imposing assumptions in order to fit this model, though I’m not eliminating the covariance between u_i and y_{i,t-1}. There’s a distributional assumption over the random intercepts, and I’m imposing stationarity as well through the assumption that \phi_i \in (-1, 1) for all i.