Incorporating lagged dependent variable into multilevel model

At the moment I have built an multilevel model with three levels to analyze a panel dataset. So my model looks basically like the following:
y_{ijt} = \beta_{0ij} + X\beta + \epsilon_{ijt}, where \beta_{0ij} is dependent on the i and j level. My stan code for this model looks like the following:

data {
  // Define variables in data
  // Number of level-1 observations (an integer)
  int<lower=0> Ni;
  // Number of level-2 clusters
  int<lower=0> Nj;
  // Number of level-3 clusters
  int<lower=0> Nk;
  // Numer of models x countries
  int<lower = 0> Njk;
  // Number of fixed effect parameters
  int<lower=0> p;
  // Number of random effect parameters
  //int<lower=0> q;
  int<lower=0> Npars;
  // Variables with fixed coefficient
  matrix[Ni,p] fixedEffectVars;
  // Cluster IDs
  int<lower=1> modelid[Ni];
  int<lower=1> countryid[Ni];
  // Level 3 look up vector for level 2
  int<lower=1> countryLookup[Npars];
  int<lower=1> countryMonthLookup[Njk];
  // Continuous outcome
  int activations[Ni];

parameters {
  // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  // Fixed effects
  vector[p] beta;
  // Level-1 errors
  real<lower=0> sigma_e0;
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;

transformed parameters  {
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  // Individual mean
  real mu[Ni];
  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  // Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]]  + fixedEffectVars[i,]*beta;

model {
  // Prior part of Bayesian inference
  // Flat prior for mu (no need to specify if non-informative)
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  beta[1] ~ normal(-0.25, 1);
    // Likelihood part of Bayesian inference
  // Outcome model N(mu, sigma^2) (use SD rather than Var)
  for (i in 1:Ni) {
    activations[i] ~ normal(mu[i], sigma_e0);

Now, I want to include lagged values of the dependent variable in my first layer of the model. So, my model then becomes: y_{ijt} = \beta_{0ij} + X\beta + \rho y_{ij(t-1)} + \epsilon_{ijt}. To incorporate this I could add just another parameter to my model, but I don’t think that will because I don’t know how to model the value at t = 0 (y_{ij0}) in my model. Does anyone know how I can incorporate this into my model? There are maybe also problems with the lagged value being correlated to the error-terms. Can someone help me with this?
I searched for this across the entire web, but I did not find any solution.

In your searches, I recommend looking for the “initial conditions problem.”

Here’s a relevant example for dichotomous outcomes that might point you in the right direction:

@jeremy.koster Thanks for your reply.

But do you also have experience with adding dynamics in a bayesian multilevel model?