Displaying three-level multilevel model in vector notation

I estimated a three-level multilevel model in stan but I have some trouble writing it correctly in terms of vectors and matrices. Now, I have the following:

\begin{equation} \begin{gathered} y_{ijt} = \boldsymbol{z}_{ijt}'\boldsymbol{\pi}_{ij} + \boldsymbol{a}'\boldsymbol{\theta} + \varepsilon_{ijt} , \\ \boldsymbol{\pi}_{ij} = \boldsymbol{\beta}_{j}'\boldsymbol{x}_{ij} + \boldsymbol{\eta}_{ij}, \\ \boldsymbol{\beta}_{j} = \boldsymbol{\Gamma}'\boldsymbol{w}_j + \boldsymbol{u}_{j}. \end{gathered} \end{equation}
With dimensions:
\boldsymbol{z}_{ijt}: p \times 1
\boldsymbol{\pi}_{ij}: p \times 1
\boldsymbol{a}: g \times 1
\boldsymbol{\theta}: g \times 1
\boldsymbol{\beta}_{j}: q \times p
\boldsymbol{x}_{ij}: q \times 1
\boldsymbol{\Gamma}: s\times q
\boldsymbol{w}_j: s \times p
So, there are p variables in the first level, q in the second and s in the third. However, to get the dimensions right my \boldsymbol{w}_j has to have p columns, but I do not know how to incorporate this in stan, since I would think that I only have a \boldsymbol{w}_j with size s \times 1 in the third level (so s variables with only one value each). Now, I could use p equal columns for \boldsymbol{w}_j, but I do not know whether this is right. Can someone explain this to me?

Next to that, I can define the joint posterior to be:
p(\boldsymbol{\pi}_{ij}, \boldsymbol{\beta}_{j}, \boldsymbol{\Gamma}| \boldsymbol{y}) \propto p(\boldsymbol{y}| \boldsymbol{\pi_{ij}})p(\boldsymbol{\pi_{ij}}|\boldsymbol{\beta}_j)p(\boldsymbol{\beta_{j}}|\boldsymbol{\Gamma})p(\boldsymbol{\Gamma}).
But can someone explain how to incorporate the fixed effect part \boldsymbol{a}'\boldsymbol{\theta} in this equation?

Thanks in advance for the help!

At a brief glance, I would say you would have p(y|\pi_{ij},\alpha^\prime,\theta) and $p(\alpha^\prime)p(\theta)

I don’t really understand the first part of the question - you seem to be worried at the same time about ways to lay out your model in mathematical formalism (presumably for publication) and about the implementation in Stan, which I find weird - it definitely does not make sense to adjust you Stan model because you cannot easily express it as a math formula. Could you elaborate a bit more about the background and what the problem actually is?

I would also place a bet that very few people actually read those mathematical descriptions of models and many would rather see the Stan code, but I understand there is pressure to write those formulas down anyway.

@martinmodrak Thanks for your reply!

At a brief glance, I would say you would have
p(y|π_{ij},α′θ) and $p(\alpha^\prime)p(\theta)

So, using this approach I only have to replace p(\boldsymbol{y}|\boldsymbol{\pi}_{ij}, \boldsymbol{\theta})p(\boldsymbol{\theta}), because \boldsymbol{\alpha} is only a variable vector so I don’t think that should enter my equation.

Concerning the first part of my question. I, indeed, want to write my model as generic as possible in formula form. However, I am not sure about the third line with \boldsymbol{w}_{j} being a matrix. My stan code for this model is relatively specific for one setting, so I have problems writing it down in this way. Very appreciated if you could help a bit!

I find your question is hard to answer in the abstract, without access to the Stan code in question. Also remember that almost every operation in Stan code can be written in the formula formalism, so you can stay on the safe side by just replicating it (e.g. using explicit sums or products to mirror for loops in your code instead of trying to figure out the vector/matrix notation). If it is hard for you to write correctly, it likely will be difficult for others to read correctly.

The (transformed) parameters blocks look like:

parameters {
  // Define parameters to estimate
  // Population intercept (a real number)
  real beta_0;
  //real beta_1;
  
  // Fixed effects
  vector[p] beta;
  
  // Level-1 errors
  real<lower=0> sigma_e0;
  
  // Level-2 random effect
  real u_0jk[Npars];
  real<lower=0> sigma_u0jk;
  //real u_1jk[Npars];
  //real<lower=0> sigma_u1jk;
  
  // Level-3 random effect
  real u_0k[Nk];
  real<lower=0> sigma_u0k;
  //real u_1k[Nk];
  //real<lower=0> sigma_u1k;
}

transformed parameters  {
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  //real beta_1jk[Npars];
  //real beta_1k[Nk];
  
  // Individual mean
  real mu[Ni];
  
  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
    //beta_1k[k] = beta_1 + u_1k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
    //beta_1jk[j] = beta_1k[countryLookup[j]] + u_1jk[j];
  }
  // Individual mean
  for (i in 1:Ni) {
    mu[i] = beta_0jk[countryMonthLookup[i]] + //beta_1jk[countryMonthLookup[i]]*randomEffectVars[i,1] + 
      fixedEffectVars[i,]*beta;

  }
}

where I commented just another group level variable. In the model block I just use mu[i] as mean in a normal distribution.

What I would do is to rename the variables in Stan code for more clarity e.g. beta_0k to gamma and beta_0jk to delta and then just write things down directly as e.g. (just writing down a part of the equations):

\mu_i = \delta_{countryMonth(i)} + ... \\ \delta_j = \gamma_{country(j)} + ... \\

I generally dislike when the math notation in paper does not match the notation in code as it makes the whole thing impenetrable, so matching the Stan with the formulas code is IMHO both easy and desirable.

Thanks for your reply. However, problem with all the summation notation is that I can’t write down the joint posterior distribution in a nice way. This is the case with vector notation

I might be a bit biased here, since I’ve personally never found those joint distribution equations in papers helpful and didn’t understand why people write them - the papers not having them seem to do just as well.

Anyway, I don’t see how the way I proposed interferes with the joint posterior - those are mostly separate, as the joint posterior expresses which conditional independences hold and the “other” formula describe how the individual conditional distributions work, so the way I proposed it you could still write

p(\beta,\gamma,\delta | y) \propto p(y | \delta)p(\delta | \gamma)p(\gamma)

or am I missing something?