I am a new Stan’s user and I would like to ask for helping to create a multilevel model.

My background is based on the Andrew Gelman books and papers about this subject. I also explored the Youtube videos about the theme such as Jonah Gabry & Lauren Kennedy Pestcontrol example.

The dataset that I am using is Males from the Ecdat package.

The help file description of the 12 variables in the dataset is as follows:

nr subject identifier

year year (1980-1987)

school years of schooling

exper year of experience (=age-6- school )

union factor; whether wage was set by collective bargaining

ethn factor with 3 levels

maried factor marital status

health factor for presence of health problems

wage log of hourly wage

industry factor with 12 levels

occupation factor with 9 levels

residence factor with 4 levels

The questions that I would like to answer are:

1 - I would like to build a model for studying the effects of covariates on wage with rstan package, taking into account the hierarchical structure of the data. In particular the relation between unionism and wage.
2 - Implement the effect of time in the Model.
3 - How could I check the model fit by using the proper tools of posterior predictive checking?

The simplest model that I could run was:

data {
int<lower=0> N; // number of observations
vector[N] x; // union
vector[N] y; // wage
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
y ~ normal(alpha + beta * x, sigma);
}

data {
int<lower=0> N; // number of obs
int<lower=0> J; // Number of years
vector[N] y; // outcome data (log wage)
int<lower=0,upper=1> x[N]; // Input data (union)
int year[N]; //year ID variable
}
parameters {
real a[J]; // year j intercept
real b;
real mu_a; // prior on alpha
real<lower=0> sigma_y; // prior sigma of y in year j
real<lower=0> sigma_a; // hyperparameter sigma of alpha
}
model {
a ~ normal(mu_a, sigma_a);
for (n in 1:N)
y[n] ~ normal(a[year[n]] + b * x[n], sigma_y);
}
generated quantities {
vector[N] y_hat;
for (i in 1:N)
y_hat[i] = normal_rng(a[year[i]] + b * x[i], sigma_y);
}

Input data:

List of 5

N : int 4360

J : int 8 (years from 1980 to 1987)

y : num [1:4360] 1.2 1.85 1.34 1.43 1.57 … (log wage values)

x : num [1:4360] 0 1 0 0 0 0 0 0 0 0 … ( categorical variable 0 - No union; 1 - Yes Union)

I would like to create different models using this data set:

1 - Pooling
2 - no pooling
3 - partial pooling.

From the model that I have already described, implement with multiple varying intercepts and slopes.

A question that I have about the groups when applying hierarchical models in Stan:
From the examples that I have seen, the groups might be states or other geographic divisions, however is it possible to take into account the years as groups ?