Creating a hierarchical model (multilevel)

pythonicpyrate · June 11, 2020, 3:00pm

Hello,

I am a new Stan user and I would like to ask for help creating a multilevel model.

My background comes from Andrew Gelman's books and papers on this subject. I have also watched YouTube videos on the topic, such as Jonah Gabry and Lauren Kennedy's pest control example.

The dataset that I am using is Males from the Ecdat package.

The help file description of the 12 variables in the dataset is as follows:

nr: subject identifier
year: year (1980-1987)
school: years of schooling
exper: years of experience (= age - 6 - school)
union: factor; whether wage was set by collective bargaining
ethn: factor with 3 levels
maried: factor; marital status
health: factor; presence of health problems
wage: log of hourly wage
industry: factor with 12 levels
occupation: factor with 9 levels
residence: factor with 4 levels

The questions that I would like to answer are:

1 - How can I build a model with the rstan package to study the effects of the covariates on wage, taking into account the hierarchical structure of the data? I am particularly interested in the relation between union membership and wage.
2 - How can I include the effect of time in the model?
3 - How can I check the model fit using posterior predictive checks?

The simplest model that I could run was:


data {
  int<lower=0> N; // number of observations
  vector[N] x; // union
  vector[N] y; // wage
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(alpha + beta * x, sigma);
}

pythonicpyrate · June 27, 2020, 9:05am

Hello,

I wrote a partial-pooling model in Stan. Could someone tell me whether I made any mistakes?

wage = \alpha_{j[i]} + \beta * Union_i + \epsilon_i
\epsilon \sim N(0,\sigma_y^2)
\alpha_{j[i]} \sim N(\mu_\alpha, \sigma_\alpha^2)

The code follows below:


data {
  int<lower=0> N; // number of obs
  int<lower=0> J; // Number of years
  vector[N] y; // outcome data (log wage)
  int<lower=0,upper=1> x[N]; // input data (union: 0 = no, 1 = yes)
  int<lower=1,upper=J> year[N]; // year ID variable, must be an index in 1..J
}
parameters {
  real a[J]; // intercept for year j
  real b; // union effect
  real mu_a; // mean of the year intercepts
  real<lower=0> sigma_y; // residual SD of y, common to all years
  real<lower=0> sigma_a; // SD of the year intercepts
}
model {
  a ~ normal(mu_a, sigma_a);            
  for (n in 1:N)
    y[n] ~ normal(a[year[n]] + b * x[n], sigma_y);
}

generated quantities {
  vector[N] y_hat; // posterior predictive draws of y
  for (i in 1:N)
    y_hat[i] = normal_rng(a[year[i]] + b * x[i], sigma_y);
}
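
The y_hat draws from the generated quantities block are what a posterior predictive check compares against the observed wages. Below is a minimal sketch of one such check in Python, assuming the draws have been extracted as one replicated dataset per posterior draw. The function name `ppc_p_value`, the choice of the standard deviation as the test statistic, and the toy numbers are all illustrative assumptions, not part of the model above. In R, the bayesplot functions `ppc_dens_overlay` and `ppc_stat` cover the same ground graphically.

```python
import random
import statistics


def ppc_p_value(y, y_rep, stat):
    """Share of replicated datasets whose statistic is >= the observed one."""
    observed = stat(y)
    exceed = sum(1 for rep in y_rep if stat(rep) >= observed)
    return exceed / len(y_rep)


# Toy stand-ins (hypothetical numbers, not the Males data).
random.seed(1)
y = [random.gauss(1.6, 0.5) for _ in range(200)]
# Pretend each inner list is one posterior draw of y_hat.
y_rep = [[random.gauss(1.6, 0.5) for _ in range(200)] for _ in range(500)]

p_sd = ppc_p_value(y, y_rep, statistics.stdev)
print(p_sd)  # values very close to 0 or 1 would suggest misfit
```

The same function can be reused with other statistics (for example `min`, `max`, or a per-year mean) to probe different aspects of the fit.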

Input data:

List of 5

N : int 4360
J : int 8 (years 1980 to 1987)
y : num [1:4360] 1.2 1.85 1.34 1.43 1.57 … (log wage)
x : num [1:4360] 0 1 0 0 0 0 0 0 0 0 … (union: 0 = no, 1 = yes)
year : num [1:4360] 1980 1981 1982 1983 1984 …
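
Because the model indexes the intercepts as a[year[n]], the year variable passed to Stan has to take values in 1..J rather than calendar years. The listing above shows calendar years, so the sketch below assumes a recoding step before the data list is built. The helper name `to_group_index` and the short example vector are hypothetical. In R, `as.integer(factor(year))` gives the same mapping.

```python
def to_group_index(labels):
    """Map arbitrary group labels to consecutive 1-based indices."""
    levels = sorted(set(labels))
    lookup = {level: i + 1 for i, level in enumerate(levels)}
    return [lookup[v] for v in labels], levels


# Short illustrative vector (not the full 4360 observations).
year = [1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1980, 1987]
year_idx, levels = to_group_index(year)
J = len(levels)

print(year_idx)  # [1, 2, 3, 4, 5, 6, 7, 8, 1, 8]
print(J)         # 8
```

Keeping `levels` around makes it possible to label a[1], ..., a[J] with the original years when reporting results.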

Results:

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
a[1]    1.35       0 0.02 1.31 1.34 1.35 1.37  1.39  2726    1
a[2]    1.47       0 0.02 1.43 1.45 1.47 1.48  1.51  3243    1
a[3]    1.53       0 0.02 1.48 1.51 1.53 1.54  1.57  2665    1
a[4]    1.57       0 0.02 1.53 1.56 1.57 1.59  1.62  3843    1
a[5]    1.64       0 0.02 1.60 1.63 1.64 1.66  1.69  3546    1
a[6]    1.70       0 0.02 1.66 1.68 1.70 1.71  1.74  3069    1
a[7]    1.76       0 0.02 1.72 1.74 1.76 1.77  1.80  2340    1
a[8]    1.81       0 0.02 1.77 1.80 1.81 1.83  1.86  3033    1
b       0.18       0 0.02 0.15 0.17 0.18 0.20  0.22  3052    1
sigma_y 0.51       0 0.01 0.50 0.50 0.51 0.51  0.52  4094    1

Compared with brms, the results are the same for formula = wage ~ (1|year) + union:

coef(bmod4)
$year
, , Intercept

     Estimate  Est.Error     Q2.5    Q97.5
1980 1.351247 0.02202636 1.308087 1.393600
1981 1.468648 0.02156949 1.426353 1.510592
1982 1.525916 0.02143721 1.482440 1.567273
1983 1.574645 0.02196081 1.531361 1.617394
1984 1.643587 0.02134702 1.601799 1.686262
1985 1.696632 0.02230993 1.653166 1.740112
1986 1.758255 0.02235138 1.714098 1.802303
1987 1.814969 0.02215333 1.771203 1.857841

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.60      0.07     1.46     1.75 1.00      923     1379
unionyes      0.18      0.02     0.15     0.22 1.00     2620     2082

andre.pfeuffer · June 27, 2020, 10:13am

I think there is at least one point you need to make clearer:

wage = \alpha_{j[i]} + \beta * Union_i + \epsilon_i
\epsilon \sim N(0,\sigma_y^2)

You write both \epsilon_i and \epsilon. Is your measurement variance heterogeneous, i.e. is the variance different each year, or is it the same for all years?

pythonicpyrate · June 29, 2020, 5:33pm

I am sorry, it is only \epsilon, without the i.

pythonicpyrate · July 27, 2020, 4:15pm

Hello,

I would like to create different models using this data set:

1 - Pooling
2 - No pooling
3 - Partial pooling

Starting from the model that I have already described, I would also like to implement a version with multiple varying intercepts and slopes.

I also have a question about groups when applying hierarchical models in Stan:
In the examples that I have seen, the groups are states or other geographic divisions. Is it possible to treat the years as groups?
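
To make the three options concrete, here is a minimal sketch in Python of how the estimate of a group intercept differs under each of them in the simplest case (no predictors, known standard deviations). The partial-pooling line uses the usual precision-weighted average of the group mean and the overall mean. The function name `pooling_estimates`, the toy wages, and the values of sigma_y and sigma_a are illustrative assumptions, not estimates from the Males data.

```python
from statistics import mean


def pooling_estimates(groups, sigma_y, sigma_a):
    """Return {group: (complete, no_pool, partial)} intercept estimates."""
    all_values = [v for values in groups.values() for v in values]
    complete = mean(all_values)  # complete pooling: one mean for everyone
    out = {}
    for g, values in groups.items():
        no_pool = mean(values)  # no pooling: each group on its own
        w_data = len(values) / sigma_y**2  # precision of the group mean
        w_prior = 1 / sigma_a**2  # precision of the group-level distribution
        partial = (w_data * no_pool + w_prior * complete) / (w_data + w_prior)
        out[g] = (complete, no_pool, partial)
    return out


# Toy log wages for two years (hypothetical numbers).
groups = {1980: [1.2, 1.4, 1.3], 1987: [1.9, 1.7]}
est = pooling_estimates(groups, sigma_y=0.5, sigma_a=0.15)
for g, (complete, no_pool, partial) in est.items():
    print(g, round(complete, 3), round(no_pool, 3), round(partial, 3))
```

The partial-pooling estimate always falls between the other two: it moves toward the group's own mean as the group gets more observations or as sigma_a grows, and toward the overall mean as sigma_a shrinks.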
