Linear Mixed Model (intercept only) with crossed and nested factors: historical US votes example

fabio · September 26, 2017, 3:29pm

Hi everyone,
I am tackling Gaussian Processes following the examples provided by STAN user manual and Rob Trangucci’s document (see http://mc-stan.org/events/stancon2017-notebooks/stancon2017-trangucci-hierarchical-gps.pdf).
With this post I am referring to the GP example starting at page 16.
Before coding the Gaussian Process component of the mean function, I was bogged interpreting the STAN code that you’ll find from page 21 to page 23 of Trangucci’s example.
So, I decided to dissect the model into pieces, starting with a random intercepts (only) model.
Basically I have the republican share results (dependent variable, beta distributed) of 11 elections, in 50 countries grouped into 10 regions (5 states for each regions).
So “elections” are crossed into “countries” and “countries” are nested into “regions”.
So after applying the Ferrari and Cribari-Neto re-parametrization the model should be (sorry but I did not know how to write formulas in this post forum):
$$
y_{t,j} \sim Beta(inv_logit (\mu_{t,j}), \nu) \
\mu_{t,j} = \theta^{year}t + \theta^{state}j + \theta^{region}{k[j]}\
\theta_j^{state} \sim Normal(0, \sigma^{state})\
\theta_k^{region} \sim Normal(0, \sigma^{region})\\theta_j^{state} \sim Normal(0, sigma{k[j]}^{state})
$$

Could you please point me to a book chapter with the model (with both crossed and nested factors at the same time) with the mathematical formulation and hopefully to a clear STAN code? So far I achieved a random intercept model only for
$$
\mu_{t,k} = \theta^{year}_t + \theta^{region}_k
$$

but obviously is not what I would achieve.
Thank you in advance !

bgoodri · September 26, 2017, 4:02pm

Books are unlikely to have a model that like that corresponds to how it would be done in Stan. However, you can get Stan code for it by doing

library(brms)
make_stancode(y ~ (1 | year) + (1 | state/region), family = "beta", 
              data = dataset, prior = set_prior(...))

You are going to need to specify a prior for the sigmas.

fabio · September 28, 2017, 12:39pm

@bgoodri : you’ve guided me in the right direction

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  // data for group-level effects of ID 2 
  int<lower=1> J_2[N]; 
  int<lower=1> N_2; 
  int<lower=1> M_2; 
  vector[N] Z_2_1; 
  // data for group-level effects of ID 3 
  int<lower=1> J_3[N]; 
  int<lower=1> N_3; 
  int<lower=1> M_3; 
  vector[N] Z_3_1; 
}

Thank you for pointing me to the automated-code-creation of the brms package.
I understand that J_[N] are the indicators (the different values of the factors for each y value).
N_ could be the number of levels for each factor.
But what about M_* and Z_*_1 ?
It’s hard for me retro-engeneering the STAN code to the math underlying such a model

bgoodri · September 28, 2017, 1:43pm

It tells you farther down in the code. M_* is the number of standard deviations that it has to estimate for that grouping factor. And the Z_ variables are the variables to the left of the | in the formula, which are often going to be 1. So, you get code in the model block like

mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_2_1[J_2[n]]) * Z_2_1[n];

which extracts the appropriate r_ value by ID and multiplies it by the corresponding Z_ value to update the conditional mean.

Topic		Replies	Views
DHGLM with two (nested) random intercept Modeling rstan , fitting-issues , specification	3	421	May 8, 2020
Gaussian Process out-of-sample predictive distribution Modeling gaussian-process	21	2136	December 6, 2024
Question: Gaussian Processes and Varying Intercepts/Slopes brms	3	619	September 20, 2021
Nested effects in Stan vs. stan_lmer() Modeling	5	1401	October 2, 2017
Simple intercept only model with poor chain mixing Modeling rstan	6	539	September 17, 2021

Linear Mixed Model (intercept only) with crossed and nested factors: historical US votes example

Related topics