Linear Mixed Model (intercept only) with crossed and nested factors: historical US votes example

Hi everyone,
I am tackling Gaussian Processes following the examples provided by STAN user manual and Rob Trangucci’s document (see http://mc-stan.org/events/stancon2017-notebooks/stancon2017-trangucci-hierarchical-gps.pdf).
With this post I am referring to the GP example starting at page 16.
Before coding the Gaussian Process component of the mean function, I was bogged interpreting the STAN code that you’ll find from page 21 to page 23 of Trangucci’s example.
So, I decided to dissect the model into pieces, starting with a random intercepts (only) model.
Basically I have the republican share results (dependent variable, beta distributed) of 11 elections, in 50 countries grouped into 10 regions (5 states for each regions).
So “elections” are crossed into “countries” and “countries” are nested into “regions”.
So after applying the Ferrari and Cribari-Neto re-parametrization the model should be (sorry but I did not know how to write formulas in this post forum):
$$
y_{t,j} \sim Beta(inv_logit (\mu_{t,j}), \nu) \
\mu_{t,j} = \theta^{year}t + \theta^{state}j + \theta^{region}{k[j]}\
\theta_j^{state} \sim Normal(0, \sigma^{state})\
\theta_k^{region} \sim Normal(0, \sigma^{region})\\theta_j^{state} \sim Normal(0, sigma
{k[j]}^{state})
$$

Could you please point me to a book chapter with the model (with both crossed and nested factors at the same time) with the mathematical formulation and hopefully to a clear STAN code? So far I achieved a random intercept model only for
$$
\mu_{t,k} = \theta^{year}_t + \theta^{region}_k
$$

but obviously is not what I would achieve.
Thank you in advance !

Books are unlikely to have a model that like that corresponds to how it would be done in Stan. However, you can get Stan code for it by doing

library(brms)
make_stancode(y ~ (1 | year) + (1 | state/region), family = "beta", 
              data = dataset, prior = set_prior(...))

You are going to need to specify a prior for the sigmas.

@bgoodri : you’ve guided me in the right direction

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  // data for group-level effects of ID 1 
  int<lower=1> J_1[N]; 
  int<lower=1> N_1; 
  int<lower=1> M_1; 
  vector[N] Z_1_1; 
  // data for group-level effects of ID 2 
  int<lower=1> J_2[N]; 
  int<lower=1> N_2; 
  int<lower=1> M_2; 
  vector[N] Z_2_1; 
  // data for group-level effects of ID 3 
  int<lower=1> J_3[N]; 
  int<lower=1> N_3; 
  int<lower=1> M_3; 
  vector[N] Z_3_1; 
}

Thank you for pointing me to the automated-code-creation of the brms package.
I understand that J_[N] are the indicators (the different values of the factors for each y value).
N_
could be the number of levels for each factor.
But what about M_* and Z_*_1 ?
It’s hard for me retro-engeneering the STAN code to the math underlying such a model

It tells you farther down in the code. M_* is the number of standard deviations that it has to estimate for that grouping factor. And the Z_ variables are the variables to the left of the | in the formula, which are often going to be 1. So, you get code in the model block like

mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n] + (r_2_1[J_2[n]]) * Z_2_1[n];

which extracts the appropriate r_ value by ID and multiplies it by the corresponding Z_ value to update the conditional mean.

1 Like