Random Intercept Model (Non-Centered?)

Hi, forum,

So I am trying to fit a model with a random intercept for each unit and each year in the data, which looks like this:

data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  int y[N]; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  vector[C] alpha2; // random unit intercept
  vector[T] alpha3; // random time intercept
  real alpha; // fixed intercept
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
}

model {
  target += normal_lpdf(mu_u|0,1);
  target += normal_lpdf(mu_t|0,1);
  target += normal_lpdf(mu_i|0,1);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(alpha|mu_i,sigma_i);
  target += normal_lpdf(alpha2|mu_u,sigma_u);
  target += normal_lpdf(alpha3|mu_t,sigma_t);
  target += normal_lpdf(beta|0,1);

  for (n in 1:N) {
    real ystar;
    ystar = alpha + alpha2[unit[n]] + alpha3[time[n]] + x[n] * beta;
    target += normal_lpdf(y[n]| ystar,sigma);
  }
}

The model above works; however, I am having trouble getting it to converge (I am using 2000 iterations; increasing that helps, but I suspect it is not the best solution). So I decided to try a non-centered reparameterization. Since most of the convergence problems seem to be on the intercept parameters, I suspect these are the parameters that need to be reparameterized.
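In the non-centered form, the hierarchical prior

\alpha_2 \sim \text{normal}(\mu_u, \sigma_u)

is rewritten as

\alpha_2 = \mu_u + \sigma_u \eta_u, \qquad \eta_u \sim \text{normal}(0, 1)

(and likewise for the time intercepts \alpha_3), so the sampler explores the standard-normal \eta_u rather than the funnel-shaped joint posterior of \alpha_2 and \sigma_u. I coded up the model as: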

data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  int y[N]; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  real alpha; // fixed intercept
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
  vector[C] eta_u;
  vector[T] eta_t;
}

transformed parameters {
  vector[C] alpha2; // random unit intercept
  vector[T] alpha3; // random time intercept

  alpha3 = mu_t + sigma_t * eta_t;
  alpha2 = mu_u + sigma_u * eta_u;
}

model {
  target += normal_lpdf(eta_u|0,1);
  target += normal_lpdf(eta_t|0,1);
  target += normal_lpdf(mu_u|0,1);
  target += normal_lpdf(mu_t|0,1);
  target += normal_lpdf(mu_i|0,1);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(beta|0,1);

  for (n in 1:N) {
    real ystar;
    ystar = alpha + alpha2[unit[n]] + alpha3[time[n]] + x[n] * beta;
    target += normal_lpdf(y[n]| ystar,sigma);
  }
}

However, this version of the model runs incredibly slowly on my data. I am not sure whether I made a mistake in the reparameterization or something else is wrong. If anyone has suggestions, hints, or ideas, or can see where I have coded something incorrectly, please let me know. Also, let me know if more information is needed.

Thank you in advance, and sorry if my question is not clear!

So, a question and some recommendations. First, is there some reason you don’t have a prior set on the population-level intercept \alpha? That may very well be what is causing some of your convergence issues (or at the very least it’s not helping them).
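For example, simply restoring the hierarchical prior you used in your first model would do:

target += normal_lpdf(alpha | mu_i, sigma_i); // prior on the population-level intercept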

Second, replacing normal_lpdf(parameter | 0, 1) with the more computationally efficient std_normal_lpdf(parameter) and vectorizing the loop in your likelihood calculation should give you some decent speed gains. For models with a Gaussian likelihood and identity link, you can use normal_id_glm_lpdf.
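For instance, the likelihood loop in your model collapses to a single vectorized statement (using your variable names, and assuming y is declared as vector[N] rather than int y[N]):

// one vectorized statement replaces the loop over observations
target += normal_lpdf(y | alpha + alpha2[unit] + alpha3[time] + x * beta, sigma);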

Based on what you’ve provided above, your model is something like this:

data {
  int<lower=1> N;  // Number of observations
  vector[N] Y;  // Response vector
  int<lower=1> K;   // Number of Predictors
  matrix[N, K] X;  // Design Matrix for the Predictors
  // Data for Country-Level Random Effects
  int<lower=1> J;  // Number of Countries J
  int<lower=1> J_K;  // Number of Country-Level Coefficients
  int<lower=1> jj[N];  // Mapping Countries to Observations
  // Country-Level Predictor Values
  vector[N] Z_J_1; // If no varying slopes, this is just a vector of 1s
  // Data for Time-Level Random Effects
  int<lower=1> T;  // Number of Years (or Country-Years) T
  int<lower=1> T_K;  // Number of Time-Level Coefficients
  int<lower=1> tt[N];  // Mapping Time to Observations
  // Time-Level Predictor Values
  vector[N] Z_T_1;  // If no varying slopes, this is just a vector of 1s
}

parameters {
  vector[K] beta;  // Population-Level Regression Coefficients
  real alpha; // Population-Level Intercept
  real<lower=0> sigma;  // Population-Level Dispersion Parameter
  vector<lower=0>[J_K] sigma_j;  // Country-Level Standard Deviations
  vector[J] z_j[J_K];  // Standardized Country-Level Effects
  vector<lower=0>[T_K] sigma_t;  //  Time-Level Standard Deviations
  vector[T] z_t[T_K];  // Standardized Time-Level Effects
}

transformed parameters {
  vector[J] alpha_country;  //  Country-Level Effects
  vector[T] alpha_year;  // Time-Level Effects
  alpha_country = sigma_j[1] * z_j[1]; // group means are absorbed into the population-level intercept alpha
  alpha_year = sigma_t[1] * z_t[1];
}

model {
  // Likelihood
  target += normal_id_glm_lpdf(Y | X,  alpha + alpha_country[jj] .* Z_J_1 + alpha_year[tt] .* Z_T_1, beta, sigma);
  // Priors for the Model Parameters
  target += cauchy_lpdf(sigma | 0, 1);
  target += cauchy_lpdf(sigma_j | 0, 1);
  target += cauchy_lpdf(sigma_t | 0, 1);
  target += std_normal_lpdf(z_j[1]);
  target += std_normal_lpdf(z_t[1]);
  target += std_normal_lpdf(beta);
  target += student_t_lpdf(alpha | 3, 0, 2); // Change this to whatever your prior for the intercept should be
}

You can implement the non-centered parameterisation a bit more simply by using the offset and multiplier construction, where offset is the mean parameter and multiplier is the SD parameter.
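As a minimal sketch (hypothetical names, just to illustrate the construction):

parameters {
  real mu;                                 // location hyperparameter
  real<lower=0> sigma;                     // scale hyperparameter
  real<offset=mu, multiplier=sigma> theta; // sampled on the non-centered scale under the hood
}
model {
  mu ~ std_normal();
  sigma ~ cauchy(0, 1);                    // half-Cauchy via the lower bound
  theta ~ normal(mu, sigma);               // prior still written in the centered form
}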

Additionally, you can vectorise your likelihood very efficiently by using the normal_id_glm likelihood as suggested by @ajnafa. This likelihood can also be GPU-accelerated if you’re working with large datasets.
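If you want to try the GPU route, a sketch of the setup (assuming CmdStan and an OpenCL-capable device with working drivers) is to add the following to the make/local file in your CmdStan directory and rebuild the model:

STAN_OPENCL=true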

Putting these suggestions together, a new non-centered model would look like:


data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  vector[N] y; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
  real<offset=mu_i, multiplier=sigma_i> alpha; // fixed intercept
  vector<offset=mu_u, multiplier=sigma_u>[C] alpha2; // random unit intercept
  vector<offset=mu_t, multiplier=sigma_t>[T] alpha3; // random time intercept
}

model {
  target += std_normal_lpdf(mu_u);
  target += std_normal_lpdf(mu_t);
  target += std_normal_lpdf(mu_i);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(alpha|mu_i,sigma_i);
  target += normal_lpdf(alpha2|mu_u,sigma_u);
  target += normal_lpdf(alpha3|mu_t,sigma_t);
  target += std_normal_lpdf(beta);

  target += normal_id_glm_lpdf(y | x, alpha + alpha2[unit] + alpha3[time], beta, sigma);
}

Thanks! I’ll try out these suggestions and report back on what works best.

@ajnafa Nope, no reason. That was a typo; it should be there.


Are there any gains in computational efficiency with the offset/multiplier approach, or is it mainly that it’s syntactically cleaner and easier to understand?

I believe it’s computationally identical (or mostly so), so the main benefit is the reduction in syntax/specification. It also means that the additional ‘raw’ (standard normal) variable isn’t needed, which can reduce RAM usage and the size of the results for larger models.
