Random Intercept Model (Non-Centered?)

Hi, forum,

So I am trying to fit a model with a random intercept for each unit and each year in the data, which looks like this:

data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  int y[N]; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  vector[C] alpha2; // random unit intercept
  vector[T] alpha3; // random time intercept
  real alpha; // fixed intercept
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
}

model {
  target += normal_lpdf(mu_u|0,1);
  target += normal_lpdf(mu_t|0,1);
  target += normal_lpdf(mu_i|0,1);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(alpha|mu_i,sigma_i);
  target += normal_lpdf(alpha2|mu_u,sigma_u);
  target += normal_lpdf(alpha3|mu_t,sigma_t);
  target += normal_lpdf(beta|0,1);

  for (n in 1:N) {
    real ystar;
    ystar = alpha + alpha2[unit[n]] + alpha3[time[n]] + x[n] * beta;
    target += normal_lpdf(y[n]| ystar,sigma);
  }
}

The model above works; however, I am having trouble getting it to converge (I am using 2000 iterations; increasing that helps, but I suspect it is not the best solution). So I decided to try a non-centered reparameterization. Since most of the convergence problems seem to be on the intercept parameters, I suspect these are the parameters that need to be reparameterized.
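In the non-centered form, the hierarchical prior

\alpha_2 \sim \text{normal}(\mu_u, \sigma_u)

is rewritten as

\alpha_2 = \mu_u + \sigma_u \eta_u, \qquad \eta_u \sim \text{normal}(0, 1)

(and likewise for the time intercepts \alpha_3), so the sampler explores the standard-normal \eta_u rather than the funnel-shaped joint posterior of \alpha_2 and \sigma_u. I coded up the model as: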

data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  int y[N]; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  real alpha; // fixed intercept
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
  vector[C] eta_u;
  vector[T] eta_t;
}

transformed parameters {
  vector[C] alpha2; // random unit intercept
  vector[T] alpha3; // random time intercept

  alpha3 = mu_t + sigma_t * eta_t;
  alpha2 = mu_u + sigma_u * eta_u;
}

model {
  target += normal_lpdf(eta_u|0,1);
  target += normal_lpdf(eta_t|0,1);
  target += normal_lpdf(mu_u|0,1);
  target += normal_lpdf(mu_t|0,1);
  target += normal_lpdf(mu_i|0,1);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(beta|0,1);

  for (n in 1:N) {
    real ystar;
    ystar = alpha + alpha2[unit[n]] + alpha3[time[n]] + x[n] * beta;
    target += normal_lpdf(y[n]| ystar,sigma);
  }
}

However, this version of the model runs incredibly slowly on my data. I am not sure whether I made a mistake in the reparameterization or something else is wrong. If anyone has suggestions, hints, or ideas, or can see where I have coded something incorrectly, please let me know. Also, let me know if more information is needed.

Thank you in advance, and sorry if my question is not clear!

So, a question and some recommendations. First, is there some reason you don’t have a prior set on the population-level intercept \alpha? That may very well be what is causing some of your convergence issues (or at the very least it’s not helping them).
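For example, simply restoring the hierarchical prior you used in your first model would do:

target += normal_lpdf(alpha | mu_i, sigma_i); // prior on the population-level intercept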

Second, replacing normal_lpdf(parameter | 0, 1) with the more computationally efficient std_normal_lpdf(parameter) and vectorizing the loop in your likelihood calculation should give you some decent speed gains. For models with a Gaussian likelihood and identity link, you can use normal_id_glm_lpdf.
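For instance, the likelihood loop in your model collapses to a single vectorized statement (using your variable names, and assuming y is declared as vector[N] rather than int y[N]):

// one vectorized statement replaces the loop over observations
target += normal_lpdf(y | alpha + alpha2[unit] + alpha3[time] + x * beta, sigma);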

Based on what you’ve provided above, your model is something like this:

data {
  int<lower=1> N;  // Number of observations
  vector[N] Y;  // Response vector
  int<lower=1> K;   // Number of Predictors
  matrix[N, K] X;  // Design Matrix for the Predictors
  // Data for Country-Level Random Effects
  int<lower=1> J;  // Number of Countries J
  int<lower=1> J_K;  // Number of Country-Level Coefficients
  int<lower=1> jj[N];  // Mapping Countries to Observations
  // Country-Level Predictor Values
  vector[N] Z_J_1; // If no varying slopes, this is just a vector of 1s
  // Data for Time-Level Random Effects
  int<lower=1> T;  // Number of Years (or Country-Years) T
  int<lower=1> T_K;  // Number of Time-Level Coefficients
  int<lower=1> tt[N];  // Mapping Time to Observations
  // Time-Level Predictor Values
  vector[N] Z_T_1;  // If no varying slopes, this is just a vector of 1s
}

parameters {
  vector[K] beta;  // Population-Level Regression Coefficients
  real alpha; // Population-Level Intercept
  real<lower=0> sigma;  // Population-Level Dispersion Parameter
  vector<lower=0>[J_K] sigma_j;  // Country-Level Standard Deviations
  vector[J] z_j[J_K];  // Standardized Country-Level Effects
  vector<lower=0>[T_K] sigma_t;  //  Time-Level Standard Deviations
  vector[T] z_t[T_K];  // Standardized Time-Level Effects
}

transformed parameters {
  vector[J] alpha_country;  //  Country-Level Effects
  vector[T] alpha_year;  // Time-Level Effects
  alpha_country = sigma_j[1] * z_j[1]; // group means are absorbed into the population-level intercept alpha
  alpha_year = sigma_t[1] * z_t[1];
}

model {
  // Likelihood
  target += normal_id_glm_lpdf(Y | X,  alpha + alpha_country[jj] .* Z_J_1 + alpha_year[tt] .* Z_T_1, beta, sigma);
  // Priors for the Model Parameters
  target += cauchy_lpdf(sigma | 0, 1);
  target += cauchy_lpdf(sigma_j | 0, 1);
  target += cauchy_lpdf(sigma_t | 0, 1);
  target += std_normal_lpdf(z_j[1]);
  target += std_normal_lpdf(z_t[1]);
  target += std_normal_lpdf(beta);
  target += student_t_lpdf(alpha | 3, 0, 2); // Change this to whatever your prior for the intercept should be
}

You can implement the non-centered parameterisation a bit more simply by using the offset and multiplier construction, where offset is the mean parameter and multiplier is the SD parameter.
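As a minimal sketch (hypothetical names, just to illustrate the construction):

parameters {
  real mu;                                 // location hyperparameter
  real<lower=0> sigma;                     // scale hyperparameter
  real<offset=mu, multiplier=sigma> theta; // sampled on the non-centered scale under the hood
}
model {
  mu ~ std_normal();
  sigma ~ cauchy(0, 1);                    // half-Cauchy via the lower bound
  theta ~ normal(mu, sigma);               // prior still written in the centered form
}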

Additionally, you can vectorise your likelihood very efficiently by using the normal_id_glm likelihood as suggested by @ajnafa. This likelihood can also be GPU-accelerated if you’re working with large datasets.
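If you want to try the GPU route, a sketch of the setup (assuming CmdStan and an OpenCL-capable device with working drivers) is to add the following to the make/local file in your CmdStan directory and rebuild the model:

STAN_OPENCL=true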

Putting these suggestions together, a new non-centered model would look like:


data {
  int<lower=0> N; // Number of rows in data
  int<lower=0> P; // Number of cols in X  (variables in model)
  int<lower=0> C; // Number of units (eg: countries)
  int<lower=0> T; // Number of time periods (eg: years)
  vector[N] y; // Target
  matrix[N, P] x; // input features
  int<lower=1, upper=C> unit[N]; //map obs to unit
  int<lower=1, upper=T> time[N]; //map obs to time
}

parameters {
  vector[P] beta; // coefficient for variables
  real <lower=0> sigma; // error sd
  real <lower=0> sigma_u; // unit hyperprior sd
  real <lower=0> sigma_t; // time hyperprior sd
  real <lower=0> sigma_i; // fixed hyperprior sd
  real mu_u; // unit hyperprior mean
  real mu_t; // time hyperprior mean
  real mu_i; // fixed hyperprior mean
  real<offset=mu_i, multiplier=sigma_i> alpha; // fixed intercept
  vector<offset=mu_u, multiplier=sigma_u>[C] alpha2; // random unit intercept
  vector<offset=mu_t, multiplier=sigma_t>[T] alpha3; // random time intercept
}

model {
  target += std_normal_lpdf(mu_u);
  target += std_normal_lpdf(mu_t);
  target += std_normal_lpdf(mu_i);
  target += cauchy_lpdf(sigma_t|0,1);
  target += cauchy_lpdf(sigma_u|0,1);
  target += cauchy_lpdf(sigma_i|0,1);
  target += cauchy_lpdf(sigma|0,1);
  target += normal_lpdf(alpha|mu_i,sigma_i);
  target += normal_lpdf(alpha2|mu_u,sigma_u);
  target += normal_lpdf(alpha3|mu_t,sigma_t);
  target += std_normal_lpdf(beta);

  target += normal_id_glm_lpdf(y | x, alpha + alpha2[unit] + alpha3[time], beta, sigma);
}

Thanks! I’ll try out these suggestions and report back on what works best.

@ajnafa Nope, no reason. That was a typo; it should be there.


Are there any gains in computational efficiency with the offset/multiplier approach, or is it mainly that it’s syntactically cleaner and easier to understand?

I believe it’s computationally identical (or mostly so), so the main benefit is the reduction in syntax/specification. It also means that the additional ‘raw’ (standard normal) variable isn’t needed, which can reduce RAM usage and the size of the results for larger models.
