Hello,
I was reading a paper by Bafumi and Gelman where they showcase a simple method for adjusting for correlations between predictors and group-level effects. I tried to implement this in a model I am currently working on for school. The objective is basically to perform MrP to predict cycling in small areas in Europe. However, when I apply the simple correction, the model takes incredibly long to fit (around 3 hours on 20 thousand rows; when I do it without the correction the time goes to around 40 minutes to 1 hour), Here is the code:
data{
int<lower = 0> N; // Number of rows
array[N] int<lower = 0, upper = 1> y; // Outcome variable (cycling)
array[N] int<lower = 1, upper = 7> age_id; // Age ID
array[N] int<lower = 1, upper = 246> nuts_id; // Unique nuts region identifier
int<lower=0> K; // Number of fixed effects predictors
matrix[N, K] x; // Fixed effects matrix
matrix[246, K] fe_adj; // [N_regions, N_fixed effects] median fixed effects by region
matrix[7, K] fe_adj_age; // [N_category, N_fixed effects] median fixed effects by age category
}
parameters{
real alpha; // Global intercept
// Random effects: Creating scale and intercept parameters
real<lower = 0> gamma_age_scale;
vector[7] gamma_age;
real<lower = 0> gamma_region_scale;
vector[246] gamma_region;
// Fixed effects coefficients
vector[K] beta;
vector[K] adj_beta;
vector[K] adj_beta_age;
}
transformed parameters{
vector[N] mu;
// Non-centred parametrisation
vector[7] gamma_age_star = gamma_age * gamma_age_scale + fe_adj_age * adj_beta_age;
vector[246] gamma_region_star = gamma_region * gamma_region_scale + fe_adj * adj_beta;
mu = alpha + gamma_age_star[age_id] + gamma_region_star[nuts_id] + x * beta;
}
model{
alpha ~ normal(0, 5);
// Standard random effects
to_vector(gamma_age) ~ normal(0, 1);
gamma_age_scale ~ normal(0, 5);
to_vector(gamma_region) ~ normal(0, 1);
gamma_region_scale ~ normal(0, 5);
beta ~ normal(0, 5);
adj_beta ~ normal(0, 3);
adj_beta_age ~ normal(0, 3);
y ~ bernoulli_logit(mu);
}
generated quantities{
vector[N] log_lik; // log likelihood for model checking and comparison later
for (n in 1:N) {
log_lik[n] = bernoulli_lpmf(y[n] | inv_logit(mu[n]));
}
}
I adjust for the fixed effects correlation using the fe_adj * fe_beta line, where fe_adj is a matrix where for every unique region I put the group-level medians of the fixed effects. Same goes for fe_adj_age.
Am I doing something wrong? If so, please tell me .
Kind regards,
Caio