# Brms: non-centered to centered parameterization

Hi,
I know that non-centered parameterization is generally preferred over centered parameterization but in some cases, it is advised to use centered parameterization.

when there is a lot of data, the centered parameterization is more efficienthttps://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html

Since brms follows the non-centered parameterization approach, can someone please guide me how best to change brms generated stan code from non-centered to centered parameterization.

Below is an example of stan code generated for a nonlinear model with three correlated random effects.

``````library(brms)

b <- c(2, 0.75)
x <- rnorm(100)
y <- rnorm(100, mean = b[1] * exp(b[2] * x))
dat1 <- data.frame(x, y)

fit_loss <- brm(
bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
ult ~ 1 + (1|i|AY),
omega ~ 1 + (1|i|AY),
theta ~ 1 + (1|i|AY),
nl = TRUE),
data = loss, family = gaussian(),
prior = c(
prior(normal(5000, 1000), nlpar = "ult"),
prior(normal(1, 2), nlpar = "omega"),
prior(normal(45, 10), nlpar = "theta")
),
)

# scode <- stancode(fit_loss)

scode <-
"
// generated with brms 2.20.3
functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K_ult;  // number of population-level effects
matrix[N, K_ult] X_ult;  // population-level design matrix
int<lower=1> K_omega;  // number of population-level effects
matrix[N, K_omega] X_omega;  // population-level design matrix
int<lower=1> K_theta;  // number of population-level effects
matrix[N, K_theta] X_theta;  // population-level design matrix
// covariates for non-linear functions
array[N] int C_1;
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
array[N] int<lower=1> J_1;  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_ult_1;
vector[N] Z_1_omega_2;
vector[N] Z_1_theta_3;
int<lower=1> NC_1;  // number of group-level correlations
int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
vector[K_ult] b_ult;  // regression coefficients
vector[K_omega] b_omega;  // regression coefficients
vector[K_theta] b_theta;  // regression coefficients
real<lower=0> sigma;  // dispersion parameter
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_ult_1;
vector[N_1] r_1_omega_2;
vector[N_1] r_1_theta_3;
real lprior = 0;  // prior contributions to the log posterior
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_ult_1 = r_1[, 1];
r_1_omega_2 = r_1[, 2];
r_1_theta_3 = r_1[, 3];
lprior += normal_lpdf(b_ult | 5000, 1000);
lprior += normal_lpdf(b_omega | 1, 2);
lprior += normal_lpdf(b_theta | 45, 10);
lprior += student_t_lpdf(sigma | 3, 0, 1963.7)
- 1 * student_t_lccdf(0 | 3, 0, 1963.7);
lprior += student_t_lpdf(sd_1 | 3, 0, 1963.7)
- 3 * student_t_lccdf(0 | 3, 0, 1963.7);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] nlp_ult = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nlp_omega = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nlp_theta = rep_vector(0.0, N);
// initialize non-linear predictor term
vector[N] mu;
nlp_ult += X_ult * b_ult;
nlp_omega += X_omega * b_omega;
nlp_theta += X_theta * b_theta;
for (n in 1:N) {
// add more terms to the linear predictor
nlp_ult[n] += r_1_ult_1[J_1[n]] * Z_1_ult_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_omega[n] += r_1_omega_2[J_1[n]] * Z_1_omega_2[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_theta[n] += r_1_theta_3[J_1[n]] * Z_1_theta_3[n];
}
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = (nlp_ult[n] * (1 - exp( - (C_1[n] / nlp_theta[n]) ^ nlp_omega[n])));
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}

"

``````

Thank you

the R code creates `dat1` with columns `x`, `y` but it seems that data frame `loss` contains `cum`, `dev`, and `AY` ??

if you specify `0` instead of `1` for the intercept in the formula, the stancode will use the centered parameterization.

Please ignore the dat1, it was posted by mistake.

I think my original question (non-centered to centered parameterization) was not stated clearly. Sorry for that.

I am looking for the ways to modify brms generated stancode to implement centered parameterization approach for random effects (Hierarchical models).

Please see below an example for a single random effect (source: https://mc-stan.org/docs/stan-users-guide/reparameterization.html)

#### Centered parameterization

``````parameters {
real mu_beta;
real<lower=0> sigma_beta;
vector[K] beta;
// ...
}
model {
beta ~ normal(mu_beta, sigma_beta);
// ...
}
``````

### Non-centered parameterization

``````parameters {
vector[K] beta_raw;
// ...
}
transformed parameters {
vector[K] beta;
// implies: beta ~ normal(mu_beta, sigma_beta)
beta = mu_beta + sigma_beta * beta_raw;
}
model {
beta_raw ~ std_normal();
// ...
}
``````

I am looking to modify the stancode (for multiple correleated random effects, as shown in the original post) from Non-centered parameterization to Centered parameterization.

Thank you

the BRMS-generated Stan code doesn’t always line up with the way that the Stan code is written in the user’s guide. I think that it might be easier if you take a look at the BRMS-generated Stan code for a formula specifying the intercept both as `0` and `1` - that might make it easier to see what needs to be changed.

@mitzimorris `0 + Intercept` turns off the internal design matrix centering, but does not switch the random effects parametrization from non-centered to centered, AFAIK.

For example both of the following still come out non-centered:

``````make_stancode(rating ~ 0 + Intercept + (1 + treat | subject), data = inhaler)
make_stancode(rating ~ 0 + Intercept + (0 + treat | subject), data = inhaler)
``````

@satpal to switch from non-centered to centered, you should remove the parameter

``````matrix[M_1, N_1] z_1;  // standardized group-level effects
``````

add as a parameter (move from transformed parameters)

``````matrix[N_1, M_1] r_1;  // actual group-level effects
``````

And then sample the matrix parametes from a multivariate normal with covariance matrix constructed from the parameters `sd_1` and `L_1`.

@jsocolar that’s exactly what I am looking for but not able to figure out how best to edit brms generated stan code (best in terms of efficiency and minimal changes required to achieve that).

I would really appreciate if please provide an example.

Thank you

For minimal changes, something like:

``````for(i in 1:N_1){
r_1[i, ] ~ multi_normal_cholesky(rep_row_vector(0, M_1), diag_pre_multiply(sd_1, L1))
}
``````

Thanks @jsocolar

Worked as expected!

Despite the fact that loss data used in the above example are really small (only 10 groups, 55 observations), both non-centered parametrization (NCP) and centered parametrization (CP) provided similar results. As expected (small data), the CP resulted in 27 divergences (same control for both NCP and CP: control = list(adapt_delta = 0.98, max_treedepth = 12)

Below I provide the edited stan code (used for CP) and a comparison of results for NCP and CP results.

As you see below (CP), I have commented out the whole transformed parameters block and moved it (except matrix[N_1, M_1] r_1; that is now in the parameters block, and r_1 = scale_r_cor(z_1, sd_1, L_1); which is not needed) to the model block.

Is there any way of using ~ multi_normal_cholesky (i.e, sampling) within the transformed parameters via functions block?

#### Non-centered parameterization

``````fit_loss_ncp <- brm(
bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
ult ~   1 + (1|i|AY),
omega ~ 1 + (1|i|AY),
theta ~ 1 + (1|i|AY),
nl = TRUE),
data = loss, family = gaussian(),
prior = c(
prior(normal(5000, 1000), nlpar = "ult"),
prior(normal(1, 2), nlpar = "omega"),
prior(normal(45, 10), nlpar = "theta")
),
control = list(adapt_delta = 0.98, max_treedepth = 12),
normalize = FALSE, # Switch off normalizing constant for speed
cores = 8,
init = 'random',
seed = 123
)

summary(fit_loss_ncp)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: cum ~ ult * (1 - exp(-(dev/theta)^omega))
ult ~ 1 + (1 | i | AY)
omega ~ 1 + (1 | i | AY)
theta ~ 1 + (1 | i | AY)
Data: loss (Number of observations: 55)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000

Group-Level Effects:
~AY (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(ult_Intercept)                      764.50    299.69   328.61  1495.02 1.00     1763
sd(omega_Intercept)                      0.09      0.07     0.00     0.25 1.01      797
sd(theta_Intercept)                      4.40      2.97     0.27    11.66 1.00      624
cor(ult_Intercept,omega_Intercept)       0.26      0.45    -0.70     0.93 1.00     2865
cor(ult_Intercept,theta_Intercept)       0.03      0.43    -0.81     0.79 1.00     2435
cor(omega_Intercept,theta_Intercept)    -0.43      0.47    -0.98     0.69 1.01      868
Tail_ESS
sd(ult_Intercept)                        2106
sd(omega_Intercept)                      1573
sd(theta_Intercept)                      1508
cor(ult_Intercept,omega_Intercept)       2225
cor(ult_Intercept,theta_Intercept)       2302
cor(omega_Intercept,theta_Intercept)     2420

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ult_Intercept    5243.66    316.97  4664.95  5900.01 1.00     1421     1893
omega_Intercept     1.36      0.06     1.23     1.49 1.00     1872     2175
theta_Intercept    45.66      2.88    40.51    51.99 1.00     1937     2142

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   133.32     16.27   105.18   169.87 1.00     1788     2504

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

``````

#### Centered parameterization

``````# Get stancode
scode <- brms::stancode(fit_loss_ncp)
# Get data
sdata <- brms::standata(fit_loss_ncp)
# Get initials
sinit <- fit_loss_ncp\$fit@inits

# Edit stancode
scode <-
"
// generated with brms 2.20.3
functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K_ult;  // number of population-level effects
matrix[N, K_ult] X_ult;  // population-level design matrix
int<lower=1> K_omega;  // number of population-level effects
matrix[N, K_omega] X_omega;  // population-level design matrix
int<lower=1> K_theta;  // number of population-level effects
matrix[N, K_theta] X_theta;  // population-level design matrix
// covariates for non-linear functions
array[N] int C_1;
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
array[N] int<lower=1> J_1;  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_ult_1;
vector[N] Z_1_omega_2;
vector[N] Z_1_theta_3;
int<lower=1> NC_1;  // number of group-level correlations
int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
vector[K_ult] b_ult;  // regression coefficients
vector[K_omega] b_omega;  // regression coefficients
vector[K_theta] b_theta;  // regression coefficients
real<lower=0> sigma;  // dispersion parameter
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
//matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
matrix[N_1, M_1] r_1;  // actual group-level effects
}
transformed parameters {
// Comment out the whole chunk from the transformed parameters
/*
matrix[N_1, M_1] r_1;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_ult_1;
vector[N_1] r_1_omega_2;
vector[N_1] r_1_theta_3;
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_ult_1 = r_1[, 1];
r_1_omega_2 = r_1[, 2];
r_1_theta_3 = r_1[, 3];
*/
}
model {
for(i in 1:N_1){
r_1[i, ] ~ multi_normal_cholesky(rep_row_vector(0, M_1), diag_pre_multiply(sd_1, L_1));
}
// using vectors speeds up indexing in loops
vector[N_1] r_1_ult_1;
vector[N_1] r_1_omega_2;
vector[N_1] r_1_theta_3;
// compute actual group-level effects
//r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_ult_1 = r_1[, 1];
r_1_omega_2 = r_1[, 2];
r_1_theta_3 = r_1[, 3];
real lprior = 0;  // prior contributions to the log posterior
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] nlp_ult = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nlp_omega = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nlp_theta = rep_vector(0.0, N);
// initialize non-linear predictor term
vector[N] mu;
nlp_ult += X_ult * b_ult;
nlp_omega += X_omega * b_omega;
nlp_theta += X_theta * b_theta;
for (n in 1:N) {
// add more terms to the linear predictor
nlp_ult[n] += r_1_ult_1[J_1[n]] * Z_1_ult_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_omega[n] += r_1_omega_2[J_1[n]] * Z_1_omega_2[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_theta[n] += r_1_theta_3[J_1[n]] * Z_1_theta_3[n];
}
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = (nlp_ult[n] * (1 - exp( - (C_1[n] / nlp_theta[n]) ^ nlp_omega[n])));
}
target += normal_lupdf(Y | mu, sigma);
}
// priors not including constants
lprior += normal_lupdf(b_ult | 5000, 1000);
lprior += normal_lupdf(b_omega | 1, 2);
lprior += normal_lupdf(b_theta | 45, 10);
lprior += student_t_lupdf(sigma | 3, 0, 1963.7);
lprior += student_t_lupdf(sd_1 | 3, 0, 1963.7);
lprior += lkj_corr_cholesky_lupdf(L_1 | 1);
target += lprior;
//  target += std_normal_lupdf(to_vector(z_1));
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}

"

# Fit CP model using rstan

rfit_loss_cp <- rstan::stan(model_code = scode, data = sdata, init = sinit,
control = list(adapt_delta = 0.98, max_treedepth = 12),
seed = 123,
cores = 8)

# Posterior summary
options(scipen=999)
fit_summary <- summary(rfit_loss_cp)
fit_summary_beta_sd <- fit_summary\$summary[1:7, ]
print(round(fit_summary_beta_sd, 3))

# Or, for a better comparison with the fit_loss_ncp,
# Create an empty model and populate results from the above rfit_loss_cp fit

fit_loss_cp <- brm(
bf(cum ~ ult * (1 - exp(-(dev/theta)^omega)),
ult ~   1 + (1|i|AY),
omega ~ 1 + (1|i|AY),
theta ~ 1 + (1|i|AY),
nl = TRUE),
data = loss, family = gaussian(),
prior = c(
prior(normal(5000, 1000), nlpar = "ult"),
prior(normal(1, 2), nlpar = "omega"),
prior(normal(45, 10), nlpar = "theta")
),
control = list(adapt_delta = 0.98, max_treedepth = 12),
init = 'random',
normalize = FALSE, # Switch off normalizing constant for speed
empty = TRUE
)

fit_loss_cp\$fit <- rfit_loss_cp
fit_loss_cp     <- brms::rename_pars(fit_loss_cp)

summary(fit_loss_cp)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: cum ~ ult * (1 - exp(-(dev/theta)^omega))
ult ~ 1 + (1 | i | AY)
omega ~ 1 + (1 | i | AY)
theta ~ 1 + (1 | i | AY)
Data: loss (Number of observations: 55)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000

Group-Level Effects:
~AY (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(ult_Intercept)                      751.57    295.21   321.48  1456.84 1.00     1227
sd(omega_Intercept)                      0.09      0.07     0.01     0.25 1.02      294
sd(theta_Intercept)                      4.44      2.85     0.48    11.15 1.01      274
cor(ult_Intercept,omega_Intercept)       0.27      0.44    -0.69     0.92 1.00     1455
cor(ult_Intercept,theta_Intercept)       0.01      0.43    -0.83     0.77 1.00     1171
cor(omega_Intercept,theta_Intercept)    -0.44      0.47    -0.97     0.67 1.01      531
Tail_ESS
sd(ult_Intercept)                        1781
sd(omega_Intercept)                       203
sd(theta_Intercept)                       316
cor(ult_Intercept,omega_Intercept)       2143
cor(ult_Intercept,theta_Intercept)       1694
cor(omega_Intercept,theta_Intercept)     1549

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ult_Intercept    5238.25    312.07  4699.74  5894.41 1.00     1333     1451
omega_Intercept     1.36      0.06     1.24     1.49 1.00     1330     1622
theta_Intercept    45.58      2.85    40.44    51.65 1.01     1514     1717

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   133.34     16.00   105.97   167.90 1.00     2163     2539

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 27 divergent transitions after warmup. Increasing adapt_delta above 0.98 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

``````

Thank you

1 Like

Right, this affects the global intercept by changing whether the X matrix (“fixed effects” design matrix) is centered or not, but has no affect on the parameterization of “random effects” terms. Although it would be nice to have a quick switch between the two!

2 Likes