# Sample mean of random effect is not even approximately zero

I’m fitting model `brms`. The model includes 8 correlated random effects that share a grouping factor. The bizarre thing is that despite decent sampler diagnostics and Rhats, and despite no obvious prior-data conflict, the sample means of some of the random effects (i.e. from `brms::ranef`) are not even approximately zero. Here’s the posterior margin for the sample mean of one of the random effect terms:

``````hist(rowMeans(ranef(fm, summary = FALSE)\$species[,,"Intercept"]))
``````

The corresponding fixed effect is estimated to be -1.1 and it’s prior is zero-centered, so it should be no problem to estimate the fixed effect to be 1.1 rather than -1.1, which should allow the sampled levels of the random effect to be approximately zero-centered like you’d expect.

I’m at a total loss for how this can come to pass. In the generated Stan code (full Stan model at end of post), we have

``````functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
...
}

parameters {
...
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}

transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
vector[N_1] r_1_1;
...
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[ : , 1];
...
}

model {
...
// initialize linear predictor term
vector[N] mu = Intercept + Xc * b;
...
for (n in 1 : N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + ... ;
}
...
THERE IS A LIKELIHOOD HERE
...
target += std_normal_lpdf(to_vector(z_1));
}
``````

The likelihood depends on `mu` in the way you’d expect, with no direct dependence on `z_1` or `r_1`.

And the crazy thing is that after fitting, the sample mean of the first row of `z_1` isn’t even anywhere close to zero!

``````samps <- fm\$fit@sim\$samples
the_names <- names(samps[[1]])
the_names_1 <- the_names[grepl("^z_1\\[1", the_names)]

hist(
c(
rowMeans(samps[[1]][3001:4000, the_names_1]),
rowMeans(samps[[2]][3001:4000, the_names_1]),
rowMeans(samps[[3]][3001:4000, the_names_1]),
rowMeans(samps[[4]][3001:4000, the_names_1])
)
)
``````

Once this occurs, then in general none of the columns of `r_1` will have means near zero because of the correlations implied by `L_1`. And that’s what I see; none of my random effects have means near zero (even though the remaining 7 rows of `z_1`, and particularly the final 6 rows, have sample means near zero as expected).

I’m happy to share the fitted model and a description of what it is trying to accomplish with anybody who wants to help me spelunk here. But if anybody can tell me how what we’re seeing is at all possible, irrespective of the details of this particular model, I would be very grateful.

To make sense of the code below as it pertains to my question, focus on the structure of the linear predictors and ignore almost everything in the functions block. The likelihood is a function of some data and the four distributional parameters that are given linear predictors in the model: `mu`, `occ`, `det`, and `colo`.

``````// generated with brms 2.17.0
functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
// emission likelihood given that state equals one
real emission_1_fp(array[] real fp, row_vector det) {
// fp gives the probability that the true data is a one.
// det gives logit-detection probabilities

int n = size(fp); // number of reps

int pow_2_n = 1;
for (k in 1 : n) {
pow_2_n = pow_2_n * 2;
}
vector[pow_2_n] lps; // log-probabilities
array[n] int yx; // true history
vector[n] fpx; // prior probability of true history

for (i in 1 : pow_2_n) {
int pow_2_j = 1;
for (j in 1 : n) {
pow_2_j = 2 * pow_2_j;
if ((i % pow_2_j) >= (2 ^ (j - 1))) {
yx[j] = 0;
fpx[j] = 1 - fp[j];
} else {
yx[j] = 1;
fpx[j] = fp[j];
}
}
lps[i] = sum(log(fpx))
+ // the log-probability that this is the true history
bernoulli_logit_lpmf(yx | det); // the log-probability associated with this detection history
}
return log_sum_exp(lps);
}
// Emission likelihood given that the true state is zero
real emission_0_fp(array[] real fp) {
return sum(log1m(fp));
}
// functions for the yearwise likelihood components corresponding to each
// possible pair of true states in that year and the previous year.
real zero_zero(real colo, real e0) {
// e0 is the emission likelihood conditional on state 0
return bernoulli_logit_lpmf(0 | colo) + e0;
}
real one_zero(real ex, real e0) {
return bernoulli_logit_lpmf(1 | ex) + e0;
}
real zero_one(real colo, real e1) {
// e1 is the emission likelihood conditional on state 1
return bernoulli_logit_lpmf(1 | colo) + e1;
}
real one_one(real ex, real e1) {
return bernoulli_logit_lpmf(0 | ex) + e1;
}
// Forward algorithm implementation
real forward_colex_fp(int n_year, // number of years
array[] int Q,
// At least one detection in year?
array[] int n_obs,
// number of visits in year
array[,] real fp,
// probability that true datum is one: rows are units, columns visits
real occ_initial,
// initial occupancy logit-probability
vector colo,
// colonization logit-probabilities (with initial dummy)
vector ex,
// extinction logit-probabilities (with initial dummy)
array[] row_vector det
// detection logit-probabilities
) {
real alpha_0;
real alpha_1;
real alpha_0_prev;
real alpha_1_prev;
real e0; // emission probability conditional on latent state being 0.
real e1; // emission probability conditional on latent state being 1.

// Year 1
if (n_obs[1] == 0) {
alpha_0 = bernoulli_logit_lpmf(0 | occ_initial);
alpha_1 = bernoulli_logit_lpmf(1 | occ_initial);
} else {
e1 = emission_1_fp(fp[1, 1 : n_obs[1]], det[1, 1 : n_obs[1]]);
alpha_1 = bernoulli_logit_lpmf(1 | occ_initial) + e1;
if (Q[1] == 0) {
e0 = emission_0_fp(fp[1, 1 : n_obs[1]]);
alpha_0 = bernoulli_logit_lpmf(0 | occ_initial) + e0;
}
}

// Recursion for subsequent years
if (n_year > 1) {
// alpha_0  is not computed if Q[i] == 1
// alpha_1 is always computed.
for (i in 2 : n_year) {
// Store alpha_0 and alpha_1 for the next round
alpha_0_prev = alpha_0;
alpha_1_prev = alpha_1;

if (n_obs[i] == 0) {
// year with no observations
if (Q[i - 1] == 0) {
alpha_0 = log_sum_exp(alpha_0_prev
+ bernoulli_logit_lpmf(0 | colo[i]),
alpha_1_prev
+ bernoulli_logit_lpmf(1 | ex[i]));
alpha_1 = log_sum_exp(alpha_0_prev
+ bernoulli_logit_lpmf(1 | colo[i]),
alpha_1_prev
+ bernoulli_logit_lpmf(0 | ex[i]));
} else if (Q[i - 1] == 1) {
alpha_0 = alpha_1_prev + bernoulli_logit_lpmf(1 | ex[i]);
alpha_1 = alpha_1_prev + bernoulli_logit_lpmf(0 | ex[i]);
}
} else {
// year with observations
e1 = emission_1_fp(fp[i, 1 : n_obs[i]], det[i, 1 : n_obs[i]]);
e0 = emission_0_fp(fp[i, 1 : n_obs[i]]);
if ((Q[i - 1] == 0) && (Q[i] == 0)) {
// now years with at least one observation
alpha_0 = log_sum_exp(alpha_0_prev + zero_zero(colo[i], e0),
alpha_1_prev + one_zero(ex[i], e0));
alpha_1 = log_sum_exp(alpha_0_prev + zero_one(colo[i], e1),
alpha_1_prev + one_one(ex[i], e1));
} else if ((Q[i - 1] == 0) && (Q[i] == 1)) {
alpha_1 = log_sum_exp(alpha_0_prev + zero_one(colo[i], e1),
alpha_1_prev + one_one(ex[i], e1));
} else if ((Q[i - 1] == 1) && (Q[i] == 0)) {
alpha_0 = alpha_1_prev + one_zero(ex[i], e0);
alpha_1 = alpha_1_prev + one_one(ex[i], e1);
} else if ((Q[i - 1] == 1) && (Q[i] == 1)) {
alpha_1 = alpha_1_prev + one_one(ex[i], e1);
}
}
}
}

// Return
real out;
if (Q[n_year] == 0) {
out = log_sum_exp(alpha_0, alpha_1);
} else if (Q[n_year] == 1) {
out = alpha_1;
}
return out;
}
real occupancy_multi_colex_fp_lpdf(vector fp, // fp data
vector mu,
// linear predictor for detection
vector occ,
// linear predictor for initial occupancy. Only the first vint1[1] elements matter.
vector colo,
// linear predictor for colonization. Elements after vint2[1] irrelevant.
vector ex,
// linear predictor for extinction. Elements after vint2[1] irrelevant.
array[] int vint1,
// # of series (# of HMMs). Elements after 1 irrelevant.
array[] int vint2,
// # units (series-years). Elements after 1 irrelevant.
array[] int vint3,
// # years per series. Elements after vint1[1] irrelevant.
array[] int vint4,
// # sampling events per unit (n_rep). Elements after vint2[1] irrelevant.
array[] int vint5,
// Indicator for > 0 certain detections (Q). Elements after vint2[1] irrelevant.
// indices for jth unit (first rep) for each series. Elements after vint1[1] irrelevant.
array[] int vint6, array[] int vint7,
// indices for jth repeated sampling event to each unit (elements after vint2[1] irrelevant):
array[] int vint8, array[] int vint9,
array[] int vint10, array[] int vint11,
array[] int vint12) {
// Create array of the unit indices that correspond to each series.
array[vint1[1], 2] int unit_index_array;
unit_index_array[ : , 1] = vint6[1 : vint1[1]];
unit_index_array[ : , 2] = vint7[1 : vint1[1]];

// Create array of the rep indices that correspond to each unit.
array[vint2[1], 5] int visit_index_array;
visit_index_array[ : , 1] = vint8[1 : vint2[1]];
visit_index_array[ : , 2] = vint9[1 : vint2[1]];
visit_index_array[ : , 3] = vint10[1 : vint2[1]];
visit_index_array[ : , 4] = vint11[1 : vint2[1]];
visit_index_array[ : , 5] = vint12[1 : vint2[1]];

// Initialize and compute log-likelihood
real lp = 0;
for (i in 1 : vint1[1]) {
int n_year = vint3[i];
array[n_year] int Q = vint5[unit_index_array[i, 1 : n_year]];
array[n_year] int n_obs = vint4[unit_index_array[i, 1 : n_year]];
int max_obs = max(n_obs);
array[n_year, max_obs] real fp_i;
real occ_i = occ[unit_index_array[i, 1]];
vector[n_year] colo_i = to_vector(colo[unit_index_array[i, 1 : n_year]]);
vector[n_year] ex_i = to_vector(ex[unit_index_array[i, 1 : n_year]]);
array[n_year] row_vector[max_obs] det_i;

for (j in 1 : n_year) {
if (n_obs[j] > 0) {
fp_i[j, 1 : n_obs[j]] = to_array_1d(fp[visit_index_array[unit_index_array[i, j], 1 : n_obs[j]]]);
det_i[j, 1 : n_obs[j]] = to_row_vector(mu[visit_index_array[unit_index_array[i, j], 1 : n_obs[j]]]);
}
}
lp += forward_colex_fp(n_year, Q, n_obs, fp_i, occ_i, colo_i, ex_i,
det_i);
}
return lp;
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
// data for custom integer vectors
array[N] int vint1;
// data for custom integer vectors
array[N] int vint2;
// data for custom integer vectors
array[N] int vint3;
// data for custom integer vectors
array[N] int vint4;
// data for custom integer vectors
array[N] int vint5;
// data for custom integer vectors
array[N] int vint6;
// data for custom integer vectors
array[N] int vint7;
// data for custom integer vectors
array[N] int vint8;
// data for custom integer vectors
array[N] int vint9;
// data for custom integer vectors
array[N] int vint10;
// data for custom integer vectors
array[N] int vint11;
// data for custom integer vectors
array[N] int vint12;
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> K_colo; // number of population-level effects
matrix[N, K_colo] X_colo; // population-level design matrix
int<lower=1> K_ex; // number of population-level effects
matrix[N, K_ex] X_ex; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_occ_4;
vector[N] Z_1_colo_5;
vector[N] Z_1_colo_6;
vector[N] Z_1_ex_7;
vector[N] Z_1_ex_8;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
int Kc_colo = K_colo - 1;
matrix[N, Kc_colo] Xc_colo; // centered version of X_colo without an intercept
vector[Kc_colo] means_X_colo; // column means of X_colo before centering
int Kc_ex = K_ex - 1;
matrix[N, Kc_ex] Xc_ex; // centered version of X_ex without an intercept
vector[Kc_ex] means_X_ex; // column means of X_ex before centering
for (i in 2 : K) {
means_X[i - 1] = mean(X[ : , i]);
Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
}
for (i in 2 : K_colo) {
means_X_colo[i - 1] = mean(X_colo[ : , i]);
Xc_colo[ : , i - 1] = X_colo[ : , i] - means_X_colo[i - 1];
}
for (i in 2 : K_ex) {
means_X_ex[i - 1] = mean(X_ex[ : , i]);
Xc_ex[ : , i - 1] = X_ex[ : , i] - means_X_ex[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real Intercept_occ; // temporary intercept for centered predictors
vector[Kc_colo] b_colo; // population-level effects
real Intercept_colo; // temporary intercept for centered predictors
vector[Kc_ex] b_ex; // population-level effects
real Intercept_ex; // temporary intercept for centered predictors
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
vector[N_1] r_1_3;
vector[N_1] r_1_occ_4;
vector[N_1] r_1_colo_5;
vector[N_1] r_1_colo_6;
vector[N_1] r_1_ex_7;
vector[N_1] r_1_ex_8;
real lprior = 0; // prior contributions to the log posterior
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[ : , 1];
r_1_2 = r_1[ : , 2];
r_1_3 = r_1[ : , 3];
r_1_occ_4 = r_1[ : , 4];
r_1_colo_5 = r_1[ : , 5];
r_1_colo_6 = r_1[ : , 6];
r_1_ex_7 = r_1[ : , 7];
r_1_ex_8 = r_1[ : , 8];
lprior += normal_lpdf(b | 0, 2);
lprior += normal_lpdf(Intercept | 0, 2);
lprior += normal_lpdf(Intercept_occ | 0, 2);
lprior += std_normal_lpdf(b_colo);
lprior += normal_lpdf(Intercept_colo | 0, 2);
lprior += std_normal_lpdf(b_ex);
lprior += normal_lpdf(Intercept_ex | 0, 2);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 8 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = Intercept + Xc * b;
// initialize linear predictor term
vector[N] occ = Intercept_occ + rep_vector(0.0, N);
// initialize linear predictor term
vector[N] colo = Intercept_colo + Xc_colo * b_colo;
// initialize linear predictor term
vector[N] ex = Intercept_ex + Xc_ex * b_ex;
for (n in 1 : N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n]
+ r_1_3[J_1[n]] * Z_1_3[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
occ[n] += r_1_occ_4[J_1[n]] * Z_1_occ_4[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
colo[n] += r_1_colo_5[J_1[n]] * Z_1_colo_5[n]
+ r_1_colo_6[J_1[n]] * Z_1_colo_6[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
ex[n] += r_1_ex_7[J_1[n]] * Z_1_ex_7[n]
+ r_1_ex_8[J_1[n]] * Z_1_ex_8[n];
}
target += occupancy_multi_colex_fp_lpdf(Y | mu, occ, colo, ex, vint1, vint2, vint3, vint4, vint5, vint6, vint7, vint8, vint9, vint10, vint11, vint12);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// actual population-level intercept
real b_occ_Intercept = Intercept_occ;
// actual population-level intercept
real b_colo_Intercept = Intercept_colo - dot_product(means_X_colo, b_colo);
// actual population-level intercept
real b_ex_Intercept = Intercept_ex - dot_product(means_X_ex, b_ex);
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1, upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M_1) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}
``````
1 Like

Could you send me the fitted model and such a short description? My first guess is that it could be a consequence of what we discussed at Sum-to-zero intercepts and terminology, but I’m not sure if I got the details of your problem correctly.