Sample mean of random effect is not even approximately zero

I’m fitting model brms. The model includes 8 correlated random effects that share a grouping factor. The bizarre thing is that despite decent sampler diagnostics and Rhats, and despite no obvious prior-data conflict, the sample means of some of the random effects (i.e. from brms::ranef) are not even approximately zero. Here’s the posterior margin for the sample mean of one of the random effect terms:

hist(rowMeans(ranef(fm, summary = FALSE)$species[,,"Intercept"]))

The corresponding fixed effect is estimated to be -1.1 and it’s prior is zero-centered, so it should be no problem to estimate the fixed effect to be 1.1 rather than -1.1, which should allow the sampled levels of the random effect to be approximately zero-centered like you’d expect.

I’m at a total loss for how this can come to pass. In the generated Stan code (full Stan model at end of post), we have

functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
...
}

parameters {
...
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  matrix[M_1, N_1] z_1; // standardized group-level effects
  cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}

transformed parameters {
  matrix[N_1, M_1] r_1; // actual group-level effects
  vector[N_1] r_1_1;
...
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[ : , 1];
...
}

model {
...
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
...
  for (n in 1 : N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + ... ;
    }
...
  THERE IS A LIKELIHOOD HERE
...
  target += std_normal_lpdf(to_vector(z_1));
}

The likelihood depends on mu in the way you’d expect, with no direct dependence on z_1 or r_1.

And the crazy thing is that after fitting, the sample mean of the first row of z_1 isn’t even anywhere close to zero!

samps <- fm$fit@sim$samples
the_names <- names(samps[[1]])
the_names_1 <- the_names[grepl("^z_1\\[1", the_names)]

hist(
  c(
    rowMeans(samps[[1]][3001:4000, the_names_1]),
    rowMeans(samps[[2]][3001:4000, the_names_1]),
    rowMeans(samps[[3]][3001:4000, the_names_1]),
    rowMeans(samps[[4]][3001:4000, the_names_1])
  )
)

Once this occurs, then in general none of the columns of r_1 will have means near zero because of the correlations implied by L_1. And that’s what I see; none of my random effects have means near zero (even though the remaining 7 rows of z_1, and particularly the final 6 rows, have sample means near zero as expected).

I’m happy to share the fitted model and a description of what it is trying to accomplish with anybody who wants to help me spelunk here. But if anybody can tell me how what we’re seeing is at all possible, irrespective of the details of this particular model, I would be very grateful.

To make sense of the code below as it pertains to my question, focus on the structure of the linear predictors and ignore almost everything in the functions block. The likelihood is a function of some data and the four distributional parameters that are given linear predictors in the model: mu, occ, det, and colo.

// generated with brms 2.17.0
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
  // emission likelihood given that state equals one
  real emission_1_fp(array[] real fp, row_vector det) {
    // fp gives the probability that the true data is a one.
    // det gives logit-detection probabilities
    
    int n = size(fp); // number of reps
    
    int pow_2_n = 1;
    for (k in 1 : n) {
      pow_2_n = pow_2_n * 2;
    }
    vector[pow_2_n] lps; // log-probabilities
    array[n] int yx; // true history
    vector[n] fpx; // prior probability of true history
    
    for (i in 1 : pow_2_n) {
      int pow_2_j = 1;
      for (j in 1 : n) {
        pow_2_j = 2 * pow_2_j;
        if ((i % pow_2_j) >= (2 ^ (j - 1))) {
          yx[j] = 0;
          fpx[j] = 1 - fp[j];
        } else {
          yx[j] = 1;
          fpx[j] = fp[j];
        }
      }
      lps[i] = sum(log(fpx))
               + // the log-probability that this is the true history
               bernoulli_logit_lpmf(yx | det); // the log-probability associated with this detection history
    }
    return log_sum_exp(lps);
  }
  // Emission likelihood given that the true state is zero
  real emission_0_fp(array[] real fp) {
    return sum(log1m(fp));
  }
  // functions for the yearwise likelihood components corresponding to each
  // possible pair of true states in that year and the previous year.
  real zero_zero(real colo, real e0) {
    // e0 is the emission likelihood conditional on state 0
    return bernoulli_logit_lpmf(0 | colo) + e0;
  }
  real one_zero(real ex, real e0) {
    return bernoulli_logit_lpmf(1 | ex) + e0;
  }
  real zero_one(real colo, real e1) {
    // e1 is the emission likelihood conditional on state 1
    return bernoulli_logit_lpmf(1 | colo) + e1;
  }
  real one_one(real ex, real e1) {
    return bernoulli_logit_lpmf(0 | ex) + e1;
  }
  // Forward algorithm implementation
  real forward_colex_fp(int n_year, // number of years
                        array[] int Q,
                        // At least one detection in year?
                        array[] int n_obs,
                        // number of visits in year
                        array[,] real fp,
                        // probability that true datum is one: rows are units, columns visits
                        real occ_initial,
                        // initial occupancy logit-probability
                        vector colo,
                        // colonization logit-probabilities (with initial dummy)
                        vector ex,
                        // extinction logit-probabilities (with initial dummy)
                        array[] row_vector det
                        // detection logit-probabilities
                        ) {
    real alpha_0;
    real alpha_1;
    real alpha_0_prev;
    real alpha_1_prev;
    real e0; // emission probability conditional on latent state being 0.
    real e1; // emission probability conditional on latent state being 1.
    
    // Year 1
    if (n_obs[1] == 0) {
      alpha_0 = bernoulli_logit_lpmf(0 | occ_initial);
      alpha_1 = bernoulli_logit_lpmf(1 | occ_initial);
    } else {
      e1 = emission_1_fp(fp[1, 1 : n_obs[1]], det[1, 1 : n_obs[1]]);
      alpha_1 = bernoulli_logit_lpmf(1 | occ_initial) + e1;
      if (Q[1] == 0) {
        e0 = emission_0_fp(fp[1, 1 : n_obs[1]]);
        alpha_0 = bernoulli_logit_lpmf(0 | occ_initial) + e0;
      }
    }
    
    // Recursion for subsequent years
    if (n_year > 1) {
      // alpha_0  is not computed if Q[i] == 1
      // alpha_1 is always computed.
      for (i in 2 : n_year) {
        // Store alpha_0 and alpha_1 for the next round
        alpha_0_prev = alpha_0;
        alpha_1_prev = alpha_1;
        
        if (n_obs[i] == 0) {
          // year with no observations
          if (Q[i - 1] == 0) {
            alpha_0 = log_sum_exp(alpha_0_prev
                                  + bernoulli_logit_lpmf(0 | colo[i]),
                                  alpha_1_prev
                                  + bernoulli_logit_lpmf(1 | ex[i]));
            alpha_1 = log_sum_exp(alpha_0_prev
                                  + bernoulli_logit_lpmf(1 | colo[i]),
                                  alpha_1_prev
                                  + bernoulli_logit_lpmf(0 | ex[i]));
          } else if (Q[i - 1] == 1) {
            alpha_0 = alpha_1_prev + bernoulli_logit_lpmf(1 | ex[i]);
            alpha_1 = alpha_1_prev + bernoulli_logit_lpmf(0 | ex[i]);
          }
        } else {
          // year with observations
          e1 = emission_1_fp(fp[i, 1 : n_obs[i]], det[i, 1 : n_obs[i]]);
          e0 = emission_0_fp(fp[i, 1 : n_obs[i]]);
          if ((Q[i - 1] == 0) && (Q[i] == 0)) {
            // now years with at least one observation
            alpha_0 = log_sum_exp(alpha_0_prev + zero_zero(colo[i], e0),
                                  alpha_1_prev + one_zero(ex[i], e0));
            alpha_1 = log_sum_exp(alpha_0_prev + zero_one(colo[i], e1),
                                  alpha_1_prev + one_one(ex[i], e1));
          } else if ((Q[i - 1] == 0) && (Q[i] == 1)) {
            alpha_1 = log_sum_exp(alpha_0_prev + zero_one(colo[i], e1),
                                  alpha_1_prev + one_one(ex[i], e1));
          } else if ((Q[i - 1] == 1) && (Q[i] == 0)) {
            alpha_0 = alpha_1_prev + one_zero(ex[i], e0);
            alpha_1 = alpha_1_prev + one_one(ex[i], e1);
          } else if ((Q[i - 1] == 1) && (Q[i] == 1)) {
            alpha_1 = alpha_1_prev + one_one(ex[i], e1);
          }
        }
      }
    }
    
    // Return
    real out;
    if (Q[n_year] == 0) {
      out = log_sum_exp(alpha_0, alpha_1);
    } else if (Q[n_year] == 1) {
      out = alpha_1;
    }
    return out;
  }
  real occupancy_multi_colex_fp_lpdf(vector fp, // fp data
                                     vector mu,
                                     // linear predictor for detection
                                     vector occ,
                                     // linear predictor for initial occupancy. Only the first vint1[1] elements matter.
                                     vector colo,
                                     // linear predictor for colonization. Elements after vint2[1] irrelevant.
                                     vector ex,
                                     // linear predictor for extinction. Elements after vint2[1] irrelevant.
                                     array[] int vint1,
                                     // # of series (# of HMMs). Elements after 1 irrelevant.
                                     array[] int vint2,
                                     // # units (series-years). Elements after 1 irrelevant.
                                     array[] int vint3,
                                     // # years per series. Elements after vint1[1] irrelevant.
                                     array[] int vint4,
                                     // # sampling events per unit (n_rep). Elements after vint2[1] irrelevant.
                                     array[] int vint5,
                                     // Indicator for > 0 certain detections (Q). Elements after vint2[1] irrelevant.
                                     // indices for jth unit (first rep) for each series. Elements after vint1[1] irrelevant.
                                     array[] int vint6, array[] int vint7,
                                     // indices for jth repeated sampling event to each unit (elements after vint2[1] irrelevant):
                                     array[] int vint8, array[] int vint9,
                                     array[] int vint10, array[] int vint11,
                                     array[] int vint12) {
    // Create array of the unit indices that correspond to each series.
    array[vint1[1], 2] int unit_index_array;
    unit_index_array[ : , 1] = vint6[1 : vint1[1]];
    unit_index_array[ : , 2] = vint7[1 : vint1[1]];
    
    // Create array of the rep indices that correspond to each unit.
    array[vint2[1], 5] int visit_index_array;
    visit_index_array[ : , 1] = vint8[1 : vint2[1]];
    visit_index_array[ : , 2] = vint9[1 : vint2[1]];
    visit_index_array[ : , 3] = vint10[1 : vint2[1]];
    visit_index_array[ : , 4] = vint11[1 : vint2[1]];
    visit_index_array[ : , 5] = vint12[1 : vint2[1]];
    
    // Initialize and compute log-likelihood
    real lp = 0;
    for (i in 1 : vint1[1]) {
      int n_year = vint3[i];
      array[n_year] int Q = vint5[unit_index_array[i, 1 : n_year]];
      array[n_year] int n_obs = vint4[unit_index_array[i, 1 : n_year]];
      int max_obs = max(n_obs);
      array[n_year, max_obs] real fp_i;
      real occ_i = occ[unit_index_array[i, 1]];
      vector[n_year] colo_i = to_vector(colo[unit_index_array[i, 1 : n_year]]);
      vector[n_year] ex_i = to_vector(ex[unit_index_array[i, 1 : n_year]]);
      array[n_year] row_vector[max_obs] det_i;
      
      for (j in 1 : n_year) {
        if (n_obs[j] > 0) {
          fp_i[j, 1 : n_obs[j]] = to_array_1d(fp[visit_index_array[unit_index_array[i, j], 1 : n_obs[j]]]);
          det_i[j, 1 : n_obs[j]] = to_row_vector(mu[visit_index_array[unit_index_array[i, j], 1 : n_obs[j]]]);
        }
      }
      lp += forward_colex_fp(n_year, Q, n_obs, fp_i, occ_i, colo_i, ex_i,
                             det_i);
    }
    return lp;
  }
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  // data for custom integer vectors
  array[N] int vint1;
  // data for custom integer vectors
  array[N] int vint2;
  // data for custom integer vectors
  array[N] int vint3;
  // data for custom integer vectors
  array[N] int vint4;
  // data for custom integer vectors
  array[N] int vint5;
  // data for custom integer vectors
  array[N] int vint6;
  // data for custom integer vectors
  array[N] int vint7;
  // data for custom integer vectors
  array[N] int vint8;
  // data for custom integer vectors
  array[N] int vint9;
  // data for custom integer vectors
  array[N] int vint10;
  // data for custom integer vectors
  array[N] int vint11;
  // data for custom integer vectors
  array[N] int vint12;
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  int<lower=1> K_colo; // number of population-level effects
  matrix[N, K_colo] X_colo; // population-level design matrix
  int<lower=1> K_ex; // number of population-level effects
  matrix[N, K_ex] X_ex; // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1; // number of grouping levels
  int<lower=1> M_1; // number of coefficients per level
  array[N] int<lower=1> J_1; // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_occ_4;
  vector[N] Z_1_colo_5;
  vector[N] Z_1_colo_6;
  vector[N] Z_1_ex_7;
  vector[N] Z_1_ex_8;
  int<lower=1> NC_1; // number of group-level correlations
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc; // centered version of X without an intercept
  vector[Kc] means_X; // column means of X before centering
  int Kc_colo = K_colo - 1;
  matrix[N, Kc_colo] Xc_colo; // centered version of X_colo without an intercept
  vector[Kc_colo] means_X_colo; // column means of X_colo before centering
  int Kc_ex = K_ex - 1;
  matrix[N, Kc_ex] Xc_ex; // centered version of X_ex without an intercept
  vector[Kc_ex] means_X_ex; // column means of X_ex before centering
  for (i in 2 : K) {
    means_X[i - 1] = mean(X[ : , i]);
    Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
  }
  for (i in 2 : K_colo) {
    means_X_colo[i - 1] = mean(X_colo[ : , i]);
    Xc_colo[ : , i - 1] = X_colo[ : , i] - means_X_colo[i - 1];
  }
  for (i in 2 : K_ex) {
    means_X_ex[i - 1] = mean(X_ex[ : , i]);
    Xc_ex[ : , i - 1] = X_ex[ : , i] - means_X_ex[i - 1];
  }
}
parameters {
  vector[Kc] b; // population-level effects
  real Intercept; // temporary intercept for centered predictors
  real Intercept_occ; // temporary intercept for centered predictors
  vector[Kc_colo] b_colo; // population-level effects
  real Intercept_colo; // temporary intercept for centered predictors
  vector[Kc_ex] b_ex; // population-level effects
  real Intercept_ex; // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  matrix[M_1, N_1] z_1; // standardized group-level effects
  cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1; // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_3;
  vector[N_1] r_1_occ_4;
  vector[N_1] r_1_colo_5;
  vector[N_1] r_1_colo_6;
  vector[N_1] r_1_ex_7;
  vector[N_1] r_1_ex_8;
  real lprior = 0; // prior contributions to the log posterior
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[ : , 1];
  r_1_2 = r_1[ : , 2];
  r_1_3 = r_1[ : , 3];
  r_1_occ_4 = r_1[ : , 4];
  r_1_colo_5 = r_1[ : , 5];
  r_1_colo_6 = r_1[ : , 6];
  r_1_ex_7 = r_1[ : , 7];
  r_1_ex_8 = r_1[ : , 8];
  lprior += normal_lpdf(b | 0, 2);
  lprior += normal_lpdf(Intercept | 0, 2);
  lprior += normal_lpdf(Intercept_occ | 0, 2);
  lprior += std_normal_lpdf(b_colo);
  lprior += normal_lpdf(Intercept_colo | 0, 2);
  lprior += std_normal_lpdf(b_ex);
  lprior += normal_lpdf(Intercept_ex | 0, 2);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
            - 8 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + Xc * b;
    // initialize linear predictor term
    vector[N] occ = Intercept_occ + rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] colo = Intercept_colo + Xc_colo * b_colo;
    // initialize linear predictor term
    vector[N] ex = Intercept_ex + Xc_ex * b_ex;
    for (n in 1 : N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n]
               + r_1_3[J_1[n]] * Z_1_3[n];
    }
    for (n in 1 : N) {
      // add more terms to the linear predictor
      occ[n] += r_1_occ_4[J_1[n]] * Z_1_occ_4[n];
    }
    for (n in 1 : N) {
      // add more terms to the linear predictor
      colo[n] += r_1_colo_5[J_1[n]] * Z_1_colo_5[n]
                 + r_1_colo_6[J_1[n]] * Z_1_colo_6[n];
    }
    for (n in 1 : N) {
      // add more terms to the linear predictor
      ex[n] += r_1_ex_7[J_1[n]] * Z_1_ex_7[n]
               + r_1_ex_8[J_1[n]] * Z_1_ex_8[n];
    }
    target += occupancy_multi_colex_fp_lpdf(Y | mu, occ, colo, ex, vint1, vint2, vint3, vint4, vint5, vint6, vint7, vint8, vint9, vint10, vint11, vint12);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // actual population-level intercept
  real b_occ_Intercept = Intercept_occ;
  // actual population-level intercept
  real b_colo_Intercept = Intercept_colo - dot_product(means_X_colo, b_colo);
  // actual population-level intercept
  real b_ex_Intercept = Intercept_ex - dot_product(means_X_ex, b_ex);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1, upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1 : M_1) {
    for (j in 1 : (k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}
1 Like

Could you send me the fitted model and such a short description? My first guess is that it could be a consequence of what we discussed at Sum-to-zero intercepts and terminology, but I’m not sure if I got the details of your problem correctly.