Using design matrices in Stan (based on brms code)

I know I shouldn't treat this forum like a free course in coding Stan, but in order to improve my modeling in Stan I am trying to understand some of the Stan code generated by the brm() function.

I have a very simple dataset: 20 flips each of two biased coins. Here is the toy data.

set.seed(1234)
df <- data.frame(coins = factor(rep(letters[1:2],each=20)),
                 flips = c(rbinom(n = 20,
                                  size = 1, 
                                  prob = 0.2),
                           rbinom(n = 20,
                                  size = 1,
                                  prob = 0.7)))
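
Just to check that the simulation behaved as intended, the empirical head rate for each coin can be computed like so (optional; aggregate() is base R):

# empirical proportion of heads per coin; should be roughly 0.2 and 0.7
aggregate(flips ~ coins, data = df, FUN = mean)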

I want to estimate the difference in bias between the two coins, so I run a Bernoulli regression.

brmMod <- brms::brm(formula = flips ~ coins,
                    data = df,
                    family = brms::bernoulli(link = "logit"))

brmMod

# output
# Population-Level Effects: 
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -1.86      0.64    -3.21    -0.72 1.00     2316     2754
# coinsb        2.75      0.82     1.21     4.45 1.00     2526     2378

Now, when I look at the underlying Stan code generated by brm(), I get the following:

brms::stancode(brmMod)

functions {
}
data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
}
transformed parameters {
}
model {
  // likelihood including all constants
  if (!prior_only) {
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
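
Incidentally, the matching data list that brms passes to this program can be pulled out in the same way; a small sketch, assuming the brmMod object from above:

sdata <- brms::standata(brmMod)
str(sdata)  # a named list that should contain N, Y, K, X (the design matrix) and prior_only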

Now, I think I understand most of this code, so I tried translating it into a plain rstan call like so:

# create data
data <- list(N = nrow(df),
             Y = df$flips,
             K = 2,
             X = model.matrix(flips ~ coins, data = df))
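
As an aside, K doesn't need to be hard-coded; it can be read straight off the design matrix, e.g.:

X <- model.matrix(flips ~ coins, data = df)
data <- list(N = nrow(df),
             Y = df$flips,
             K = ncol(X),  # 2 here: the intercept column plus the coinsb dummy
             X = X)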

# create model string 
ms <- "
  data{
    int<lower=1> N; // total number of observations
    int Y[N];       // response variable
    int<lower=1> K; // number of population-level effects
    matrix[N,K] X;  // population-level design matrix
  }
  
  transformed data{
    int Kc = K-1;
    matrix[N, Kc] Xc;   //centered version of X without an intercept
    vector[Kc] means_X; //column means of X before centering
    for (i in 2:K) {
      means_X[i-1] = mean(X[,i]);
      Xc[,i-1] = X[,i] - means_X[i-1];
    }
  }
  
  parameters{
  
    vector[Kc] b;    // population-level effects
    real Intercept;  // temporary intercept for centered predictors
    
  }
  
  model{
  
    // likelihood including all constants
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
    
    // priors including all constants
    target += student_t_lpdf(Intercept | 3, 0, 2.5);
    
  }
  
  generated quantities{
  
    //actual population-level intercept
    real b_Intercept = Intercept - dot_product(means_X, b);
    
  }
"

# fit the model
stanMod_binom <- rstan::stan(model_code = ms,
                             data = data,
                             warmup = 1e3,
                             iter = 2e3,
                             chains = 4)

stanMod_binom

# output
#               mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
# b[1]          2.74    0.02 0.83   1.22   2.18   2.70   3.28   4.52  2388    1
# Intercept    -0.46    0.01 0.42  -1.31  -0.72  -0.45  -0.16   0.32  2598    1
# b_Intercept  -1.83    0.01 0.66  -3.25  -2.23  -1.78  -1.37  -0.70  2090    1
# lp__        -23.66    0.02 1.03 -26.45 -24.05 -23.35 -22.93 -22.64  1759    1

As you can see, the output is similar, with near-identical estimates for the intercept and the b effect, except that there is an extra Intercept term in the output. I don't know why this is happening. I assume it has something to do with the way I specified the design matrix. Could someone help me out?

The Intercept entry in your final output is the intercept on the centered-predictor scale, while the b_Intercept entry is what brms is showing you as Intercept. The two differ only by a fixed offset, dot_product(means_X, b), which is exactly what the generated quantities block computes.
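
You can check this directly from the posterior draws; a quick sketch, assuming your stanMod_binom fit from above:

draws <- rstan::extract(stanMod_binom)
# the only non-intercept column of X is the coinsb dummy, whose mean is 0.5
mean(draws$Intercept - 0.5 * draws$b[, 1])  # reproduces b_Intercept (about -1.83)
mean(draws$b_Intercept)                     # the quantity brms reports as Intercept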

Thanks @mike-lawrence. Embarrassed to say I don't know where in the code I specified the centering of the predictors (this is what comes of rote copying). Where in the model string does that happen? And does brm() drop the centered version of the intercept from its output automatically?

That happens in the transformed data block: each non-intercept column of X has its column mean subtracted, and Intercept is the intercept for that centered design matrix. And yes, brm() drops the temporary Intercept from its summary and reports only the back-transformed b_Intercept (labelled Intercept).
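
In R terms, that block is doing roughly this (a sketch, not code you need to run):

X <- model.matrix(flips ~ coins, data = df)
means_X <- colMeans(X)[-1]                      # mean of each non-intercept column
Xc <- sweep(X[, -1, drop = FALSE], 2, means_X)  # subtract those means; intercept column dropped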

You might want to check out my lectures here, maybe starting with this one, which covers contrast matrices & standardization.

Thanks @mike-lawrence. I have actually worked through all of them, slowly and carefully; they are a brilliant resource and so helpful. From memory, you tend to standardise outside Stan? Anyway, my desire to unpack brms code is mainly to try to get my head around the non-centered parameterisation. It obviously doesn't apply to this example, but I thought I would start with the simplest models, get a grasp of the underlying code for those, and then move on to hierarchical models.

If you start watching at 1:21:30, you'll see an example of the standardization done in Stan rather than in R.

Oh, and see also my post showing how to code a model with both centered and non-centered parameterizations, where the choice between them is made at runtime via the data.

Oh yes, that's right, you did do it in Stan. I need to watch your lectures again. Thanks again for posting them.
