Using design matrices in stan (based on brms code)

llewmills · January 1, 2021, 11:05pm

I know I shouldn’t treat this forum like a free course in coding stan but, in order to improve my modeling in stan, I am trying to understand some of the stan code generated by the brm function.

I have a very simple dataset, 20 flips each of two biased coins. Here is the toy data.

set.seed(1234)
df <- data.frame(coins = factor(rep(letters[1:2],each=20)),
                 flips = c(rbinom(n = 20,
                                  size = 1, 
                                  prob = 0.2),
                           rbinom(n = 20,
                                  size = 1,
                                  prob = 0.7)))

I want to estimate the difference in bias of the two coins so I run a bernoulli regression.

brmMod <- brms::brm(formula = flips ~ coins,
                    data = df,
                    family = bernoulli(link = "logit"))

brmMod

# output
# Population-Level Effects: 
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -1.86      0.64    -3.21    -0.72 1.00     2316     2754
# coinsb        2.75      0.82     1.21     4.45 1.00     2526     2378

Now when I look at the underlying stan code generated by brm() I get the following

brms::stancode(brmMod)

functions {
}
data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
}
transformed parameters {
}
model {
  // likelihood including all constants
  if (!prior_only) {
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

Now I think I understand most of this code and I tried transposing it using rstan like so

# create data
data <- list(N = nrow(df),
             Y = df$flips,
             K = 2,
             X = model.matrix(flips ~ coins, data = df))

# create model string 
ms <- "
  data{
    int<lower=1> N; // total number of observations
    int Y[N];       // response variable
    int<lower=1> K; // number of population-level effects
    matrix[N,K] X;  // population-level design matrix
  }
  
  transformed data{
    int Kc = K-1;
    matrix[N, Kc] Xc;   //centered version of X without an intercept
    vector[Kc] means_X; //column means of X before centering
    for (i in 2:K) {
      means_X[i-1] = mean(X[,i]);
      Xc[,i-1] = X[,i] - means_X[i-1];
    }
  }
  
  parameters{
  
    vector[Kc] b;    // population-level effects
    real Intercept;  // temporary intercept for centered predictors
    
  }
  
  model{
  
    // likelihood including all constants
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
    
    // priors including all constants
    target += student_t_lpdf(Intercept | 3, 0, 2.5);
    
  }
  
  generated quantities{
  
    //actual population-level intercept
    real b_Intercept = Intercept - dot_product(means_X, b);
    
  }
"

# fit the model
stanMod_binom <- stan(model_code = ms,
                      data = data,
                      warmup = 1e3,
                      iter = 2e3,
                      chains = 4)

stanMod_binom

# output
#               mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
# b[1]          2.74    0.02 0.83   1.22   2.18   2.70   3.28   4.52  2388    1
# Intercept    -0.46    0.01 0.42  -1.31  -0.72  -0.45  -0.16   0.32  2598    1
# b_Intercept  -1.83    0.01 0.66  -3.25  -2.23  -1.78  -1.37  -0.70  2090    1
# lp__        -23.66    0.02 1.03 -26.45 -24.05 -23.35 -22.93 -22.64  1759    1

As you can see the output is similar, with similar estimates for the intercept and b effects, except there is an extra term in the output. I don’t know why this is happening. I assume it is something to do with the way I specified the design matrix. Could someone help me out?

mike-lawrence · January 4, 2021, 1:11am

The Intercept entry in your final output is actually the intercept on a standardized scale while the b_Intercept entry is what brms is showing you as Intercept. They are simply multiples of one another.

llewmills · January 4, 2021, 4:11am

Thanks @mike-lawrence. Embarrassed to say I don’t know where in the code I specified the standardising of the intercept (this is what comes of rote copying). Where in the model string does that happen? Does the brm() function remove the standardised version automatically?

mike-lawrence · January 4, 2021, 2:22pm

That happens in the transformed data section.

You might want to check out my lectures here, maybe starting with this one which covers contrast matrices & standardization.

llewmills · January 4, 2021, 2:46pm

Thanks @mike-lawrence. I have actually worked through all of them, slowly and carefully. They are a brilliant resource and so helpful. From memory you tend to standardise outside stan? Anyway my desire to unpack brms code is mainly to try and get my head around non-centered parameterisation. It doesn’t apply to this example obviously but I thought I would start with the simplest models, get a grasp on the underlying code for those, then move onto hierarchical models.

mike-lawrence · January 4, 2021, 3:11pm

If you start watching at 1:21:30 you’ll see an example of the standardization in Stan rather than R.

mike-lawrence · January 4, 2021, 3:15pm

Oh, and see also my post showing how to code a model that has both centered and non-centered parameterizations, with selection of which to run done at runtime via data.

llewmills · January 4, 2021, 10:45pm

Oh yes that’s right you did do it in stan. I need to watch your lectures again. Thanks again for posting them.

Topic		Replies	Views
Reproducing brms results in rstan yields different estimates brms	7	847	July 30, 2020
Brms: non-centered to centered parameterization brms	8	933	October 3, 2023
Understand how `brms` parametrizes random effects brms	7	3206	February 10, 2019
Stan and brms output comparison brms fitting-issues , specification	2	516	November 11, 2021
Survival analysis with "brms" - what's the design matrix? Modeling brms	3	892	April 18, 2021

Using design matrices in stan (based on brms code)

Related topics