# Using design matrices in stan (based on brms code)

I know I shouldn’t treat this forum like a free course in coding stan but, in order to improve my modeling in stan, I am trying to understand some of the stan code generated by the brm function.

I have a very simple dataset, 20 flips each of two biased coins. Here is the toy data.

``````set.seed(1234)
df <- data.frame(coins = factor(rep(letters[1:2],each=20)),
flips = c(rbinom(n = 20,
size = 1,
prob = 0.2),
rbinom(n = 20,
size = 1,
prob = 0.7)))
``````

I want to estimate the difference in bias of the two coins so I run a bernoulli regression.

``````brmMod <- brms::brm(formula = flips ~ coins,
data = df,

brmMod

# output
# Population-Level Effects:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -1.86      0.64    -3.21    -0.72 1.00     2316     2754
# coinsb        2.75      0.82     1.21     4.45 1.00     2526     2378
``````

Now when I look at the underlying stan code generated by brm() I get the following

``````brms::stancode(brmMod)

functions {
}
data {
int<lower=1> N;  // total number of observations
int Y[N];  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc;  // centered version of X without an intercept
vector[Kc] means_X;  // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b;  // population-level effects
real Intercept;  // temporary intercept for centered predictors
}
transformed parameters {
}
model {
// likelihood including all constants
if (!prior_only) {
target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
}
// priors including all constants
target += student_t_lpdf(Intercept | 3, 0, 2.5);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
``````

Now I think I understand most of this code and I tried transposing it using rstan like so

``````# create data
data <- list(N = nrow(df),
Y = df\$flips,
K = 2,
X = model.matrix(flips ~ coins, data = df))

# create model string
ms <- "
data{
int<lower=1> N; // total number of observations
int Y[N];       // response variable
int<lower=1> K; // number of population-level effects
matrix[N,K] X;  // population-level design matrix
}

transformed data{
int Kc = K-1;
matrix[N, Kc] Xc;   //centered version of X without an intercept
vector[Kc] means_X; //column means of X before centering
for (i in 2:K) {
means_X[i-1] = mean(X[,i]);
Xc[,i-1] = X[,i] - means_X[i-1];
}
}

parameters{

vector[Kc] b;    // population-level effects
real Intercept;  // temporary intercept for centered predictors

}

model{

// likelihood including all constants
target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);

// priors including all constants
target += student_t_lpdf(Intercept | 3, 0, 2.5);

}

generated quantities{

//actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);

}
"

# fit the model
stanMod_binom <- stan(model_code = ms,
data = data,
warmup = 1e3,
iter = 2e3,
chains = 4)

stanMod_binom

# output
#               mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
# b[1]          2.74    0.02 0.83   1.22   2.18   2.70   3.28   4.52  2388    1
# Intercept    -0.46    0.01 0.42  -1.31  -0.72  -0.45  -0.16   0.32  2598    1
# b_Intercept  -1.83    0.01 0.66  -3.25  -2.23  -1.78  -1.37  -0.70  2090    1
# lp__        -23.66    0.02 1.03 -26.45 -24.05 -23.35 -22.93 -22.64  1759    1
``````

As you can see the output is similar, with similar estimates for the intercept and b effects, except there is an extra term in the output. I don’t know why this is happening. I assume it is something to do with the way I specified the design matrix. Could someone help me out?

The `Intercept` entry in your final output is actually the intercept on a standardized scale while the `b_Intercept` entry is what `brms` is showing you as `Intercept`. They are simply multiples of one another.

1 Like

Thanks @mike-lawrence. Embarrassed to say I don’t know where in the code I specified the standardising of the intercept (this is what comes of rote copying). Where in the model string does that happen? Does the brm() function remove the standardised version automatically?

That happens in the transformed data section.

You might want to check out my lectures here, maybe starting with this one which covers contrast matrices & standardization.

2 Likes

Thanks @mike-lawrence. I have actually worked through all of them, slowly and carefully. They are a brilliant resource and so helpful. From memory you tend to standardise outside stan? Anyway my desire to unpack brms code is mainly to try and get my head around non-centered parameterisation. It doesn’t apply to this example obviously but I thought I would start with the simplest models, get a grasp on the underlying code for those, then move onto hierarchical models.

If you start watching at 1:21:30 you’ll see an example of the standardization in Stan rather than R.

1 Like
2 Likes

Oh yes that’s right you did do it in stan. I need to watch your lectures again. Thanks again for posting them.

1 Like