Suppose I have a linear model y_i \sim N(\alpha + \beta x_i, \sigma_y). I want to fix the intercept at 5, so that E(y|x=0) = 5 exactly while the residual variance remains unknown. Below I show three different ways one might think about doing this:
- Approach 1 enters 5 as a literal value directly in the likelihood (model block).
- Approach 2 declares a variable in transformed parameters, assigns it the value 5, and then includes that variable in the likelihood.
- Approach 3 treats the value as transformed data and adds it to the likelihood.
Are the three approaches below equivalent (and correct in not requiring Jacobian adjustments)? Is one of them computationally preferable to the others?
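For concreteness, here is a minimal R sketch that simulates data consistent with this model and packages it for Stan (the values beta = 2 and sigma = 1 are arbitrary choices for illustration, not from the original post):

# Simulate y ~ N(5 + beta*x, sigma) with hypothetical beta = 2, sigma = 1
set.seed(123)
N <- 100
x <- runif(N)
y <- rnorm(N, mean = 5 + 2 * x, sd = 1)
stan_data <- list(N = N, x = x, y = y) # matches the data blocks below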
Approach 1: Included directly in the likelihood (model block)
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta;
  real<lower=0> sigma; // scale must be positive
}
model {
  y ~ normal(5 + beta * x, sigma); // hard-coded, where alpha = 5
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}
Approach 2: Using transformed parameters block
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta;
  real<lower=0> sigma; // scale must be positive
}
transformed parameters {
  real alpha = 5; // declare and assign fixed value
}
model {
  y ~ normal(alpha + beta * x, sigma); // includes alpha
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}
Approach 3: Using transformed data block
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  real alpha = 5; // declare and assign fixed value
}
parameters {
  real beta;
  real<lower=0> sigma; // scale must be positive
}
model {
  y ~ normal(alpha + beta * x, sigma); // includes alpha
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}
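All three programs take the same data and can be fit identically. As a sketch (this assumes the Approach 3 program is saved as approach3.stan and that the cmdstanr package is installed; neither is from the original post):

library(cmdstanr)
mod3 <- cmdstan_model("approach3.stan") # compile Approach 3
fit3 <- mod3$sample(data = stan_data)   # stan_data as simulated above
fit3$summary(c("beta", "sigma"))        # alpha is fixed, so only beta and sigma are sampled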
How brms does it:
Based on a test of how brms implemented its “constant” prior to fix coefficients, it looks like it uses approach 2. For example, the code below shows the underlying Stan code for a brms model y \sim N(\alpha + \beta_1 x + \beta_2 z + \beta_3 w, \sigma) with the “constant” priors:
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  vector[Kc] b;  // population-level effects
  real lprior = 0;  // prior contributions to the log posterior
  b[1] = 0;
  b[2] = 1;
  b[3] = 2;
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Here’s the R code to reproduce this:
library(brms)

# Example data
N <- 1000
df <- data.frame(
  y = rnorm(N),
  x = sample(1:10, size = N, replace = TRUE),
  z = sample(1:2, size = N, replace = TRUE),
  w = runif(N)
)

# User-specified constants as priors
ex_priors <- c(
  prior(constant(0), class = "b", coef = "x"),
  prior(constant(1), class = "b", coef = "z"),
  prior(constant(2), class = "b", coef = "w")
)

# Fit model (ignore warnings, just an example)
mod <- brms::brm(
  formula = y ~ x + z + w,
  data = df, iter = 10, chains = 1,
  prior = ex_priors
)

# Check priors
prior_summary(mod)

# Inspect Stan code
stancode(mod)
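As a quick sanity check (not part of the original snippet), the fixed coefficients should appear in the posterior summary with zero uncertainty; fixef() is brms's accessor for population-level effects:

fixef(mod) # the x, z, and w rows should equal 0, 1, and 2 with Est.Error of 0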