# Three different ways of trying to fix a parameter value in Stan

Suppose I had a linear model: y_i \sim N(\alpha + \beta x_i, \sigma_y). I want to fix the intercept \alpha to 5, so that E(y \mid x = 0) = 5 exactly, while \beta and \sigma_y remain unknown. Below I show three different ways one might think about doing this.

• Approach 1 hard-codes the value 5 directly in the likelihood (model block).
• Approach 2 declares a variable in the transformed parameters block, assigns it the value 5, and uses that variable in the likelihood.
• Approach 3 declares the value in the transformed data block and uses it in the likelihood.

Are the three approaches below equivalent (and correct in assuming no Jacobian adjustment is needed)? Is any of them computationally preferable to the others?
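As a quick numerical sanity check (a Python sketch with made-up toy numbers, not Stan): however the constant 5 is introduced, the log likelihood is the same expression evaluated at the same point, since alpha never varies.

```python
import math

# Toy data and parameter values (hypothetical, just for the check)
x = [0.0, 1.0, 2.0]
y = [5.1, 6.8, 9.2]
beta, sigma = 2.0, 0.5

def normal_lpdf(v, mu, s):
    # log density of N(mu, s) evaluated at v
    return -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((v - mu) / s) ** 2

# Approach 1: hard-code 5 in the likelihood
lp1 = sum(normal_lpdf(yi, 5 + beta * xi, sigma) for xi, yi in zip(x, y))

# Approaches 2 and 3: alpha is a named constant (transformed parameter or data)
alpha = 5.0
lp23 = sum(normal_lpdf(yi, alpha + beta * xi, sigma) for xi, yi in zip(x, y))

assert lp1 == lp23  # identical log likelihood; alpha is a constant, not a transform
```

Because alpha is never a (transformed) function of a sampled parameter, there is no change of variables anywhere, which is why no Jacobian question arises in the first place.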

# Approach 1: Included directly in the likelihood (model block)

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(5 + beta * x, sigma);  // hard-coded, where alpha = 5
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}


# Approach 2: Using transformed parameters block

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real beta;
  real<lower=0> sigma;
}

transformed parameters {
  real alpha = 5;  // declare and assign fixed value
}

model {
  y ~ normal(alpha + beta * x, sigma);  // includes alpha
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}


# Approach 3: Using transformed data block

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

transformed data {
  real alpha = 5;  // declare and assign fixed value
}

parameters {
  real beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(alpha + beta * x, sigma);  // includes alpha
  sigma ~ cauchy(0, 1);
  beta ~ normal(0, 1);
}


# How brms does it:

Based on a test of how brms implements its “constant” prior to fix coefficients, it looks like it uses Approach 2. For example, the code below is the underlying Stan code for a brms model y \sim N(\alpha + \beta_1 x + \beta_2 z + \beta_3 w, \sigma) with “constant” priors on the three slopes:

functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  vector[Kc] b;  // population-level effects
  real lprior = 0;  // prior contributions to the log posterior
  b[1] = 0;
  b[2] = 1;
  b[3] = 2;
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
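As an aside, the centering identity behind that generated quantities line can be checked numerically. Expanding Intercept + Xc·b = Intercept + (X − means_X)·b shows the actual intercept is Intercept − means_X·b. A minimal Python sketch with made-up numbers (all names here are hypothetical, not brms internals):

```python
import random

# Hypothetical small design matrix: 2 predictors, no intercept column
random.seed(1)
N = 5
X = [[random.random(), random.random()] for _ in range(N)]
b = [1.0, 2.0]   # fixed coefficients, as with the constant priors
Intercept = 0.7  # "temporary" intercept on the centered scale

# Center each column, as brms does in transformed data
means_X = [sum(row[j] for row in X) / N for j in range(2)]
Xc = [[row[j] - means_X[j] for j in range(2)] for row in X]

# Linear predictor on the centered scale ...
eta_centered = [Intercept + sum(Xc[i][j] * b[j] for j in range(2))
                for i in range(N)]

# ... equals the predictor using the recovered "actual" intercept
b_Intercept = Intercept - sum(m * bj for m, bj in zip(means_X, b))
eta_raw = [b_Intercept + sum(X[i][j] * b[j] for j in range(2))
           for i in range(N)]

assert all(abs(a - c) < 1e-12 for a, c in zip(eta_centered, eta_raw))
```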


Here's the R code to reproduce this:

library(brms)

# Example data
N <- 1000
df <- data.frame(
  y = rnorm(N),
  x = sample(1:10, size = N, replace = TRUE),
  z = sample(1:2, size = N, replace = TRUE),
  w = runif(N)
)

# User-specified constants as priors
ex_priors <- c(
  prior(constant(0), class = "b", coef = "x"),
  prior(constant(1), class = "b", coef = "z"),
  prior(constant(2), class = "b", coef = "w")
)

# Fit model (ignore warnings, just an example)
mod <- brms::brm(
  formula = y ~ x + z + w,
  data = df, iter = 10, chains = 1,
  prior = ex_priors
)

# Check priors
prior_summary(mod)

# Inspect Stan code
stancode(mod)


While all three approaches imply an identical likelihood, Approaches 1 and 3 are the most computationally efficient. Anything declared in the transformed parameters block is recomputed at every evaluation of the log density and written to the output at every iteration, whereas transformed data is computed once before sampling, and a literal in the model block costs nothing extra. No Jacobian adjustment is needed in any of them, since a fixed constant involves no change of variables on a sampled parameter.

Note that brms is coded to allow for maximum flexibility in constructing models, rather than maximum performance, so it's not always the best guide to maximally optimised model construction.
