brms: input scaling clarification

I’m trying to understand whether and how brms scales inputs by default. From the generated Stan code for a simple model (see below), it looks like brms centers the inputs by default but does not divide them by the standard deviation. Is this correct?

I tried looking at the documentation for the normalize argument of brm(..., normalize), but it didn’t clarify this for me.
(See my comment on an old GitHub issue.)

library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.15.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
options(brms.backend = "cmdstanr")

dat <- tibble(
  x = rnorm(100, 1:100, seq(0, 10, length.out = 100)),
  y = rnorm(100, -x, 20))

xbar <- mean(dat$x)
xsd <- sd(dat$x)

dats <- dat %>%
  # standardize the variables by subtracting the mean and dividing by the
  # standard deviation; base::scale does this
  mutate(xs = scale(x)[, 1], xs_man = (x - xbar) / xsd,
         ys = scale(y)[, 1], ys_man = (y - mean(y)) / sd(y))

# base::scale is identical to manually doing this
identical(dats$xs, dats$xs_man)
#> [1] TRUE

fit_raw <- brm(data = dat, family = gaussian,
               y ~ x,
               # for simplicity, no priors
               iter = 2000, warmup = 1000, chains = 4, cores = 4,
               save_model = "/tmp/noscaling.stan")
#> Start sampling
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
# …
#> Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.3 seconds.
plot(fit_raw)

fit_scale <- brm(data = dats, family = gaussian,
                 ys ~ xs,
                 # for simplicity, no priors
                 iter = 2000, warmup = 1000, chains = 4, cores = 4,
                 save_model = "/tmp/scaling.stan")
#> Start sampling
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
# …
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.
plot(fit_scale)

Created on 2021-07-22 by the reprex package (v2.0.0)

The /tmp/scaling.stan file and the /tmp/noscaling.stan file both have this identical section:

transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}

which, as far as I can tell, only subtracts the column means and never divides by the standard deviation.

Am I interpreting this correctly?
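
For reference, here is a rough base-R sketch of what that transformed data block appears to do. It is only an illustration: the simulated x and the use of model.matrix() to build X are assumptions made for the example, not something brms exposes this way.

```r
# Base-R sketch of the centering in the Stan `transformed data` block.
# The data are simulated purely for illustration.
set.seed(1)
x <- rnorm(100, 50, 10)

X <- model.matrix(~ x)   # column 1 is the intercept, as in the Stan code
K <- ncol(X)

# column means of X (without the intercept column) before centering
means_X <- colMeans(X[, 2:K, drop = FALSE])

# subtract the column means; nothing is divided by the standard deviation
Xc <- sweep(X[, 2:K, drop = FALSE], 2, means_X)

colMeans(Xc)       # numerically zero
apply(Xc, 2, sd)   # same as sd(x), so the scale is untouched
```

If this sketch matches the Stan code, the centered predictor has mean zero but keeps its original standard deviation.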

If I now have a model fitted to the scaled data, it seems like I should be able to convert values back to the original scale using something like:

X <- scaled_X * xsd + xbar
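
That formula converts standardized data values back. The fitted coefficients need a slightly different mapping, which follows from substituting ys = (y - ybar) / ysd and xs = (x - xbar) / xsd into ys = a + b * xs. A minimal sketch, using lm() as a stand-in for brm() so it runs without Stan (the simulated data and the names a_s, b_s, slope and intercept are made up for the example):

```r
# Sketch: back-transform coefficients from a fit on standardized data.
# lm() stands in for brm(); with brms the same algebra could be applied
# to each posterior draw.
set.seed(1)
x <- rnorm(100, 50, 10)
y <- rnorm(100, -x, 20)

xbar <- mean(x); xsd <- sd(x)
ybar <- mean(y); ysd <- sd(y)
xs <- (x - xbar) / xsd
ys <- (y - ybar) / ysd

fit_s <- lm(ys ~ xs)
a_s <- unname(coef(fit_s)[1])   # intercept on the standardized scale
b_s <- unname(coef(fit_s)[2])   # slope on the standardized scale

# undo the standardization algebraically
slope     <- b_s * ysd / xsd
intercept <- ybar + ysd * a_s - slope * xbar

# compare with a fit on the raw data
fit_raw <- lm(y ~ x)
all.equal(unname(coef(fit_raw)), c(intercept, slope))
```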

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
brms_2.15.0

Bump: can anybody help me clarify what scaling/centering brms does by default, and what the normalize argument means?

Hi,
sorry for not getting to you earlier.

The transformation is described under “Parameterization of the population-level intercept” in the docs for brmsformula.

Your reading is IMHO correct: brms subtracts the column means and does not divide by the standard deviation. Note, though, that it centers only the predictors, not the response data.

Note that centering the design matrix (except for the intercept column) does NOT change the values of any model coefficients except for the intercept. The coefficients for non-intercept predictors represent the change in prediction per unit change in the predictor and thus are invariant to any shift of the predictor values.
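
A quick way to convince yourself of this without running Stan is an ordinary least-squares fit. This is only a sketch with simulated data, and lm() is used as a stand-in for brm():

```r
# Sketch: shifting a predictor changes the intercept but not the slope.
set.seed(1)
x <- rnorm(100, 50, 10)
y <- rnorm(100, -x, 20)
xc <- x - mean(x)          # centered predictor

fit_raw <- lm(y ~ x)
fit_cen <- lm(y ~ xc)

coef(fit_raw)[["x"]]       # slope on the raw predictor
coef(fit_cen)[["xc"]]      # same slope on the centered predictor
coef(fit_raw)[[1]]         # intercept: prediction at x = 0
coef(fit_cen)[[1]]         # intercept: prediction at x = mean(x)
```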

The b_Intercept reported in the model summary is already shifted back, so it has to be interpreted as if centering did not take place. (You can also extract the Intercept parameter, which is the intercept for the centered predictors that the model actually works with.)
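
The shift back is just the algebra of undoing the centering. As far as I can tell it amounts to Intercept minus the dot product of means_X and the coefficients b, but treat that exact expression as my assumption and check it against your own generated Stan code. A sketch with two simulated predictors, again with lm() standing in for brm():

```r
# Sketch: recover the uncentered intercept from a fit on centered predictors.
set.seed(1)
n  <- 100
x1 <- rnorm(n, 50, 10)
x2 <- rnorm(n, -3, 2)
y  <- 2 - x1 + 0.5 * x2 + rnorm(n)

X       <- cbind(x1, x2)
means_X <- colMeans(X)          # column means before centering
Xc      <- sweep(X, 2, means_X) # centered predictors

fit_c     <- lm(y ~ Xc)
Intercept <- unname(coef(fit_c)[1])   # intercept for centered predictors
b         <- unname(coef(fit_c)[-1])  # slopes, unaffected by centering

# assumed shift-back: subtract the contribution of the predictor means
b_Intercept <- Intercept - sum(means_X * b)

# agrees with the intercept of a fit on the uncentered predictors
fit_raw <- lm(y ~ X)
all.equal(b_Intercept, unname(coef(fit_raw)[1]))
```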

Also note that for most tasks, you can use posterior_predict / posterior_linpred / posterior_epred, which handle a lot of this stuff for you.

The normalize option is completely unrelated to data pre-processing. It selects whether normalized or unnormalized density functions are used, e.g. normal_lpdf vs normal_lupdf.
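
To illustrate the idea in plain R (a simplified sketch, not how Stan implements it): an unnormalized log density drops additive terms that do not depend on the parameters. Here sigma is treated as fixed so that only mu is a parameter; which terms Stan actually drops depends on which arguments are data and which are parameters. The function names lp_full and lp_kernel are made up for the example.

```r
# Sketch: normalized vs unnormalized normal log density, with sigma fixed.
set.seed(1)
y     <- rnorm(20, 1, 2)
sigma <- 2   # treated as known, so terms involving only sigma are constant

# full (normalized) log density
lp_full <- function(mu) sum(dnorm(y, mu, sigma, log = TRUE))

# kernel only: the constant -log(sigma) - 0.5 * log(2 * pi) per observation is dropped
lp_kernel <- function(mu) sum(-0.5 * ((y - mu) / sigma)^2)

# the two differ by the same constant for every value of mu
const <- lp_full(0) - lp_kernel(0)
all.equal(lp_full(1.5) - lp_kernel(1.5), const)
```

Because the difference does not depend on mu, sampling mu gives the same posterior either way; the constant only matters when the absolute value of the log density is needed.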

Best of luck with your model!

Thank you so much for the elaborate and complete reply! :)