I’m trying to understand whether/how brms scales inputs by default. From the generated stan code for a simple model (see below) it looks like by default it does center the inputs, but does not divide them by the standard deviation, is this correct?
I tried looking at the docstring for the parameter in brms(..., normalize)
but this didn’t clarify this for me.
(see my comment on an old github issue)
library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.15.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
options(brms.backend = "cmdstanr")
dat <- tibble(
x = rnorm(100, 1:100, seq(0, 10, length.out = 100)),
y = rnorm(100, -x, 20))
xbar <- mean(dat$x)
xsd <- sd(dat$x)
dats <- dat %>%
# we scale the parameters by subtracting the mean and dividing by the standard deviation
# base::scale does this
mutate(xs = scale(x)[, 1], xs_man = (x - xbar) / xsd,
ys = scale(y)[, 1], ys_man = (y - mean(y)) / sd(y))
# base::scale is identical to manually doing this
identical(dats$xs, dats$xs_man)
#> [1] TRUE
fit_raw <- brm(data = dat, family = gaussian,
y ~ x,
# for simplicity, no priors
iter = 2000, warmup = 1000, chains = 4, cores = 4,
save_model = "/tmp/noscaling.stan")
#> Start sampling
#> Running MCMC with 4 parallel chains...
#>
#> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
# …
#> Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.3 seconds.
plot(fit_raw)
fit_scale <- brm(data = dats, family = gaussian,
ys ~ xs,
# for simplicity, no priors
iter = 2000, warmup = 1000, chains = 4, cores = 4,
save_model = "/tmp/scaling.stan")
#> Start sampling
#> Running MCMC with 4 parallel chains...
#>
#> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
# …
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.2 seconds.
plot(fit_scale)
Created on 2021-07-22 by the reprex package (v2.0.0)
The /tmp/scaling.stan
file and the /tmp/noscaling.stan
file both have this identical section:
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
which seems to subtract the means to me?
Am I interpreting this correctly?
If I now have the fitted model to the scaled outputs, it seems like I should be able to convert back to the original scale using something like:
X <- scaled_X * xsd + xbar
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
brms_2.15.0