Discrepancy between nominal and realised prior predictive distributions

Dear all, I have noticed a considerable discrepancy between the nominal priors (the intended priors, whether theoretical or generated with an RNG) and the priors that are actually sampled. I understand that the two can differ numerically, but what I observe seems excessive. Perhaps I am misunderstanding something.
The following MRE illustrates the issue well, I hope:

library(brms, quietly = TRUE)
#> Loading 'brms' package (version 2.13.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tidyverse, quietly = TRUE)

dummy_dat <- with(
  list(N = 5e3),

  data.frame(
    mu = 1,
    x = gl(2, N/2),
    y = 0   # irrelevant, but necessary
  )
)

pred_priors <-
  brm(y ~ 1 + x,
      sample_prior = "only",
      prior = c(
        set_prior("student_t(3, 0, 2.5)", class = "Intercept"),
        set_prior("skew_normal(-1, 1, -4)", class = "b")
      ),
      data = dummy_dat,
      iter = 5e3,
      refresh = FALSE
  )
#> Compiling the C++ model
#> Start sampling


pred_priors %>%
  posterior_samples(pars = "b_") %>%
  add_column(
    nominal_int = rstudent_t(nrow(.), 3, 0, 2.5),
    nominal_x = rskew_normal(nrow(.), -1, 1, -4)
  ) %>%
  gather(sample, value) %>%
  ggplot(aes(value, colour = sample)) +
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_density() +
  scale_x_continuous(limits = c(-10, 10))
#> Warning: Removed 556 rows containing non-finite values (stat_density).

Created on 2020-06-22 by the reprex package (v0.3.0)

I expected the curves to match more closely. In particular, for the Skew-Normal prior I expected some probability mass above 0, yet the support of the realised prior lies entirely below 0. This raises the question of which prior I am really using: the one I specified or the one that was realised (I guess the latter?).

Thanks in advance for your insights.
ƒacu.-


There could be some centering and such of the data going into the model.

To see exactly what brms is doing with its priors/data centering/whatnot, you can use make_stancode(...) where the ... are the arguments you’d normally pass to brm. make_stancode is super-useful for understanding things like this.
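
For example, something along these lines (reusing the formula, priors, and data from your reprex; just a sketch, not run here) prints the generated Stan code without compiling or sampling anything:

# Print the Stan code brms would generate for this model,
# without compiling the C++ model or drawing any samples.
make_stancode(
  y ~ 1 + x,
  data = dummy_dat,
  prior = c(
    set_prior("student_t(3, 0, 2.5)", class = "Intercept"),
    set_prior("skew_normal(-1, 1, -4)", class = "b")
  )
)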

Dear Ben,

Thanks for your response! I didn’t know about make_stancode(), but I had already checked stancode() on the output of brm(), which ultimately gives the same code. I paste it below for reference.

Indeed, the design matrix (without the intercept column) is centered. However, I fail to see how that causes the shift. Centering the design matrix does not change the interpretation of the coefficients; it does change the interpretation of the intercept. But the priors for the various coefficients are independent.

In any case, I am confused as to which priors I am actually using. This is not generally a problem with vague priors, but it becomes very important when you are carefully crafting prior distributions based on prior knowledge!

// generated with brms 2.13.0
functions {
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // priors including all constants
  target += skew_normal_lpdf(b | -1, 1, -4);
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

I figured that one way of testing this candidate explanation (i.e. the internal centering of covariates in the Stan code) was to provide an already-centered design matrix in the first place. To gain further insight, I also included a second covariate (also centered) but with a Student-t prior centered at 1.

The results are interesting. Centering the covariates in advance removed the shift between the nominal and realised priors for the Intercept, but not for the covariate with the Skew-Normal prior. The second covariate, with a Student-t prior, shows no shift despite that prior not being centered at 0.

This seems to suggest an issue with the Skew-Normal prior.

Here is the full code for reproducibility, including the part already posted before.

library(brms, quietly = TRUE)
#> Loading 'brms' package (version 2.13.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tidyverse, quietly = TRUE)

dummy_dat <- with(
  list(N = 5e3),

  data.frame(
    mu = 1,
    x = gl(2, N/2),
    y = 0   # irrelevant, but necessary
  )
)

pred_priors <-
  brm(
    y ~ 1 + x,
    sample_prior = "only",
    prior = c(
      set_prior("student_t(3, 0, 2.5)", class = "Intercept"),
      set_prior("skew_normal(-1, 1, -4)", class = "b")
    ),
    data = dummy_dat,
    iter = 5e3,
    refresh = FALSE
  )
#> Compiling the C++ model
#> Start sampling


pred_priors %>%
  posterior_samples(pars = "b_") %>%
  add_column(
    nominal_int = rstudent_t(nrow(.), 3, 0, 2.5),
    nominal_x = rskew_normal(nrow(.), -1, 1, -4)
  ) %>%
  gather(sample, value) %>%
  ggplot(aes(value, colour = sample)) +
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_density() +
  scale_x_continuous(limits = c(-10, 10))
#> Warning: Removed 591 rows containing non-finite values (stat_density).




centered_dat <-
  dummy_dat %>%
  mutate(
    xc = scale(as.numeric(dummy_dat$x), center = TRUE, scale = FALSE),
    xc2 = xc
  )


pred_priorsc <-
  brm(
    y ~ 1 + xc + xc2,
    sample_prior = "only",
    prior = c(
      set_prior("student_t(3, 0, 2.5)", class = "Intercept"),
      set_prior("skew_normal(-1, 1, -4)", class = "b", coef = "xc"),
      set_prior("student_t(3, 1, 1.5)", class = "b", coef = "xc2")
    ),
    data = centered_dat,
    iter = 5e3,
    refresh = FALSE
  )
#> Compiling the C++ model
#> Start sampling


pred_priorsc %>%
  posterior_samples(pars = "b_") %>%
  add_column(
    nominal_int = rstudent_t(nrow(.), 3, 0, 2.5),
    nominal_xc = rskew_normal(nrow(.), -1, 1, -4),
    nominal_xc2 = rstudent_t(nrow(.), 3, 1, 1.5)
  ) %>%
  gather(sample, value) %>%
  ggplot(aes(value, colour = sample)) +
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_density() +
  scale_x_continuous(limits = c(-10, 10))
#> Warning: Removed 743 rows containing non-finite values (stat_density).

Created on 2020-06-23 by the reprex package (v0.3.0)


I’ll have a look at this tomorrow! Thanks for getting back on it.

Thanks @famuvie, this definitely looks like a bug in however the rskew_normal function works. I made a bug report for brms over here: https://github.com/paul-buerkner/brms/issues/936

Thank you Ben.

To wrap up the discussion on GitHub: there is no bug, but rather a confusion about parameterisation. One has to be aware that brms::set_prior() uses the parameterisation from the Stan language (as stated in its documentation, by the way), not the default parameterisation in brms itself. A brief illustration is below.

library(brms, quietly = TRUE)
library(tidyverse, quietly = TRUE)

pred_sn <-
  brm(
    y ~ 1,
    sample_prior = "only",
    prior = c(
      ## Watch out! parameterisation skew_normal: xi, omega, alpha
      set_prior("skew_normal(-1, 1, -4)", class = "Intercept")
    ),
    data = data.frame(y = rep(0, 5e3)),
    iter = 5e3,
    refresh = FALSE
  )
#> Compiling the C++ model
#> Start sampling


pred_sn %>%
  posterior_samples(pars = "b_Intercept") %>%
  add_column(
    nominal_off = rskew_normal(nrow(.), -1, 1, alpha = -4),
    nominal_ok = rskew_normal(nrow(.), xi = -1, omega = 1, alpha = -4)
  ) %>%
  gather(sample, value) %>%
  ggplot(aes(value, colour = sample)) +
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_density() +
  scale_x_continuous(limits = c(-10, 10))

Created on 2020-06-25 by the reprex package (v0.3.0)


There is, however, the other issue concerning the centering of the design matrix. While it is not a bug, it doesn’t feel right, or is at least unclear.

I specify my prior for the intercept interpreting it as the value for the “Control” level, and my prior for the treatment coefficient interpreting it as the effect of the treatment.
\mu = \beta_0 + \beta_t t

However, brms quietly reparameterises the model by centering the design matrix, without adjusting the priors accordingly.
\mu = (\beta_0 + \beta_t \bar t) + \beta_t (t - \bar t)
As a result, the actual prior for the Control level is shifted whenever the prior mean of \beta_t is not 0 (which explains the results observed with the Skew-Normal prior) and has a larger variance than expected, as shown below.
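
Before the full reprex, here is a minimal direct sketch of the same effect, assuming the normal(0, 1) and normal(0, 5) priors used below and a balanced 0/1 treatment indicator, and using only the generated quantities line b_Intercept = Intercept - dot_product(means_X, b) from the Stan code above:

# Simulate the realised prior on the control-level intercept when the
# priors are placed on the centered-model parameters and the original
# intercept is recovered as Intercept - mean(treatment) * b.
n <- 1e5
Intercept <- rnorm(n, 0, 1)   # prior set for class = "Intercept"
b_treat   <- rnorm(n, 0, 5)   # prior set for class = "b"
t_bar     <- 0.5              # mean of the balanced 0/1 treatment indicator
b_Intercept <- Intercept - t_bar * b_treat
c(mean = mean(b_Intercept), sd = sd(b_Intercept))
# the sd comes out around sqrt(1 + 0.5^2 * 5^2), roughly 2.69, not the nominal 1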

I’m not sure what the solution is here. I understand that centering by default can be useful, but at the very least the help page of brms::brm should explicitly warn about the centering of the design matrix. Or maybe the documentation for brms::set_prior, or both. I don’t know.

Thanks again for your help and insight.
ƒacu.-

library(brms, quietly = TRUE)
library(tidyverse, quietly = TRUE)

dummy_dat <- with(
  list(N = 5e3),

  data.frame(
    treatment = rep(0:1, each = N/2),
    y = 0   # irrelevant, but necessary
  )
)

pred_int <-
  brm(
    y ~ 1 + treatment,
    sample_prior = "only",
    prior = c(
      set_prior("normal(0, 1)", class = "Intercept"),
      set_prior("normal(0, 5)", class = "b")
    ),
    data = dummy_dat,
    iter = 5e3,
    refresh = FALSE
  )
#> Compiling the C++ model
#> Start sampling


pred_int %>%
  posterior_samples(pars = "b_Intercept") %>%
  add_column(
    nominal_int = rnorm(nrow(.), 0, 1)
  ) %>%
  gather(sample, value) %>%
  ggplot(aes(value, colour = sample)) +
  geom_vline(xintercept = 0, colour = "grey80") +
  geom_density() +
  scale_x_continuous(limits = c(-5, 5))
#> Warning: Removed 634 rows containing non-finite values (stat_density).

Created on 2020-06-25 by the reprex package (v0.3.0)

Since this is fresh in your mind, probably the best thing to do would be to write a couple of sentences that would help someone coming at this in the future, decide where they should go in the brms docs, open an issue on the brms GitHub (here), and then talk it through with @paul.buerkner. There’s probably something that can be added somewhere to make life easier for future-you (and if you’re running into these problems, other people probably are as well).

Conclusions

In my original post I wondered why the realised priors (i.e. actual samples from the prior distributions drawn via the compiled Stan model) did not more closely match a random sample generated manually from the nominal priors (i.e. the priors as specified).

As a simple illustration, I used a Student’s t prior for the intercept and a Skew-Normal prior for an indicator covariate x \in \{0, 1\}.
[Figure: density plots of the nominal and realised prior samples]

The two mismatches were due to different reasons.

  1. Intercept: By default, brms generates Stan code that centers the design matrix. This is documented in the help page for brmsformula, sub-section Parameterization of the population-level intercept.

    This effectively changed the interpretation of the intercept. I set a prior for the value of the linear predictor in the control group (x = 0), but brms applied that prior to the population mean instead. As a result, the realised prior for the control group was shifted (toward higher values, because the prior mean of the treatment effect for x = 1 was negative).

    The reason for the reparameterisation is sampling efficiency. Furthermore, Paul argues that it is easier to specify priors for a population mean than for a specific group. However, in the context of control and treatment groups, I find it more natural to specify priors for the control group and for the expected effects of the treatments, rather than for a “population” mean, which depends in particular on the relative sample sizes of the groups.

    Solution: Be aware of this and either specify the prior for the population mean, or avoid the centering by specifying the formula in the brm() call as bf(y ~ 1 + x, center = FALSE).

  2. Effect of binary covariate: The Skew-Normal distribution is parameterised differently (by default) in brms::set_prior() and in brms::rskew_normal(). The former uses Stan’s parameterisation in terms of location \xi, scale \omega and shape \alpha (this is well documented in its help page) while the latter uses brms’ default parameterisation in terms of mean \mu, standard deviation \sigma and shape \alpha.

    Solution: Pay particular attention to the parameterisation of probability distributions. Don’t rely on defaults; use named arguments. Writing brms::rskew_normal(nrow(.), xi = -1, omega = 1, alpha = -4) may be longer, but it is safer and more explicit.
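
For reference, the two parameterisations can be related via the standard skew-normal moment formulas (the helper below is my own sketch, not a brms function): with \delta = \alpha / \sqrt{1 + \alpha^2}, the mean is \xi + \omega \delta \sqrt{2/\pi} and the standard deviation is \omega \sqrt{1 - 2\delta^2/\pi}.

# Convert a Stan-style skew_normal(xi, omega, alpha) prior into the
# mean/sd/alpha parameterisation that brms::rskew_normal() uses by default.
# (My own helper, based on the standard skew-normal moment formulas.)
skew_normal_moments <- function(xi, omega, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  c(
    mean = xi + omega * delta * sqrt(2 / pi),
    sd   = omega * sqrt(1 - 2 * delta^2 / pi)
  )
}

skew_normal_moments(xi = -1, omega = 1, alpha = -4)
# mean is roughly -1.77 and sd roughly 0.63

So the Stan-parameterised skew_normal(-1, 1, -4) prior has a mean of about -1.77 and a standard deviation of about 0.63, which is why the two rskew_normal() calls above produce visibly different curves. Combined with the centering, it also explains the direction of the intercept shift in the first reprex: since b_Intercept = Intercept - dot_product(means_X, b), with means_X = 0.5 for the balanced factor and a prior mean of about -1.77 for the coefficient of x, the realised intercept prior ends up shifted upward by roughly 0.89.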
