Hey Eric,
Welcome to Stan Forums!
(1) Should I include this prior information?
That’s a big question and depends on some combination of the quality of the prior data with respect to the data you intend to collect, the way you view the role of priors, and how you suspect your intended audience will react to priors.
To the extent the prior data are unrepresentative of those you’d like to collect, I’d be wary. The converse is true if you anticipate collecting similar data.
If you view priors as a way to build a cumulative science, using them is great! If you view priors as encapsulating your subjective beliefs, I suppose that depends on how effectively the prior data swayed your research intuitions. If you view priors as nuisances, just use flat ones! And if you like the notion of priors as regularizing devices, use them that way (e.g., Normal(0, 1) on beta coefficients).
Your answers to the first two questions will be tempered by how you suspect priors will be received by those you’d like to present your findings to. It’ll also depend on how willing you are to go a couple of rounds with Reviewer #2. Some of this might be avoided with a prior sensitivity analysis.
(2) What are some ways to go about it?
Let’s presume you intend to collect very similar data in the future. You could use the lmer() point estimates and standard errors directly as the priors for the next round of data. I ran the model on my computer. Here’s the summary output I got:
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 3.741 1.934
Residual 30.240 5.499
Number of obs: 218, groups: school, 14
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.4682 0.7818 4.436
y_male -1.0545 0.7581 -1.391
y_age_c 0.5251 0.3419 1.536
y_male:y_age_c -0.2635 0.4836 -0.545
Correlation of Fixed Effects:
(Intr) y_male y_ag_c
y_male -0.459
y_age_c 0.023 -0.004
y_mal:y_g_c 0.020 -0.007 -0.697
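In case it’s handy, the lme4 call behind that output would have looked something like this (d is a stand-in for whatever your data frame is called):
library(lme4)

fit_lmer <- lmer(as_areaK2 ~ y_male * y_age_c + (1 | school), data = d)
summary(fit_lmer)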
Let’s go in order, starting with the level-2 variances. Unlike lme4, brms parameterizes those as standard deviations. If you’d like your priors to be precise, you’ll have to work a bit. So first things first. Based on the summary information alone, we don’t know the exact shape of your 3.74 (1.93) school intercept. But since we do know variances are continuous and bounded below by zero, gamma seems like a natural starting point. If you’re willing to use 3.74 as the mode and 1.93 as the standard deviation, here’s how to get the corresponding gamma hyperparameters.
# convert a mode and an sd into the corresponding gamma shape and rate
gamma_s_and_r_from_mode_sd <- function(mode, sd) {
  if (mode <= 0) stop("mode must be > 0")
  if (sd <= 0) stop("sd must be > 0")
  rate  <- (mode + sqrt(mode^2 + 4 * sd^2)) / (2 * sd^2)
  shape <- 1 + mode * rate
  return(list(shape = shape, rate = rate))
}
gamma_s_and_r_from_mode_sd(mode = 3.741, sd = 1.934)
That returned:
$shape
[1] 5.561848
$rate
[1] 1.219419
My math is not good enough to know how to convert that into a standard deviation metric. But I can simulate.
# simulate from that gamma, then convert the draws from the variance metric to the sd metric
library(tidyverse)

set.seed(1)
tibble(sigma_2 = rgamma(1e5, shape = 5.561848, rate = 1.219419)) %>%
  mutate(sigma = sqrt(sigma_2)) %>%
  summarise(mean = mean(sigma),
            sd   = sd(sigma))
That returned:
mean sd
<dbl> <dbl>
1 2.09 0.447
If you look at sigma in a plot, it’s just a little skewed. So if you wanted to use a gamma prior with hyperparameters based on those values, you could just be lazy, use the mean as a proxy for the mode, and reuse the gamma_s_and_r_from_mode_sd() function.
gamma_s_and_r_from_mode_sd(mode = 2.088724, sd = 0.4468259)
Which returned:
$shape
[1] 23.80971
$rate
[1] 10.9204
If you wanted to be less shameful, you could use this function to compute a gamma based on a mean and an sd.
# convert a mean and an sd into the corresponding gamma shape and rate
gamma_s_and_r_from_mean_sd <- function(mean, sd) {
  if (mean <= 0) stop("mean must be > 0")
  if (sd <= 0) stop("sd must be > 0")
  shape <- mean^2 / sd^2
  rate  <- mean / sd^2
  return(list(shape = shape, rate = rate))
}
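For example, feeding it the simulated mean and sd from above gives hyperparameters in the same ballpark as the lazy mode-based approach (the values in the comment are approximate):
gamma_s_and_r_from_mean_sd(mean = 2.088724, sd = 0.4468259)
# shape comes out to about 21.85 and rate to about 10.46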
To give credit, both these gamma_s_and_r_from...() functions are based on Kruschke’s work (see here). Anyway, you could encode this information in your brms model with something like prior(gamma(23.80971, 10.9204), class = sd). You could take a similar approach with the lme4 Residual summary. Only that’d be for the brms sigma parameter, which you’d encode in a prior statement like prior(gamma(xxx, xxx), class = sigma).
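If it helps, here’s a sketch of how you might work those sigma hyperparameters out. Nothing here is new; it just repeats the steps from above, swapping in the numbers from the Residual row of your summary.
# the Residual row: variance = 30.240, Std.Dev. = 5.499
resid_gamma <- gamma_s_and_r_from_mode_sd(mode = 30.240, sd = 5.499)

# simulate to move from the variance metric to the standard-deviation metric
set.seed(1)
resid_sigma <-
  tibble(sigma_2 = rgamma(1e5, shape = resid_gamma$shape, rate = resid_gamma$rate)) %>%
  mutate(sigma = sqrt(sigma_2)) %>%
  summarise(mean = mean(sigma), sd = sd(sigma))

# feed the simulated mean and sd back in to get the hyperparameters for sigma
gamma_s_and_r_from_mean_sd(mean = resid_sigma$mean, sd = resid_sigma$sd)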
The priors corresponding to the summary output from the Fixed effects: section are easier. These are often modeled with members of the Student-t family. Since your y_male covariate is not centered at zero, it might be best to avoid the default brms intercept syntax and instead use the as_areaK2 ~ 0 + Intercept + ... syntax for the formula. That means that when you set the prior for the intercept, you’d use class = b. If you went bold and used a Gaussian prior for the intercept, it could look like this: prior(normal(3.4682, 0.7818), class = b, coef = Intercept). The priors for the other fixed effects would be similar; you’d just use the name of the variable in the coef = xxx part.
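If it helps to see the pieces together, here’s a rough sketch of what the full prior specification and brm() call might look like. The formula is my guess at your model based on the summary, d_new is a made-up name for your next round of data, and the sigma prior is left as a comment because its hyperparameters come from the workflow sketched above. A call to get_prior() will show you the exact coefficient names brms expects.
library(brms)

# check the default priors and the exact coefficient names first
get_prior(as_areaK2 ~ 0 + Intercept + y_male * y_age_c + (1 | school),
          data = d_new)

my_priors <-
  c(prior(normal(3.4682, 0.7818), class = b, coef = Intercept),
    prior(normal(-1.0545, 0.7581), class = b, coef = y_male),
    prior(normal(0.5251, 0.3419), class = b, coef = y_age_c),
    prior(normal(-0.2635, 0.4836), class = b, coef = y_male:y_age_c),
    # plus a prior(gamma(xxx, xxx), class = sigma) line once you have those hyperparameters
    prior(gamma(23.80971, 10.9204), class = sd))

fit <- brm(as_areaK2 ~ 0 + Intercept + y_male * y_age_c + (1 | school),
           data  = d_new,
           prior = my_priors)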
As to the final section, the Correlation of Fixed Effects: summary, your options are limited. To my knowledge, brms does not support modeling the individual correlations directly. But you do have the lkj prior. Setting it to 1 leaves it flat. Setting it to a value like 2 or 4 gently pushes your level-2 correlations away from the boundaries.
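For example, if a future version of the model adds varying slopes, the corresponding statement might look something like this:
# gently nudge the level-2 correlations away from -1 and 1
prior(lkj(2), class = cor)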
This is all assuming you leave the data in an unstandardized metric and that you want to directly encode the information from the prior model into the future one. One way to use the information in a more casual way would be to multiply all the standard-deviation hyperparameters by a factor of 2 or so (e.g., prior(normal(3.4682, 1.5636), class = b, coef = Intercept)). You could also take a weakly-regularizing approach by standardizing all the data beforehand and slapping something like normal(0, 1) on all parameters other than the level-2 correlations.
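As a sketch of that last option, the whole specification on standardized data could be as simple as:
# weakly-regularizing priors on standardized data; sd and sigma end up half-normal
# because of their lower bounds
c(prior(normal(0, 1), class = b),
  prior(normal(0, 1), class = sd),
  prior(normal(0, 1), class = sigma))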
For more general ideas on priors, check out the Stan prior wiki.