# model

Consider a standard linear mixed/multilevel model

y_{ij} = \mu + \nu_i + \epsilon_{ij}
\nu_i \sim N(0, \sigma_{\nu}^2)
\epsilon_{ij} \sim N(0, \sigma_{\epsilon}^2)

with appropriate priors on \mu and the \sigma's.

i = 1 … k
j = 1 … l

# question

If I had prior information about one of the random effects, say that
\nu_4 was certainly less than 4, how could I change the model to incorporate this information?

If \nu_i were a fixed effect, this would be more straightforward for
me. However, this is a case where I want to allow the \nu's to shrink
towards the group mean (be a random effect), but I also have some
prior information and want to use it. Is this possible?

What sort of modeling approach could handle this situation?

I’m currently implementing this via brms. Ideally, I’d be able to adjust for this in brms, but I’d also be happy to directly modify the stan code. Details and code below:

# simulate data

(r code)

  set.seed(1)
mu <- 0
sigma_nu <- 3
sigma_epsilon <- .3
k <- 6
l <- 5

nu  <- rnorm(k, mu, sigma_nu)
y <- as.vector(sapply(nu, function(nu_j) rnorm(l, nu_j, sigma_epsilon)))
d <- expand.grid(j = 1:l, i = factor(1:k))
d\$y <- y



# fit “fixed effects” or separate intercepts with brms

library(brms)

form <- formula(y ~ i - 1)
bm1 <- brm(form, d)
bm1

 Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ i - 1
Data: d (Number of observations: 30)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
i1    -1.70      0.13    -1.96    -1.45       5158 1.00
i2     0.47      0.13     0.21     0.73       5759 1.00
i3    -2.31      0.13    -2.57    -2.05       5592 1.00
i4     4.75      0.12     4.51     5.00       5524 1.00
i5     0.97      0.13     0.70     1.22       5350 1.00
i6    -2.56      0.13    -2.81    -2.30       5660 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     0.29      0.04     0.22     0.39       3646 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


#### Add prior to an individual group mean.

Say I know the mean of group 4, i4, is almost certainly less than 4.

  bm2 <- update(bm1, prior = prior(skew_normal(3.9, 1, -10), class = "b", coef = "i4"))
bm2

 Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ i - 1
Data: d (Number of observations: 30)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
i1    -1.70      0.20    -2.10    -1.30       5266 1.00
i2     0.47      0.20     0.08     0.86       4573 1.00
i3    -2.31      0.21    -2.72    -1.88       5206 1.00
i4     4.04      0.11     3.81     4.25       4592 1.00
i5     0.97      0.20     0.58     1.37       5445 1.00
i6    -2.55      0.20    -2.96    -2.15       4743 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     0.44      0.08     0.31     0.63       3138 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).


This works as I would expect. “i4” mean estimate decreased from 4.75 to 4.04.

# thoughts/questions on how to include prior on an individual (or several) random effects

The trouble is that I want to include my prior information about the mean of i4 (and other groups as well), but I also want to share information across groups and shrink them towards the grand mean.

It doesn’t make sense to model \nu_4 as both:

\nu_4 \sim N(\mu, \sigma_\nu)

and

\nu_4 \sim skew Normal(\theta)

where \theta are some hyper priors.

Maybe I need to add a parameter to my model. Something like this:

y_{ij} = \mu + \nu_i + \epsilon_{ij}
\nu_i = \delta_i + \gamma_i
\delta_i \sim N(0, \sigma_{\delta}^2)
\gamma_i \sim p(\theta_i)
\epsilon_{ij} \sim N(0, \sigma_{\epsilon}^2)

Here gamma is an additional parameter that could have prior information \theta_i. But how would I ensure \nu_i is constrained below a certain value? There seem to be issues with identifiability. Any clues or pointers you could give me would be greatly appreciated. I put this in the brms category, but maybe it is more general. Many thanks!

In brms you cannot set priors on specific random effects. Not sure how the canonical solution would look like if coded directly in Stan.

OK, I figured as much. Thanks for the information, Paul.