Fitting a multi-level model to a univariate normal

So I have two normal distributions, y_1 and y_2. I want to figure out their “average” distribution. I figure the best way to do that isn’t to average all the data (or the parameters), but to perform partial pooling in a multi-level model and figure out the distribution of their hyperparameters.

I know that sound a bit wacky, but it’s for a larger project. I need to figure out how to code this up first.

y_1 <- rnorm(30, 10, 2)
y_2 <- rnorm(25, 20, 3)
N_1 = length(y_1)
N_2 = length(y_2)
G = 2
ret <- stanc(file = "mlmnormal.stan") 
ret_sm <- stan_model(stanc_ret = ret) 
fit <- sampling(ret_sm, warmup=100, iter=600, seed=1, data=list(y_1=y_1, 
                                                                G = G))

Below is my model “mlmnormal.stan”

data {
  int N_1;
  int N_2;
  vector[N_1] y_1;
  vector[N_2] y_2;
  int G; //number of groups
parameters {
  real mu[G];
  real<lower = 0> sigma[G];
  real mu_h; 
  real<lower = 0>sigma_h;
model {
  for (i in 1:G) {
    mu[i] ~ normal(mu_h, sigma_h);
    y_1 ~ normal(mu[1], sigma[1]);
    y_2 ~ normal(mu[2], sigma[2]);
  mu_h ~ normal(0, 1);
  sigma_h ~ cauchy(0, 1);

Things are starting to look reasonable. The variance for the 2nd distribution is wider, but the mean for the 2nd distribution is smaller (it should be larger).


I’ve updated the code as I’m working through the problem. Any feedback is appreciated.

Hi @blakeobeans

From your data simulation, it looks like what you want could be analysed using a univariate linear model (non-hierarchical) with a single predictor variable indicating the group (G) the observation comes from for the mean and standard deviation. If you create an indicator variable for G the same length as your data, and code the indicator to be -1 and 1 (or mean-centered), the intercept of this linear model will provide the value of y on average across the two groups. In brms, it would look something like:

fit <- brm( 
       y ~ 1 + g,          # mean model
       sigma ~ 1 + g,      # standard deviation model 
       data=d,             # your data in long format
       family = gaussian() # distribution

You could then use stancode(fit) to see the underlying Stan code, and go from there.

If you need a hierarchical model, you probably want 1) more groups, and 2) repeated observations for at least some of the groups.

1 Like

Hi @cgoold, thank you for your feedback. I’m trying to build a simple model as a pathway to a more complex one, as Stan is difficult enough for me already. The model I’m working on uses a generalized pareto distribution with 22 groups, and I don’t think brms supports that.

In a partial pooling case, you assume the groups come from the same distribution, and estimate the standard deviation of that distribution. As @cgoold mentioned, in this case, a fixed effect seems more appropriate for what you want to do, which assumes they don’t share the distribution.

Hi Dilsher, thanks for your comment. I do want to assume that they groups come from the same distribution. The model above does estimate the standard deviation of that distribution: sigma_h. I don’t think I want a fixed effects model: although the code above only includes 2 groups, I plan on scaling it up to 22 groups.


Maybe adding a bit more detail about your specific modelling problem would be helpful here, as well as looking at the Stan Reference Manual’s examples of hierarchical regression. I’d also recommend Gelman & Hill’s 2007 multi-level modelling book (

Your data simulation doesn’t produce data that are from a common distribution (i.e. your y variables are independent of one another). Also, you mention that you will scale the model up to 22 groups, but this isn’t a particularly large number of groups for multi-level modelling.

The ‘prototypical’ case of a multi-level regression is that we start with a simple linear model, e.g.

y_i \sim Normal(\mu_i, \sigma)
\mu_i = \alpha + \beta x_i

and then add group-specific parameters to that linear model, e.g. \mu in the above equation could include group-varying intercepts as:

\mu_{ij} = \alpha + \nu_j + \beta x_{ij}
\nu_j \sim normal(0, \sigma_j)

where j indicates the group index. Now, the \boldsymbol{\nu} parameters (i.e. \nu_1, \nu_2, ..., \nu_J) will experience partial-pooling because they are also described by the common normal distribution. There are different but equivalent ways of writing out these models (see Gelman & Hill’s 2007 multi-level regression book) but all assume that the group-varying parameters form a higher-level regression, and thus could include their own predictor variables. Of course, the above model could be extended to include group-varying parameters for \beta or \sigma as well.

It might be worth checking out the resources above to help formulate your model.

Good luck!

1 Like

Hmmm - in that case, you still set up the data as groups (22 groups), and assign a prior like ~ N(0, sigma) and assign a prior to sigma as well. In brms that would be

fit <- brm(
y ~ 1 + (1|g), 
data = d)

Hi Chris, thank you so much for your time, comments and recommendations. I think the fissure in our dialogue comes from misunderstanding that from my perspective, I am not trying to perform a regression on y. If I were, I could use RStanARM or BRMS as recommended. If I were to regress the groups on y, I agree that I should perform an ANOVA or a regression as you and Dilsher recommend, but that is not my intent. I do not want to measure the betas for each group. I do not want a linear model.

Rather, I want to infer the parameters for each group according to their modeled normal distributions, while also estimating the overall hyperparameters. I am aware that I did not simulate the data as being from the same distribution. I don’t know how to do that, but I consider it an unnecessary step because this is a simple simulation for the purposes of getting the Stan coding right.

My question is- is the code above the correct way to model multiple univariate normals (one for each group) while also estimating the hyperparameters?

Again, many many thanks for your time and comments.

Hi @blakeobeans

I’m still not 100% sure what your final analysis will look like (there’s a lot of room for misinterpretation in terms such as multi-level, hyperparameters etc.), but see if the attached code is useful.

normal_hyperparameters.R (974 Bytes)
normal_hyperparameters.stan (661 Bytes)

In the R code, I first generate some data like so:

# sample size
N <- 1000

# number of y variables
N_y <- 2

# common distribution hyperparameters
mu <- 25       # overal mean
sigma_mu <- 5  # standard deviation of the mean
sigma <- 5     # overall standard deviation

# y-specific parameters
mu_y <- rnorm(N_y, mu, sigma_mu)
sigma_y <- runif(N_y, 0, sigma)

# generate y variables
ys <- sapply(1:N_y, function(x) rnorm(N, mu_y[x], sigma_y[x]))

and I run the model on the Stan code in the file normal_hyperparameters.stan. You can make this code more generalisable by, for instance, standardising the y variables first, and putting a generic prior on their mu and sigma parameters.

The output of one representative run is:

               mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
mu            23.36    0.06 3.56    15.85    21.44    23.47    25.39    30.65  3494    1
sigma_mu       5.26    0.04 2.30     2.14     3.53     4.78     6.51    10.80  2806    1
sigma          3.12    0.03 1.84     1.08     1.83     2.61     3.84     8.27  4414    1
mu_y[1]       27.22    0.00 0.08    27.07    27.17    27.22    27.28    27.38  6816    1
mu_y[2]       20.48    0.00 0.02    20.43    20.46    20.48    20.49    20.52  7209    1
sigma_y[1]     2.46    0.00 0.06     2.35     2.42     2.46     2.50     2.57  6069    1
sigma_y[2]     0.77    0.00 0.02     0.74     0.76     0.77     0.78     0.80  6762    1
lp__       -1644.69    0.04 1.84 -1649.02 -1645.65 -1644.40 -1643.34 -1642.08  1690    1

And just to show that the difference between what you are doing and a regression is minimal, the R code also includes a brms model:

# make the bf formula
#   : heteronegeous variances by group
bf_formula <- brms::bf(
  y ~ 1 + g, 
  sigma ~ 1 + g

fit2 <- brms::brm(
  data = d2,
  cores = 4, 
  family = gaussian()

The priors in the two models are a bit different, but the results are comparable:

Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: y ~ 1 + g 
         sigma ~ 1 + g
   Data: d2 (Number of observations: 2000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept          23.85      0.04    23.77    23.93       2167 1.00
sigma_Intercept     0.32      0.02     0.29     0.35       3511 1.00
g                  -3.37      0.04    -3.46    -3.30       2211 1.00
sigma_g            -0.58      0.02    -0.61    -0.55       4105 1.00

The two group’s specific y means are estimated at 23.85 + (-3.37*1) = 20.48 and 23.85 + (-3.37*-1) = 27.2, which are exactly the same to the first model.

In the brms model, the sigma regression uses a log link function, so the sigma_y values are exp(0.32 + (-0.52*1)) and exp(0.32 + (-0.52*-1)), respectively, which are also very similar to the first model above.

Don’t get caught up in the terms regression, ANOVA etc. Many models are very similar under-the-hood in terms of their mathematical representation, which can be fit using more standard methods.

I hope this was helpful!

1 Like