Hierarchical models: internal inconsistency in estimates of standard deviation

Hello to all,

I am working with simulated data to arrive at the most efficient formulation of a hierarchical model. Along the way I made a general observation, and I’d be glad to find an explanation:

Assume I have an observation-level parameter theta[i] that varies for each observation i (1 to 100). I assume theta[i] is drawn from a population-level distribution with parameters mu_theta and sigma_theta. Since 100 is a large enough number and I am working with an extremely basic model, the estimate of sigma_theta should be roughly equal to the standard deviation of the vector (theta[1], theta[2], … , theta[100]). However, the latter is consistently much smaller than the former: the estimated observation-level parameters are more closely spaced than the estimates of the population-level parameters indicate (hence I call it an “internal inconsistency”).

Is there an explanation for this phenomenon? I also observe that the estimate of sigma_theta is consistently correct, which I can confirm since my data is simulated. However, the observation-level estimates theta[i] are more closely spaced than both sigma_theta and the true values would suggest. I can understand why the estimates of theta[i] are closer together than the true parameters (due to the disciplining by priors, correlations in hierarchical models, etc.). But ought they not be consistent with sigma_theta, which is being estimated at the same time?

Lastly, I observe this for both centred and non-centred parametrizations and for different choices of priors.

Since you are working with simulated data, can you show your code for the data simulation and model fitting in a reproducible example?


Thanks for your reply. I have just discovered a theoretical reason behind this:

Compared to the population distribution of a parameter, the posterior distribution of that parameter at the observation level differs depending on the data collected for each observation. Consider the two extreme cases: with a large amount of observation-level data, the observation-level posterior converges to a distribution centred on the “true” value; with no observation-level data, it simply coincides with the population-level distribution.

In the former case, the standard deviation (SD) of the observation-level estimates coincides with that of the population distribution. In the latter case, the SD of the estimates should be ~zero, since every observation’s posterior collapses to the same population distribution and hence to the same mean. Realistic estimates of the SD should therefore lie between 0 and the SD of the population (in some sense this reflects how informative the observation-level data are). I’m now at peace with the phenomenon :)
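To illustrate (and since a reproducible example was requested): here is a minimal sketch in Python, not my actual model. I treat the hyperparameters as known and use the conjugate normal posterior in place of a full MCMC fit; all names and numbers are my own.

```python
# Minimal sketch of the shrinkage effect (not my actual model):
# theta_i ~ Normal(mu_theta, sigma_theta); each individual i yields n_obs
# measurements with noise SD sigma_y, summarized by the sample mean y_bar[i].
# Treating the hyperparameters as known, the conjugate posterior mean of
# theta_i is a precision-weighted average of mu_theta and y_bar[i].
import numpy as np

rng = np.random.default_rng(42)
mu_theta, sigma_theta, sigma_y = 0.0, 1.0, 2.0
n_units, n_obs = 100, 5

theta = rng.normal(mu_theta, sigma_theta, size=n_units)             # true theta_i
y_bar = theta + rng.normal(0.0, sigma_y / np.sqrt(n_obs), n_units)  # data summary

prec_prior = 1.0 / sigma_theta**2
prec_data = n_obs / sigma_y**2
theta_hat = (prec_prior * mu_theta + prec_data * y_bar) / (prec_prior + prec_data)

print("true sigma_theta:      ", sigma_theta)
print("SD of true theta_i:    ", theta.std(ddof=1))
print("SD of posterior means: ", theta_hat.std(ddof=1))  # noticeably smaller
```

With these numbers the SD of the posterior means comes out around 0.75 rather than 1, even though sigma_theta itself would still be recovered correctly by the full fit.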

I’m a little confused by the language you’re using to talk about these parameters, but I’m assuming that what you are calling “population level” is a prior. In general, posteriors are more concentrated than priors in well-formulated models. If posteriors matched priors, we wouldn’t need to perform posterior inference!
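As a toy illustration of that concentration (my own numbers; the conjugate case of a normal mean with known noise SD):

```python
# Prior vs. posterior SD for a normal mean with known noise SD:
# the posterior is far more concentrated than the prior.
import numpy as np

prior_sd, noise_sd, n = 10.0, 1.0, 20
post_sd = 1.0 / np.sqrt(1.0 / prior_sd**2 + n / noise_sd**2)
print(prior_sd, post_sd)  # prior SD 10.0 vs. posterior SD ~0.22
```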


Thank you for the comment. I shall try to explain myself better.

In fitting a hierarchical model, let theta be a univariate parameter of interest. To model heterogeneity across individuals (say 100 individuals), assume that theta_i (parameter theta for individual i) is drawn from a normal distribution with parameters mu_theta and sigma_theta, i.e.:

theta_i ~ Normal(mu_theta, sigma_theta)

Assume we have uninformative priors for mu_theta and sigma_theta.

Ultimately, in a hierarchical model, we derive posteriors for mu_theta, sigma_theta, and theta_i (i varying from 1 to 100). I am now focusing on the means of these posteriors; call them mu_theta_hat, sigma_theta_hat, theta_i_hat. My observation was the following:

SD(theta_i_hat) < sigma_theta_hat

The standard deviation across the individual estimates is substantially lower than the estimated standard deviation of the distribution from which these parameters are drawn.

Hope that makes it clearer.
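To make the inequality concrete, here is the closed form in the simplest conjugate case (n data points per individual with known noise SD sigma_y; a simplification of my actual model). The posterior mean for individual i is

theta_i_hat = mu_theta + kappa * (ybar_i - mu_theta), where kappa = sigma_theta^2 / (sigma_theta^2 + sigma_y^2 / n)

so that, across individuals,

SD(theta_i_hat) ≈ kappa * SD(ybar_i) = kappa * sqrt(sigma_theta^2 + sigma_y^2 / n) = sqrt(kappa) * sigma_theta

Since 0 < kappa < 1, the SD of the individual estimates runs from 0 (n = 0) up to sigma_theta (n -> infinity), matching the reasoning above.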

Can you be more specific about what you mean by “uninformative”? If you take very broad but proper priors like BUGS’s inv_gamma(epsilon, epsilon), you will see just the effect you mention, because most of the prior mass is well above small data values. @andrewgelman talks about this in the following paper and also in the Bayesian Data Analysis book:
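To see numerically how top-heavy such a prior is, here is a quick scipy check (the value of epsilon and the thresholds are my own choices):

```python
# How much mass does inv_gamma(eps, eps) put above typical data scales?
from scipy.stats import invgamma

eps = 0.001
prior = invgamma(a=eps, scale=eps)
for x in (1.0, 10.0, 100.0):
    print(f"P(variance > {x:g}) = {prior.sf(x):.3f}")  # ~0.99 in each case
```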