Nested hierarchical model - does it make sense in this case?

Hi all,

Let’s say I have a dataset with several observations taken during some sort of test from different people (sorry for the vagueness). For the great majority of people I only have observations from one test but for some (like 5%) I have observations from 2 tests (and for very few people even more than 2). It is expected that the observations vary quite a bit between tests for the same person.

For now, I am using a model where I don’t include a person variable - so I am treating the tests as independent from each other. However, this feels wrong even though it is not expected that person itself is a big contributing factor when it comes to the observed values.

I was thinking that it would make most sense to build a nested hierarchical model (I hope this is the right term), so having a variable for person and within that a variable for test (basically something like the example given here.)

However, I am wondering if a model like this will even converge as I mostly have only one test per person. My concern is that there’d be non-identifiability issues because the model wouldn’t know whether the person or the test is contributing more to what’s observed. Would it be ok to have the model not include a person variable (as it is now) or would that be a great violation of the underlying assumptions? Should I use just one test per person? Or will hierarchical modeling sort it out for me?

I guess I just have to try things out but was wondering if anyone has been in a situation like this and maybe has some tips on how to best approach this kind of data.


Bayesian hierarchical modeling is typically used when your data cover multiple subjects and each subject contributes only a small amount of data. Its advantage is that it accounts for between-subject variance as well as within-subject variance, which lets you get every drop of information you can out of your limited data. Another way to think about it is that the group-level prior constrains each subject’s parameters by borrowing from the typical test performance of the group, with the hyperprior governing how strong that constraint is. So whether or not you use Bayesian hierarchical modeling is a question of whether it’s worth the computational expense, which could be great depending on your model, to extract that extra information. An example, though not an exhaustive one, is when you have many subjects but only a few trials per subject (due to experimental or logistical constraints).
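To make the "group-level prior constrains the subject" idea concrete, here is a minimal sketch of partial pooling for the simplest case: a normal model where the between-subject and within-subject variances are treated as known. The function name and arguments are hypothetical, chosen only for illustration, and a real hierarchical model would estimate the variances rather than take them as given.

```python
def partial_pool(subject_mean, n_obs, group_mean, between_var, within_var):
    """Posterior mean of one subject's true level in a normal-normal model.

    Assumes (for illustration only) that between_var and within_var are known.
    """
    # Weight on the subject's own data: grows with n_obs, shrinks when
    # the within-subject noise is large relative to between-subject spread.
    weight = between_var / (between_var + within_var / n_obs)
    return weight * subject_mean + (1.0 - weight) * group_mean


# One noisy observation is pulled strongly toward the group mean;
# four observations from the same subject are pulled much less.
one_obs = partial_pool(subject_mean=10.0, n_obs=1, group_mean=0.0,
                       between_var=1.0, within_var=4.0)
four_obs = partial_pool(subject_mean=10.0, n_obs=4, group_mean=0.0,
                        between_var=1.0, within_var=4.0)
```

With a single observation the estimate sits much closer to the group mean than to the subject's own value, which is the sense in which data-poor subjects borrow strength from the group.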

This doesn’t directly answer all of your questions, but hopefully it is still helpful in deciding whether or not to employ a Bayesian hierarchical model.

Thank you for the reply! That definitely helps. I am already including test as a random effects parameter but I think I’ll try to include a person parameter as well, just because it’s more comprehensive. Thanks!

What is the advantage of adding a person parameter to your likelihood function over looping a prior and likelihood over each person (as you would in a hierarchical model)?

I’m planning on looping over each test and within each loop adding the person parameter (modeled as random effects). I hope this makes sense. I’m not really interested in the person estimates but rather in the test estimates and a lot of predictors I am including are given for each test rather than for each person.

You’re thinking about this in exactly the right way. In your system, there’s some variance V_p across people, and some variance V_t across test-taking replicates (within a person). If you treat every test-taking event as independent, then you are conflating V_p + V_t (the variance in tests taken by different people) with just V_t (the variance in tests taken by one person), which probably isn’t a great thing to do. On the other hand, with very few people taking multiple tests, it might be difficult or impossible to reliably partition the total variance V_{all} = V_p + V_t into its person- and test-specific components, which could lead to computational difficulties in your model. A couple of points:

  • You don’t know for sure until you try. I suggest giving the nested model you propose a try and seeing whether it works computationally.
  • What matters for the non-degeneracy of the two variance components is not the fraction of people who take multiple tests, but rather the total number of people who take multiple tests. If the dataset is large, such that the latter is high even as the former is low, you shouldn’t have a problem.
  • If you do have a problem, there is a reparameterization that works well specifically in cases where there are two mechanistically distinct variance components that are difficult to distinguish in the model. In particular, you can parameterize directly in terms of V_{all} and a parameter \rho, bounded inside the unit interval, that gives the proportion of V_{all} that is attributable to one component (here, the person component). So for example:
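parameters {
  real<lower = 0> V_all;              // total variance, V_p + V_t
  real<lower = 0, upper = 1> rho;     // proportion of V_all due to people
}
transformed parameters {
  real<lower = 0> V_p = rho * V_all;  // between-person variance
  real<lower = 0> V_t = V_all - V_p;  // within-person (between-test) variance
}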

It would also be fine to parameterize in terms of the square root of V_all if you find it easier to think about a prior on a standard deviation rather than a variance.
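As a rough illustration of the points above, here is a small simulation sketch in plain Python (not a Stan model, and not a substitute for fitting one). It assumes normally distributed person and test effects, and all names and numbers in it (`simulate`, `moment_estimates`, 20,000 people, 5% repeat takers, V_p = 1, V_t = 4) are made up for the example. It uses simple moment estimates: first tests from different people vary with V_p + V_t, while half the squared difference between two tests from the same person has expectation V_t.

```python
import random
import statistics


def simulate(n_people, frac_repeat, V_p, V_t, seed=0):
    """Return one list of test scores per person (one or two tests each)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        person_effect = rng.gauss(0.0, V_p ** 0.5)
        n_tests = 2 if rng.random() < frac_repeat else 1
        data.append([person_effect + rng.gauss(0.0, V_t ** 0.5)
                     for _ in range(n_tests)])
    return data


def moment_estimates(data):
    """Moment estimates of V_all, V_t and rho = V_p / V_all."""
    # First tests come from different people, so their variance is V_p + V_t.
    V_all_hat = statistics.variance([tests[0] for tests in data])
    # Within a person the person effect cancels: E[(y1 - y2)^2 / 2] = V_t.
    repeats = [tests for tests in data if len(tests) >= 2]
    V_t_hat = statistics.fmean([(t[0] - t[1]) ** 2 / 2 for t in repeats])
    rho_hat = 1.0 - V_t_hat / V_all_hat
    return V_all_hat, V_t_hat, rho_hat, len(repeats)


data = simulate(n_people=20000, frac_repeat=0.05, V_p=1.0, V_t=4.0)
V_all_hat, V_t_hat, rho_hat, n_repeats = moment_estimates(data)
print(V_all_hat, V_t_hat, rho_hat, n_repeats)
```

Rerunning with a much smaller `n_people` at the same `frac_repeat` should make `V_t_hat` (and therefore `rho_hat`) noticeably noisier, which is the sense in which the count of repeat takers matters more than their fraction.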


Great, thanks! I will try things out and come back to this if I run into problems.