Finite population correction

Does Bayesian inference require a finite population correction?

Consider the case where we have a finite population of N individuals with iid values drawn from \textrm{Normal}(\mu, \sigma). Let the mean and standard deviation of this finite population be M and s, respectively.

From this finite population we sample n individuals without replacement, and model the values as iid draws from a normal distribution. The resulting posterior for the parameters of the modeled normal distribution will be the posterior distribution for \mu, \sigma, and will overestimate the uncertainty in M, s. This is true regardless of whether we analyze the model in a frequentist or Bayesian mode of inference with reasonable priors.

To see this, consider what happens when n approaches N. The posterior uncertainty in M, s should approach zero (i.e. in the limit where we sample every individual), but the model that we’ve fit recovers the nonzero posterior uncertainty in \mu, \sigma.
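To make that limit concrete, here is a minimal numeric sketch (my own illustration, not from the thread), assuming \sigma is known and the prior on \mu is flat; under those assumptions the posterior sd of \mu is \sigma/\sqrt{n}, while the posterior sd of M picks up a factor \sqrt{1 - n/N}:

```python
import numpy as np

# Sketch: known sigma and a flat prior on mu, so both posterior sds are
# available in closed form. Not the general unknown-sigma case.
sigma, N = 1.0, 100

for n in (10, 50, 100):
    post_sd_mu = sigma / np.sqrt(n)              # does not vanish at n = N
    post_sd_M = post_sd_mu * np.sqrt(1 - n / N)  # zero once everyone is sampled
    print(n, round(post_sd_mu, 3), round(post_sd_M, 3))
```

At n = N the M column is exactly 0 while the \mu column is still \sigma/\sqrt{N} = 0.1, which is the mismatch described above.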


It is safe to say that Bayesian inference will deal with this uncertainty in its own way, without need of these frequentist corrections. Or we might need to write these corrections as generated quantities, in cases where sampling is done without replacement from a finite population of known size N.

The point here is that if you are interested in M, s but write down a model for \mu, \sigma you need to do some math to derive the posterior for M, s from the combination of \mu, \sigma and the observed data. I think a valid Bayesian procedure would be to impute values for the unsampled individuals at each iteration based on the hyperparameters \mu, \sigma, and then derive M, s, but I haven’t thought very carefully about it. Sufficient statistics should be able to speed this computation and may reveal a fundamental connection to the frequentist correction.
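A sketch of that imputation procedure under a flat prior (my own toy numbers, added for illustration): draw \sigma^2 \sim \text{Inv-}\chi^2(n-1, s^2) and \mu \mid \sigma^2 \sim \text{N}(\bar y, \sigma^2/n), impute the N - n unsampled individuals, and form the finite-population mean at each draw. Note the draws need only the sufficient statistics \bar y and s^2.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 200, 50
y = rng.normal(0.0, 1.0, size=n)       # the observed sample
ybar, s2 = y.mean(), y.var(ddof=1)     # sufficient statistics

draws = []
for _ in range(20000):
    # Flat-prior conjugate posterior for (mu, sigma^2)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Impute the N - n unsampled individuals, then form the finite-pop mean
    y_mis = rng.normal(mu, np.sqrt(sigma2), size=N - n)
    draws.append((n * ybar + y_mis.sum()) / N)

sd_M = np.std(draws)
sd_fpc = np.sqrt(s2 / n) * np.sqrt(1 - n / N)  # classical FPC-adjusted SE
print(round(sd_M, 4), round(sd_fpc, 4))
```

The two numbers come out within a few percent of each other (the imputation-based sd is slightly larger because of the t tails at this n).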

I just chatted about it with ChatGPT and under flat priors it seems that the above procedure yields something numerically very close to the frequentist correction.

Here’s that chat, if it’s of interest: ChatGPT - FPC vs Imputation Estimation. Note that I have not thoroughly reviewed GPT’s responses for correctness.


To answer directly: yes, for the reasons @jsocolar mentioned.

The bigger context is that we are often interested in “superpopulation” inference. That is, if I’m doing a medical study, do I care about the people living right now at their current ages or do I care about the population in the future when a drug or procedure is used in the field? For example, consider @andrewgelman’s pet example, Eight Schools. There is a “population” of schools somewhere, but the interest is in global estimation, not estimation in the 273 schools in the Boston area or nationwide or whatever.

The other context for this is that as long as you aren’t sampling a substantial part of the population, the adjustments don’t amount to much. When you get to sampling 50% of the population, then you’ll see the difference.
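For a rough sense of scale (using the flat-prior form \sqrt{1 - n/N}; the classical textbook factor \sqrt{(N-n)/(N-1)} is nearly identical):

```python
import numpy as np

# How much the finite-population factor shrinks the posterior sd of the
# population mean, as a function of the sampled fraction n/N.
N = 1000
for frac in (0.01, 0.10, 0.50, 0.90):
    n = int(frac * N)
    print(f"{frac:.0%} sampled -> factor {np.sqrt(1 - n / N):.3f}")
# 1% sampled -> factor 0.995
# 10% sampled -> factor 0.949
# 50% sampled -> factor 0.707
# 90% sampled -> factor 0.316
```

At a 1% sampling fraction the correction is invisible; at 50% it cuts the sd by about 30%.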

This is tricky. When you sample conditionally on the values you’re estimating, the imputations don’t change the point estimates in simple cases like the one @jsocolar is considering, unless there are covariates coming into play. That’s because if y^\text{obs} is the observed data, then p(\mu, \sigma \mid y^\text{obs}) is what drives the imputation, which isn’t going to lead you away from \mu, \sigma as estimates.

My advice, like always, is to do the simulation and see what happens! I always find this the best way to learn about this kind of stuff.


We discuss the finite-population correction using Bayesian inference in chapter 8 of BDA3 (I think it’s in chapter 7 of the first two editions), first in general on pages 201-202 (“Finite-population and superpopulation inference”) and then in more detail in section 8.3 (“Sample surveys”). Equation (8.6) is the classical formula which we derive using Bayesian inference. On p. 215 we apply these ideas to causal inference.

We also talk about finite-population inference in Data Analysis Using Multilevel/Hierarchical Models in sections 21.2 (“Superpopulation and finite-population variances”) and 21.3 (“Contrasts and comparisons of multilevel coefficients”).

These analyses can get tricky! The key is to clearly specify your estimation goal, which will typically depend on unobserved or hypothetical new cases that can be modeled as latent data.


For me, the term “finite population correction” sounds strange. In the second lecture of my Bayesian Data Analysis course, I ask students to guess how many left-handed students (students who would mostly use their left hand to write with a pen) there are in the lecture hall. I then start asking each student whether they are left-handed, and use a Bayesian model to predict the total number of left-handed students in the lecture hall. I am doing finite population prediction, but I don’t need any finite population correction, as I wrote the model and predictive distributions to directly take into account the information I have (easy for a binomial model). Since in general we never have a truly infinite population, we should call infinite population models “infinite population approximations”! So you may need a correction only if you start with an approximation.
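That kind of direct finite-population prediction can be sketched for the binomial case as follows (my own made-up numbers, assuming a uniform Beta(1,1) prior on the left-handed proportion):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, k = 100, 30, 4  # hall size, students asked so far, left-handed among them

# Posterior for the left-handed rate under a uniform Beta(1,1) prior,
# then the posterior predictive for the N - n students not yet asked.
theta = rng.beta(1 + k, 1 + n - k, size=50000)
unasked = rng.binomial(N - n, theta)
total = k + unasked  # predicted total number of left-handers in the hall

lo, hi = np.percentile(total, [5, 95])
print(int(lo), int(hi))
```

The prediction is for the finite total itself, so as n approaches N the predictive interval for the total collapses automatically, with no separate correction step.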
