Does Bayesian inference require a finite population correction?
Consider the case where we have a finite population of N individuals with iid values drawn from \textrm{Normal}(\mu, \sigma). Let the mean and standard deviation of this finite population be M, s.
From this finite population we sample n individuals without replacement, and model the values as iid draws from a normal distribution. The resulting posterior over the parameters of the modeled normal distribution is a posterior for \mu, \sigma, and it will overestimate the uncertainty in M, s. This is true regardless of whether we analyze the model in a frequentist or Bayesian mode of inference with reasonable priors.
To see this, consider what happens as n approaches N. The posterior uncertainty in M, s should approach zero (in the limit, we have sampled every individual), but the model we've fit still reports the nonzero posterior uncertainty in \mu, \sigma.
Is it safe to say that Bayesian inference will deal with this uncertainty in its own way, without need of these frequentist corrections? Or might we need to write these corrections as generated quantities, in cases where sampling is done without replacement from a finite population of known size N?
The point here is that if you are interested in M, s but write down a model for \mu, \sigma, you need to do some math to derive the posterior for M, s from the combination of \mu, \sigma and the observed data. I think a valid Bayesian procedure would be to impute values for the unsampled individuals at each iteration based on the hyperparameters \mu, \sigma, and then derive M, s, but I haven't thought very carefully about it. Sufficient statistics should be able to speed this computation and may reveal a fundamental connection to the frequentist correction.
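Here is a minimal numpy sketch of that imputation procedure (the setup and all names are my own toy choices; conjugate draws under the flat prior p(\mu, \log \sigma) \propto 1 stand in for a fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: finite population of size N, sampled without replacement.
N, n = 1000, 500
population = rng.normal(loc=5.0, scale=2.0, size=N)
sample = rng.choice(population, size=n, replace=False)

ybar = sample.mean()
ss = ((sample - ybar) ** 2).sum()

n_draws = 4000
mu_draws = np.empty(n_draws)
M_draws = np.empty(n_draws)
s_draws = np.empty(n_draws)

for i in range(n_draws):
    # Conjugate posterior draws under the flat prior p(mu, log sigma) propto 1:
    # sigma^2 | y ~ scaled-inv-chi^2(n - 1, ss / (n - 1)),
    # mu | sigma^2, y ~ Normal(ybar, sigma^2 / n).
    sigma2 = ss / rng.chisquare(n - 1)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Impute the N - n unsampled individuals from the fitted normal ...
    y_mis = rng.normal(mu, np.sqrt(sigma2), size=N - n)
    y_full = np.concatenate([sample, y_mis])
    # ... and compute the finite-population quantities for this draw.
    mu_draws[i] = mu
    M_draws[i] = y_full.mean()
    s_draws[i] = y_full.std(ddof=1)  # ddof is a convention choice for s

print("posterior sd of mu:", mu_draws.std())  # superpopulation mean
print("posterior sd of M :", M_draws.std())   # finite-population mean
print("posterior sd of s :", s_draws.std())
```

With n/N = 0.5, the posterior sd of M comes out roughly \sqrt{1 - n/N} \approx 0.71 times the posterior sd of \mu, which is the shape of the classical correction.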
I just chatted about it with ChatGPT, and under flat priors it seems that the above procedure yields something numerically very close to the frequentist correction.
Here's that chat, if it's of interest: ChatGPT - FPC vs Imputation Estimation. Note that I have not thoroughly reviewed GPT's responses for correctness.
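As a quick sanity check on that claim, here is the analytic version in a stripped-down setting (my simplification: \sigma known and a flat prior on \mu). Write M = \frac{n}{N}\bar{y}^\text{obs} + \frac{N-n}{N}\bar{y}^\text{mis}. Then \mu \mid y^\text{obs} \sim \textrm{Normal}(\bar{y}^\text{obs}, \sigma^2/n) and \bar{y}^\text{mis} \mid \mu \sim \textrm{Normal}(\mu, \sigma^2/(N-n)), so

\operatorname{Var}(M \mid y^\text{obs}) = \left(\frac{N-n}{N}\right)^2 \left(\frac{\sigma^2}{n} + \frac{\sigma^2}{N-n}\right) = \frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right),

which is exactly the FPC-adjusted sampling variance (up to N versus N-1 in the correction factor).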
To answer directly, yes, for reasons @jsocolar mentioned.
The bigger context is that we are often interested in "superpopulation" inference. That is, if I'm doing a medical study, do I care about the people living right now at their current ages, or do I care about the population in the future when a drug or procedure is used in the field? For example, consider @andrewgelman's pet example, Eight Schools. There is a "population" of schools somewhere, but the interest is in global estimation, not estimation in the 273 schools in the Boston area or nationwide or whatever.
The other context for this is that as long as you aren't sampling a substantial part of the population, the adjustments don't amount to much. When you get to sampling 50% of the population, then you'll see the difference.
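For concreteness, the classical correction multiplies the usual standard error by \sqrt{(N-n)/(N-1)}: at a 5% sampling fraction that factor is about 0.97, while at a 50% sampling fraction it drops to about 0.71.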
This is tricky. When you sample conditionally on the values you're estimating, they don't change the estimates in simple cases like @jsocolar is considering, unless there are covariates coming into play. That's because if y^\text{obs} is the observed data, the imputations are generated from draws of p(\mu, \sigma \mid y^\text{obs}), which isn't going to lead you away from \mu, \sigma as estimates.
My advice, as always, is to do the simulation and see what happens! I find this the best way to learn about this kind of stuff.
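Following that advice, here is one possible numpy experiment (my own sketch) that sweeps the sampling fraction and checks the imputation-based posterior sd of M against the \sqrt{1 - n/N} factor:

```python
import numpy as np

rng = np.random.default_rng(2)

def finite_pop_sds(N, n, n_draws=4000):
    """Posterior sds of the superpopulation mean mu and the
    finite-population mean M under the imputation scheme above."""
    population = rng.normal(0.0, 1.0, size=N)
    sample = rng.choice(population, size=n, replace=False)
    ybar = sample.mean()
    ss = ((sample - ybar) ** 2).sum()
    # Vectorized conjugate draws under the flat prior.
    sigma2 = ss / rng.chisquare(n - 1, size=n_draws)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # Mean of the N - n imputed values, drawn in one shot per posterior draw.
    ymis_bar = rng.normal(mu, np.sqrt(sigma2 / (N - n)))
    M = (n * ybar + (N - n) * ymis_bar) / N
    return mu.std(), M.std()

N = 1000
for frac in (0.1, 0.25, 0.5, 0.9):
    n = int(frac * N)
    sd_mu, sd_M = finite_pop_sds(N, n)
    print(f"n/N = {frac:.2f}: sd(M)/sd(mu) = {sd_M / sd_mu:.3f}, "
          f"sqrt(1 - n/N) = {np.sqrt(1 - frac):.3f}")
```

As n/N grows, sd(M) shrinks relative to sd(\mu) while the superpopulation uncertainty does not; that is the whole finite-population effect in one table.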
We discuss the finite-population correction using Bayesian inference in chapter 8 of BDA3 (I think it's in chapter 7 of the first two editions), first in general on pages 201-202 ("Finite-population and superpopulation inference") and then in more detail in section 8.3 ("Sample surveys"). Equation (8.6) is the classical formula, which we derive using Bayesian inference. On p. 215 we apply these ideas to causal inference.
We also talk about finite-population inference in Data Analysis Using Regression and Multilevel/Hierarchical Models in sections 21.2 ("Superpopulation and finite-population variances") and 21.3 ("Contrasts and comparisons of multilevel coefficients").
These analyses can get tricky! The key is to clearly specify your estimation goal (the estimand), which will typically depend on unobserved or hypothetical new cases that can be modeled as latent data.
For me, the term "finite population correction" sounds strange. In the second lecture of my Bayesian Data Analysis course, I ask students to guess how many left-handed students (those who would mostly use the left hand to write with a pen) there are in the lecture hall. I then start asking each student whether they are left-handed, and use a Bayesian model to predict the total number of left-handed students in the lecture hall. I am doing finite population prediction, but I don't need any finite population correction, as I wrote the model and predictive distributions directly, taking into account the information I have (easy for a binomial model). Since in general we never have a truly infinite population, we should call infinite population models "finite population approximations"! So you may need a correction only if you start with an approximation.
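A rough sketch of that lecture-hall calculation (conjugate beta-binomial; the class size, counts, and prior here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

K = 100        # students in the lecture hall (invented)
n_asked = 20   # students asked so far
x_left = 3     # left-handers among those asked
a, b = 2, 18   # Beta prior roughly centered on 10% left-handedness

# Posterior for the left-handed proportion after n_asked answers.
theta = rng.beta(a + x_left, b + n_asked - x_left, size=10_000)

# Posterior predictive for the K - n_asked students not yet asked,
# plus the x_left already observed: the total in the hall.
total = x_left + rng.binomial(K - n_asked, theta)

print("posterior mean of total:", total.mean())
print("90% interval:", np.percentile(total, [5, 95]))
```

As n_asked approaches K, the predictive interval for the total collapses to the observed count automatically; no separate correction ever enters.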