I think that this is a really deep and really important question, and one that is answered poorly in many popular references! My best attempt to clarify what’s going on is in Section 1 of Probabilistic Modeling and Statistical Inference, although keep in mind that this piece is long overdue for some minor edits.

The one unifying assumption that every statistical analysis makes is the existence of some true data generating process which, abstractly, is just a probability distribution over the *observational space* that quantifies the fundamental lack of predictability of a given measurement. If we were able to make perfect measurements with no noise then this distribution would be singular, concentrating entirely on the deterministic state of the world. Note that there is no parameter space here, just a single probability distribution over the space of possible measurement outcomes.

If we knew the true data generating process then we could quantify all of the possible outcomes of any process that consumes data – I refer to this as “calibration”. In particular if we have a black box algorithm that takes a measurement as input and returns a single real number then we can

- Simulate possible measurements from the true data generating process \pi^{\dagger}(y),

\tilde{y}_{s} \sim \pi^{\dagger}(y).

- Evaluate the black box algorithm f on those inputs

\tilde{o}_{s} = f(\tilde{y}_{s}).

- Communicate the distribution of outputs, for example with a histogram summary or moments.
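A minimal sketch of these three steps, assuming a toy true data generating process and an arbitrary black box (both invented here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" data generating process pi^dagger(y): each
# measurement is a vector of 10 draws from a standard normal.
def simulate_measurement():
    return rng.normal(loc=0.0, scale=1.0, size=10)

# Hypothetical black box f: any function from a measurement to a real number.
def f(y):
    return np.mean(y)

# Calibrate: simulate many measurements from pi^dagger, push each through
# the black box, then summarize the distribution of outputs (here moments).
outputs = np.array([f(simulate_measurement()) for _ in range(5000)])
print(np.mean(outputs), np.std(outputs))
```

With this particular choice of black box the output distribution concentrates around zero with scale roughly 1/sqrt(10), but the point is that the same three-step recipe works for *any* black box.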

In any real analysis, however, we don’t know what the true data generating process is so we can’t do this kind of calibration just yet. First we have to find the true data generating process.

Unfortunately the space of all possible data generating processes is mathematically nasty. Just really disgusting stuff like probability distributions that can’t be represented or can’t be evaluated in finite time or are just too complex to work with in practice. Consequently we have to somehow restrict our search to a subset of possible data generating processes.

Following Dennis Lindley I will refer to this subset as a “small world”. Small worlds are often chosen at least partially for their mathematical convenience. Most small worlds that you’ll encounter can be *coordinated*, that is we can identify each data generating process with a sequence of numbers that allow us to quickly look them up like addresses in a city grid. These numerical labels are also known as *parameters*. Until we choose a small world there is no notion of parameters!

If the small world contains the true data generating process (see the first figure of Probabilistic Modeling and Statistical Inference) then this restriction doesn’t actually cost us anything. So long as we can exhaustively search the small world we can find that true data generating process. It also means that there is some parameter configuration \theta^{\dagger} that identifies the true data generating process.

Given how complex the world, and hence any true data generating process, is we are unlikely to be so lucky as to have a mathematically convenient small world that contains the true data generating process. In that case we face the situation shown in the second figure of Probabilistic Modeling and Statistical Inference. Here there is *no* parameter configuration that identifies the true data generating process. At best the elements of the small world *approximate* the relevant features of the true data generating process. This is the meaning of Box’s famous quote that “all models are wrong but some are useful”. For more discussion see Section 1.4.1 of Towards A Principled Bayesian Workflow.

So in the ideal circumstance where our model contains the true data generating process there is a “true” parameter configuration, and we can ask whether our inferences – a point estimator or a set estimator or a posterior distribution – are able to recover that true value under various circumstances.

In the more realistic case where our model does not contain the true data generating process there is no “true” parameter configuration. The best we can ask is whether or not our inferences identify data generating processes in our model that well-approximate certain features of the true data generating process.

Either way we can calibrate procedures that take in data as inputs using the small world. Instead of looking at what happens with data simulated from a single data generating process, however, we have to look at simulations from *all* of the data generating processes in the small world and then summarize the corresponding distribution of outputs.
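As a sketch of what calibrating over an entire small world might look like, here is a toy example that sweeps a location parameter over a (hypothetical, illustrative) normal small world and checks how often a standard interval estimator covers the simulating configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small world: Normal(mu, 1) measurements of size n = 20,
# coordinated by the single parameter mu.
n = 20
z = 1.6448536269514722  # 95th percentile of the standard normal

covered = []
for mu in np.linspace(-3.0, 3.0, 200):  # sweep parameter configurations
    for _ in range(50):                 # replications per configuration
        y = rng.normal(loc=mu, scale=1.0, size=n)
        ybar = y.mean()
        half = z / np.sqrt(n)
        # Does the 90% interval estimator recover the simulating mu?
        covered.append(ybar - half <= mu <= ybar + half)

print(np.mean(covered))  # summarize behavior across the whole small world
```

Because the intervals here are exact for this family, the summary hovers near 0.90; with a misspecified small world, or a worse estimator, this kind of sweep is exactly how you would find out.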

Let’s take the gravity example that @Bob_Carpenter mentioned. If the world were perfectly described by Newtonian gravity from the Earth then one of the parameters that determined the outcome of measurements sensitive to gravity would be the gravitational acceleration, and there would be some true value of that parameter corresponding to real life.

But that isn’t real life. Objects are affected not just by the gravity of the Earth but also the gravity of the moon, not to mention all of the other planets. Sometimes these influences are so weak that they can be safely ignored but sometimes they can’t (tides!). Even worse, we know that Newtonian physics is only an approximation to general relativity, so even if we considered all of the planets a Newtonian model could not contain the true data generating process! That said, for measurements on Earth of relatively low mass objects that aren’t going too fast a small world based on a Newtonian model is probably good enough to be “useful”.

The “power” analysis shown in Kruschke is also done in the context of a small world. After a small world is chosen one isolates a single data generating process as the “null hypothesis” and sequesters the rest as “alternative hypotheses”. The black box tries to decide if the null hypothesis is inconsistent with a given observation. By simulating data from the assumed null hypothesis over and over again we can see how often the black box correctly retains the null hypothesis and how often it makes the wrong decision (false positive rate). We can then repeat the same exercise for each of the alternative hypotheses (true positive rates).
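This procedure can be sketched with a simple two-sided z-test standing in for the black box (the test, sample size, and effect size here are illustrative assumptions, not Kruschke’s actual setup):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 30
z_crit = 1.959963984540054  # two-sided 5% critical value, standard normal

# Black box decision rule: reject the null "mu = 0" when the
# standardized sample mean is large.
def rejects(y):
    return abs(y.mean() * np.sqrt(n)) > z_crit

def rejection_rate(mu, reps=20000):
    return np.mean([rejects(rng.normal(mu, 1.0, n)) for _ in range(reps)])

false_positive = rejection_rate(0.0)  # simulate under the null hypothesis
true_positive = rejection_rate(0.5)   # one alternative hypothesis
print(false_positive, true_positive)
```

In a full power analysis you would repeat the second computation across all of the alternative hypotheses in the small world, tracing out a power curve rather than a single rate.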

Critically, this analysis assumes that the true data generating process is either the null hypothesis or one of those alternative hypotheses. If the small world doesn’t contain the true data generating process then the false and true positive rates won’t quantify what happens when the black box is evaluated on real observations. When the small world contains only bad approximations to the true data generating process these rates can be arbitrarily wrong, but if our small world contains decent approximations then the rates might do a decent job of quantifying what we would actually see.

Sorry that was so long but you hit on some deep points. The references I linked to above try to encapsulate the insights of George Box, I.J. Good, Dennis Lindley, L.J. Savage and more but with as many figures as I could fit in.