First, I think one has to be careful to differentiate covariates from factor level occupancies; these are often treated interchangeably, but there are subtle yet important differences in the assumptions that go into them. In particular, factor models assume that the contributions from each factor add linearly, which is only an approximately consistent joint model for all of the factors (because the various orders of interactions are not included).

In any case, if one assumes no confounders then the behavior of the covariates and factor level occupancies has *zero* interaction with the conditional behavior of the variates given the covariates and factor levels. It doesn't matter how much two populations of covariates and/or factor level occupancies overlap, or how much they don't; inferences for the conditional behavior will be consistent.

This is what theoretically allows one to learn the conditional behavior from one heavily biased population of covariates and/or factor level occupancies and then use that conditional behavior to reconstruct variate behavior for hypothetical or "counterfactual" populations. The assumption of no confounding is extremely strong, and often can't be taken for granted, but when it does hold it has very strong consequences.
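As a rough sketch of that reconstruction (with an entirely made-up simulation, using least squares as a stand-in for a full inference), one can learn the conditional behavior from a biased covariate population and then average predictions over a completely different, hypothetical covariate population:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical truth: the conditional behavior y | x is the same in every
# population (no confounding), but the observed covariate population is
# heavily biased toward large, positive x.
alpha_true, beta_true = 1.0, 2.0
x_obs = rng.normal(loc=3.0, scale=0.5, size=500)          # biased sample
y_obs = alpha_true + beta_true * x_obs + rng.normal(0, 0.1, size=500)

# Learn the conditional behavior from the biased sample.
beta_hat, alpha_hat = np.polyfit(x_obs, y_obs, deg=1)

# Reconstruct average variate behavior for a hypothetical "counterfactual"
# covariate population centered somewhere else entirely.
x_new = rng.normal(loc=-2.0, scale=0.5, size=500)
y_new_mean = np.mean(alpha_hat + beta_hat * x_new)

print(alpha_hat, beta_hat, y_new_mean)
```

Because the conditional model is assumed to be the same everywhere, the learned \alpha and \beta transfer directly to the new covariate population.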

The worst that can happen in the no-confounder case is that the population used for inference concentrates on a narrow set of behaviors. In that case one has to rely on the model to extrapolate those inferences to the unobserved behaviors.

For example in a linear model

\pi(y \mid x, \alpha, \beta) = \text{normal}(y \mid \alpha + \beta \, x)

we might have to learn \alpha and \beta from data where x is always negative. Applying the model to external circumstances where x is positive then relies on the rigidity of the linear model. This can be a problem if the linear model is meant to be only a local approximation; see, for example, Taylor Regression Models.
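A small numerical sketch of how that extrapolation can fail (with a hypothetical nonlinear truth, here a sine, and simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical truth: a nonlinear conditional mean that a linear model
# approximates reasonably well only locally.
def true_mean(x):
    return np.sin(x)

# All observed covariates are negative.
x_obs = rng.uniform(-2.0, 0.0, size=400)
y_obs = true_mean(x_obs) + rng.normal(0, 0.05, size=400)

# Fit the linear model pi(y | x, alpha, beta).
beta_hat, alpha_hat = np.polyfit(x_obs, y_obs, deg=1)

# Within the observed range the approximation error is modest...
x_in = -1.0
err_in = abs((alpha_hat + beta_hat * x_in) - true_mean(x_in))

# ...but extrapolation to positive x relies entirely on the rigidity of the
# linear model, and here the local approximation breaks down.
x_out = 3.0
err_out = abs((alpha_hat + beta_hat * x_out) - true_mean(x_out))

print(err_in, err_out)
```

The fit is internally consistent with the observed data; nothing in the data itself warns that the extrapolated predictions are badly wrong.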

Likewise in a simple factor model

\pi(y \mid \alpha_{1}, \ldots, \alpha_{K}, k) = \text{normal}(y \mid \alpha_{k})

our complete data set might not have any observations for the level k = 2. In this case any predictions that rely on \alpha_{2} will be based entirely on the prior model. This may not give sufficient precision for the given inferential/predictive goals, but it will at least avoid inconsistent predictions.
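To make the prior fallback concrete, here is a minimal sketch assuming a conjugate normal prior \alpha_{k} \sim \text{normal}(0, \tau) and a known observational scale \sigma (all values made up); with zero observations at a level, the conjugate update reduces exactly to the prior:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: K = 3 factor levels, priors alpha_k ~ normal(0, tau),
# observations y ~ normal(alpha_k, sigma) with sigma known.
K, tau, sigma = 3, 2.0, 0.5
alpha_true = np.array([1.0, -1.5, 0.7])

# The complete data set contains no observations at level k = 2
# (zero-based index 1 here).
counts = np.array([50, 0, 50])
data = {k: alpha_true[k] + rng.normal(0, sigma, size=counts[k])
        for k in range(K)}

# Conjugate normal-normal update; with n = 0 this is just the prior.
post_mean = np.empty(K)
post_sd = np.empty(K)
for k in range(K):
    prec = 1 / tau**2 + counts[k] / sigma**2
    post_mean[k] = (data[k].sum() / sigma**2) / prec
    post_sd[k] = np.sqrt(1 / prec)

print(post_mean, post_sd)
```

The posterior for the unobserved level sits at the prior mean with the full prior scale \tau, honestly reporting that nothing has been learned about it, while the observed levels concentrate tightly.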