Bayesian logistic regression, solving dependency problems


I have two experiment replications, R1 and R2 (with some modifications in R2), deployed to intersecting (but not fully identical) populations of participants at two different time points.

I fit a Bayesian logistic regression to R1 data and want to improve our estimates by fitting another regression with R1 posteriors as R2 priors for shared factors in R1 and R2.

As R1 and R2 populations intersect, what are the potential consequences of that for R2 posteriors? I assume we might have an overly narrow CI? Anything else? Are there ways to estimate that?

What are good ways to alleviate the problem?

E.g. does it make sense to resample the R2 dataset to increase variability before fitting the model on R2?

Fit only to R2 participants who didn’t participate in R1 (provided this is a high enough number)?

Or increase the priors' weight?

Potential specifications for a multilevel version are also appreciated, but the sample size in R1/R2 is limited and I’m afraid the multilevel solution might be underpowered, undermining the idea of aggregating two studies.

Thank you so much!

The main concern here would be confounding, i.e. that the differences between the R1 and R2 populations are accompanied by differences in the conditional relationship between the variates and covariates. If there is no confounding then sequential fitting, or alternatively (and more easily implemented in Stan) fitting both data sets at the same time, will give consistent inferences.
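As a concrete illustration of the sequential scheme, here is a minimal Python sketch (rather than Stan) in which a Laplace (Gaussian) approximation of the R1 posterior becomes the prior for the shared coefficients in R2. The model, data-generating setup, and all names here are hypothetical; the Laplace approximation stands in for the full posterior one would carry over in practice.

```python
import numpy as np
from scipy.optimize import minimize

def fit_laplace(X, y, prior_mean, prior_prec):
    """MAP fit of a Bernoulli-logit regression with a Gaussian prior,
    plus a Laplace (quadratic) approximation of the posterior."""
    def neg_log_post(beta):
        z = X @ beta
        # Bernoulli log-likelihood with logit link, numerically stable
        ll = np.sum(y * z - np.logaddexp(0.0, z))
        diff = beta - prior_mean
        return -ll + 0.5 * diff @ prior_prec @ diff

    beta_hat = minimize(neg_log_post, prior_mean, method="BFGS").x
    # Posterior precision ~= likelihood curvature at the mode + prior precision
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    post_prec = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_prec
    return beta_hat, post_prec

rng = np.random.default_rng(0)
n, d = 300, 2
beta_true = np.array([0.5, -1.0])  # hypothetical shared coefficients

# R1: fit from a weak Gaussian prior
X1 = rng.normal(size=(n, d))
y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X1 @ beta_true))))
m1, P1 = fit_laplace(X1, y1, np.zeros(d), 0.01 * np.eye(d))

# R2: the approximate R1 posterior is reused as the R2 prior
X2 = rng.normal(size=(n, d))
y2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X2 @ beta_true))))
m2, P2 = fit_laplace(X2, y2, m1, P1)
```

Note that this is (approximately) equivalent to fitting both data sets jointly, and that if the same participants' responses enter both fits their information is counted twice, which is one mechanism behind the overly narrow intervals raised in the question.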

If there is potential confounding then there's really no way to construct consistent inferences from R1 and R2 other than explicitly (or implicitly, as many "causal inference" methods do) modeling the coupling between \pi(y \mid \theta, x) and \pi(x \mid \theta).


Thank you so much. Say, there are no confounders and >50% of the participants participated in both R1 and R2.

Both experiments are 2^k factorial, so likely some share of participants get the same levels for at least some of the factors.

What would potential effects be?

Firstly I think one has to be careful to differentiate covariates and factor level occupancies; these are often treated interchangeably but there are some subtle but important differences in the assumptions that go into them. In particular factor models assume that the contributions from each factor add linearly, which is only an approximately consistent joint model for all of the factors (because the various orders of interactions are not included).

In any case, if one assumes no confounders then the behavior of the covariates and factor level occupancies has no interaction with the conditional behavior between the variates and the covariates and factor levels. It doesn't matter how much two populations of covariates and/or factor level occupancies overlap, or how much they don't overlap: inferences for the conditional behavior will be consistent.

This is what theoretically allows one to learn the conditional behavior from one heavily biased population of covariates and/or factor level occupancies and then use that conditional behavior to reconstruct variate behavior for hypothetical or "counterfactual" populations. The assumption of no confounding is extremely strong, and often can't be taken for granted, but if it does hold then it has very strong consequences.
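A quick simulation can make this concrete (a sketch with made-up numbers, not any particular study): two populations with very different covariate distributions, but the same conditional model, yield essentially the same coefficient estimates.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic(X, y):
    """Maximum-likelihood Bernoulli-logit regression via BFGS."""
    def nll(beta):
        z = X @ beta
        return -np.sum(y * z - np.logaddexp(0.0, z))
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(3)
beta_true = np.array([0.8, -0.4])  # hypothetical conditional behavior

def simulate(x_shift, n=20000):
    # The covariate distributions of the two populations barely overlap ...
    X = np.column_stack([np.ones(n), rng.normal(loc=x_shift, size=n)])
    # ... but the conditional relationship y | x is identical in both
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
    return X, y

b_a = fit_logistic(*simulate(x_shift=-2.0))  # population A
b_b = fit_logistic(*simulate(x_shift=+2.0))  # population B
```

With no confounding, both fits recover the same conditional coefficients despite the strong covariate bias in each population.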

The worst thing that can happen in the no-confounder case is that the population used for inference concentrates on a narrow set of behaviors. In this case one has to rely on the model to extrapolate those inferences to the unobserved behaviors.

For example in a linear model

\pi(y \mid x, \alpha, \beta, \sigma) = \text{normal}(y \mid \alpha + \beta \, x, \sigma)

we might have to learn \alpha and \beta from data where x is always negative. Applying it to external circumstances where x is positive relies on the rigidity of the linear model. This can be a problem if the linear model is only meant to be a local approximation – see for example Taylor Regression Models.
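For instance, in this toy sketch (with a hypothetical sine curve standing in for a truth that is only locally linear), the in-sample fit on negative x looks fine, but extrapolating to positive x lands far from the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

# Covariates observed only over a narrow, entirely negative range
x = rng.uniform(-3.0, -0.5, size=100)
# Hypothetical truth: only approximately linear over that range
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# Ordinary least-squares fit of y = alpha + beta * x
A = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(A, y, rcond=None)[0]

# Extrapolating to x = 2 relies entirely on the rigidity of the line;
# the fitted line predicts a negative value while sin(2) is near 0.9
pred = alpha + beta * 2.0
```

Within the observed range the linear approximation is serviceable; outside it the model has no data to correct it.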

Likewise in a simple factor model

\pi(y \mid \alpha_{1}, \ldots, \alpha_{K}, \sigma, k) = \text{normal}(y \mid \alpha_{k}, \sigma)

our complete data set might not have any observations for the level k = 2. In this case any predictions that rely on \alpha_{2} will be based entirely on the prior model. This may not give sufficient precision for the given inferential/predictive goals, but it will at least avoid inconsistent predictions.
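A small conjugate normal-normal sketch of that fallback behavior (all numbers hypothetical): for the level with no observations, the posterior is exactly the prior, while the observed levels contract as usual.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
sigma = 1.0                       # known observation noise (assumed)
prior_mean, prior_sd = 0.0, 2.0   # common prior for each level's alpha_k

# Observed data cover levels 0 and 1 only; level 2 never occurs
levels = rng.integers(0, 2, size=50)
alpha_true = np.array([1.0, -1.0, 3.0])
y = alpha_true[levels] + rng.normal(scale=sigma, size=levels.size)

post_mean = np.empty(K)
post_sd = np.empty(K)
for k in range(K):
    yk = y[levels == k]
    n = yk.size
    # Conjugate normal-normal update; n = 0 leaves the prior unchanged
    prec = 1.0 / prior_sd**2 + n / sigma**2
    post_sd[k] = prec**-0.5
    post_mean[k] = post_sd[k]**2 * (prior_mean / prior_sd**2
                                    + yk.sum() / sigma**2)
```

The posterior for \alpha_{2} stays at the prior, wide but not inconsistent, exactly as described above.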