Gut check: correct model for Bayesian meta-analysis of two proportions?

I want to estimate (something like) the proportion of U.S. adults who use mobile banking. Suppose I try to do it like this:

P(mobile banking) = P(smartphone owner) x P(mobile banking | smartphone owner)

I have two surveys for each component. Each gives a proportion and an SE. I work on the log-odds scale, using the delta method for the SE, so that the estimates can't fall outside (0, 1) when mapped back to proportions:

  • Log-odds: l_hat = log(p / (1 - p))
  • SE of log-odds: SE(l_hat) = SE(p) / (p * (1 - p))

The SEs come from the margins of error (MoE) the surveys reported, which account for weighting; I divided each MoE by 1.96.
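A minimal Python sketch of the conversion just described, assuming each reported MoE is the half-width of a 95% interval on the proportion scale (which is what dividing by 1.96 implies). The helper names `moe_to_se` and `to_logodds` are hypothetical, not from any library.

```python
import math

def moe_to_se(moe, z=1.96):
    """Standard error from a reported margin of error (assumed 95% half-width)."""
    return moe / z

def to_logodds(p, se_p):
    """Delta-method transform of a proportion and its SE to the log-odds scale."""
    logodds = math.log(p / (1 - p))
    # d/dp logit(p) = 1 / (p * (1 - p)), so the SE is scaled by that derivative
    se_logodds = se_p / (p * (1 - p))
    return logodds, se_logodds

# Survey A from the data: p = 0.61, SE = 0.01408
l_hat, se_l = to_logodds(0.61, 0.01408)
print(f"log-odds {l_hat:.4f}, SE {se_l:.4f}")
```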

Data:

# Surveys of smartphone users that ask if they do mobile banking
mobile_banking = [
    {"name": "Survey A (Mar 2024)", "n": 1200, "p": 0.61,  "se": 0.01408},
    {"name": "Survey B (Sep 2024)", "n": 850,  "p": 0.58,  "se": 0.01692},
]
# Surveys of gen pop that ask if own a smartphone
smartphone_usage = [
    {"name": "Poll C (Jan 2024)",   "n": 600,  "p": 0.53,  "se": 0.01939},
    {"name": "Poll D (Jun 2024)",   "n": 5200, "p": 0.512, "se": 0.01071},
]

Model:

data {
  int<lower=1> N_behavior;
  vector[N_behavior] behavior_logodds;
  vector<lower=0>[N_behavior] behavior_logodds_se;

  int<lower=1> N_smartphone;
  vector[N_smartphone] smartphone_logodds;
  vector<lower=0>[N_smartphone] smartphone_logodds_se;
}

parameters {
  real<lower=0, upper=1> theta_behavior;
  real<lower=0, upper=1> theta_smartphone;
}

transformed parameters {
  real alpha_behavior   = logit(theta_behavior);
  real alpha_smartphone = logit(theta_smartphone);
}

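model {
  // Priors on the declared probability-scale parameters
  theta_behavior   ~ beta(2, 2);
  theta_smartphone ~ beta(2, 2);

  // Measurement-error model: each survey's log-odds estimate is normal
  // around the true log-odds, with its delta-method SE
  behavior_logodds   ~ normal(alpha_behavior,   behavior_logodds_se);
  smartphone_logodds ~ normal(alpha_smartphone, smartphone_logodds_se);
}

generated quantities {
  // P(smartphone owner) * P(mobile banking | smartphone owner)
  real<lower=0, upper=1> p_population = theta_smartphone * theta_behavior;
}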
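One way to get from the Python lists to this model's inputs, sketched under the assumption that the log-odds and SEs are computed exactly as in the bullets above. The dict keys match the variable names in the Stan data block; `logodds_columns` is a hypothetical helper.

```python
import math

# Data copied from the lists above
mobile_banking = [
    {"name": "Survey A (Mar 2024)", "n": 1200, "p": 0.61,  "se": 0.01408},
    {"name": "Survey B (Sep 2024)", "n": 850,  "p": 0.58,  "se": 0.01692},
]
smartphone_usage = [
    {"name": "Poll C (Jan 2024)",   "n": 600,  "p": 0.53,  "se": 0.01939},
    {"name": "Poll D (Jun 2024)",   "n": 5200, "p": 0.512, "se": 0.01071},
]

def logodds_columns(surveys):
    """Return (log-odds estimates, delta-method SEs) for a list of surveys."""
    est = [math.log(s["p"] / (1 - s["p"])) for s in surveys]
    se = [s["se"] / (s["p"] * (1 - s["p"])) for s in surveys]
    return est, se

b_est, b_se = logodds_columns(mobile_banking)
s_est, s_se = logodds_columns(smartphone_usage)

# Keys match the variable names declared in the Stan data block
stan_data = {
    "N_behavior": len(mobile_banking),
    "behavior_logodds": b_est,
    "behavior_logodds_se": b_se,
    "N_smartphone": len(smartphone_usage),
    "smartphone_logodds": s_est,
    "smartphone_logodds_se": s_se,
}
print(stan_data)
```

With CmdStanPy installed, a dict like this can be passed as the `data` argument of `CmdStanModel(stan_file=...).sample(...)`; the Stan file name is up to you.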

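theta_smartphone and theta_behavior come from separate surveys measuring a marginal and a conditional probability, so in theory multiplying them gives the joint probability by the product rule. By the law of total probability, that joint probability equals P(mobile banking) as long as people without a smartphone don't do mobile banking.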
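A small sketch of that reasoning, with illustrative round numbers near the survey values. The `p_bank_given_no_phone` term is a hypothetical knob, not something the surveys measure; setting it to zero recovers the product used in the model.

```python
def p_mobile_banking(p_phone, p_bank_given_phone, p_bank_given_no_phone=0.0):
    """Law of total probability over smartphone ownership.

    With p_bank_given_no_phone = 0 this reduces to the product
    P(phone) * P(bank | phone) used in the model.
    """
    return p_phone * p_bank_given_phone + (1 - p_phone) * p_bank_given_no_phone

product_case = p_mobile_banking(0.52, 0.60)        # product-rule case
with_leakage = p_mobile_banking(0.52, 0.60, 0.05)  # if some non-owners also bank on mobile
print(product_case, with_leakage)
```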

Questions:

  1. Is this a valid way to propagate uncertainty from both components into the final estimate?
  2. Is there anything wrong with placing a Beta(2,2) prior on the bounded parameter and transforming to log-odds, rather than placing an unconstrained prior on the log-odds and transforming back?
  3. Is there a better way to do this kind of “conditional” analysis?

Given that you have surveys that you trust to yield well-calibrated, independent Gaussian posteriors for the log-odds, it seems to me that you can combine these posteriors simply by multiplying them. The product of two Gaussian PDFs is itself proportional to another Gaussian, with well-known formulas for the mean and variance.
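For reference, those formulas: the combined precision (1 / variance) is the sum of the individual precisions, and the combined mean is the precision-weighted average of the individual means. A minimal Python sketch, with `combine_gaussians` as a hypothetical helper name, applied to the two smartphone polls with no extra prior:

```python
import math

def combine_gaussians(means, sds):
    """Combine independent Gaussian densities for the same quantity.

    The product of N(m_i, s_i^2) densities is proportional to a Gaussian whose
    precision is the sum of the precisions and whose mean is the
    precision-weighted average of the means.
    """
    precisions = [1.0 / s**2 for s in sds]
    total_precision = sum(precisions)
    mean = sum(w * m for w, m in zip(precisions, means)) / total_precision
    return mean, math.sqrt(1.0 / total_precision)

# Log-odds and delta-method SEs for the two smartphone polls (C and D)
means = [math.log(0.53 / 0.47), math.log(0.512 / 0.488)]
sds = [0.01939 / (0.53 * 0.47), 0.01071 / (0.512 * 0.488)]
combined_mean, combined_sd = combine_gaussians(means, sds)
print(combined_mean, combined_sd)
```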

So you can directly and analytically find the posterior that combines the information from the two smartphone surveys, and the posterior that combines the information from the two mobile banking surveys. If you also want to include additional prior information, it might be worth noting that the beta(2,2) PDF is very similar to that of the inverse logit of an appropriately chosen Gaussian (ask an LLM to find the Gaussian for you by moment matching to the logit of beta(2,2)). If you’re willing to substitute this logit-Gaussian for your beta(2,2), then on the logit scale you again have a product of Gaussians whose PDF you can find analytically. Looks like the Gaussian would be N(0, 1.29), where 1.29 is the variance (SD about 1.14, which is the scale you’d pass to Stan’s normal). The match is pretty good.
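The moment match can also be done in closed form. If theta ~ Beta(a, b), then logit(theta) has mean digamma(a) - digamma(b) and variance trigamma(a) + trigamma(b); with a = b = 2 and trigamma(2) = pi^2/6 - 1, that gives mean 0 and variance pi^2/3 - 2, about 1.29. A quick Python check, using only the standard library and a Monte Carlo sample size chosen arbitrarily:

```python
import math
import random

# Exact moments of logit(theta) for theta ~ Beta(2, 2):
# mean = digamma(2) - digamma(2) = 0, variance = 2 * trigamma(2) = pi^2 / 3 - 2
var_exact = math.pi**2 / 3 - 2
sd_exact = math.sqrt(var_exact)

# Monte Carlo check: sample Beta(2, 2) and transform to log-odds
random.seed(1)
draws = [random.betavariate(2, 2) for _ in range(200_000)]
logits = [math.log(t / (1 - t)) for t in draws]
mc_mean = sum(logits) / len(logits)
mc_var = sum((x - mc_mean) ** 2 for x in logits) / (len(logits) - 1)

print(f"exact variance {var_exact:.4f} (sd {sd_exact:.4f}); MC variance {mc_var:.4f}")
```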

Once you have analytical posteriors for the two probabilities of interest, you can simply Monte Carlo sample from these posteriors and multiply the samples. This can be done with nothing more than random number generators, with no need to fit a Stan model.
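Putting the pieces together, a sketch of that no-Stan pipeline in Python, assuming NumPy and using the N(0, pi^2/3 - 2) stand-in for beta(2,2) on each log-odds. The function name, seed, and number of draws are arbitrary choices for illustration.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def posterior_logodds(ps, ses, prior_mean=0.0, prior_var=np.pi**2 / 3 - 2):
    """Gaussian posterior for one component's log-odds.

    Combines the surveys' delta-method Gaussians with an N(prior_mean, prior_var)
    prior (by default the moment-matched stand-in for Beta(2, 2)).
    """
    ps, ses = np.asarray(ps), np.asarray(ses)
    means = logit(ps)
    sds = ses / (ps * (1 - ps))
    precision = 1.0 / prior_var + np.sum(1.0 / sds**2)
    mean = (prior_mean / prior_var + np.sum(means / sds**2)) / precision
    return mean, np.sqrt(1.0 / precision)

rng = np.random.default_rng(2024)
n_draws = 100_000

m_b, s_b = posterior_logodds([0.61, 0.58], [0.01408, 0.01692])   # banking | smartphone
m_s, s_s = posterior_logodds([0.53, 0.512], [0.01939, 0.01071])  # smartphone ownership

# Sample each log-odds posterior, map back to probabilities, and multiply
theta_behavior = 1 / (1 + np.exp(-rng.normal(m_b, s_b, n_draws)))
theta_smartphone = 1 / (1 + np.exp(-rng.normal(m_s, s_s, n_draws)))
p_population = theta_smartphone * theta_behavior

lo, hi = np.quantile(p_population, [0.025, 0.975])
print(f"mean {p_population.mean():.4f}, 95% interval ({lo:.4f}, {hi:.4f})")
```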

To answer your questions more specifically:

  1. This looks valid.
  2. Place your prior on whatever you declare as a parameter, not on whatever you declare as a transformed parameter. The latter would require a Jacobian adjustment.
  3. See above.
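To see what the Jacobian adjustment in point 2 does, here is an illustrative numeric check (not from the thread), assuming NumPy. The question's model already puts the prior on the declared theta, so it needs no adjustment; the sketch shows what would happen if alpha were the declared parameter and `theta ~ beta(2, 2)` were written for a transformed theta without the Jacobian term.

```python
import numpy as np

# Grid over the log-odds alpha; both densities decay like exp(-|alpha|),
# so +/- 40 captures essentially all of the mass.
alpha = np.linspace(-40, 40, 160_001)
d_alpha = alpha[1] - alpha[0]
theta = 1 / (1 + np.exp(-alpha))

def beta22_pdf(t):
    return 6 * t * (1 - t)

# Correct: density of alpha = logit(theta) when theta ~ Beta(2, 2).
# The change of variables multiplies by |d theta / d alpha| = theta (1 - theta).
with_jacobian = beta22_pdf(theta) * theta * (1 - theta)

# Incorrect: the Beta(2, 2) density evaluated at inv_logit(alpha) with no
# Jacobian, normalized on the alpha scale for comparison.
without_jacobian = beta22_pdf(theta)
without_jacobian = without_jacobian / (without_jacobian.sum() * d_alpha)

def variance(density):
    mean = np.sum(alpha * density) * d_alpha
    return np.sum((alpha - mean) ** 2 * density) * d_alpha

print("mass with Jacobian:", np.sum(with_jacobian) * d_alpha)
print("variance with Jacobian:   ", variance(with_jacobian))     # ~ pi^2/3 - 2
print("variance without Jacobian:", variance(without_jacobian))  # ~ pi^2/3, logistic(0, 1)
```

Without the Jacobian, the implied prior on alpha is proportional to the logistic(0, 1) density, i.e. a uniform prior on theta, so the intended beta(2, 2) is silently lost.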

@jsocolar thank you so much! This is an amazing reply. And your idea on Question 2 is very smart—I am going to use this. Once again, thanks a lot!

On question 1: yes, this is the natural way to do this. In Bayes, you just want to follow the generative story. What’s the probability of being a mobile banking user? You have to be a smartphone user who also does mobile banking, which is exactly the product you wrote down.

On question 2: no, but I’d be inclined to either make it uniform (beta(1, 1), which gives you a logistic(0, 1) on the log-odds scale) or make it more informative. The beta(2, 2) is only going to affect the third decimal place if you have 1000 observations. You can say it’s the prior Laplace liked :-).
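A quick illustration of how little the choice between beta(1, 1) and beta(2, 2) matters at that sample size. This sketch assumes a plain binomial likelihood (ignoring survey weighting) with hypothetical counts of 610 successes out of 1,000, and uses the conjugate Beta-binomial posterior mean (successes + a) / (n + a + b).

```python
def beta_binomial_posterior_mean(successes, n, a, b):
    """Posterior mean of a proportion under a Beta(a, b) prior and binomial data."""
    return (successes + a) / (n + a + b)

n, successes = 1000, 610  # hypothetical: 61% of 1,000 respondents
flat = beta_binomial_posterior_mean(successes, n, 1, 1)     # uniform prior
laplace = beta_binomial_posterior_mean(successes, n, 2, 2)  # Beta(2, 2)
print(f"Beta(1,1): {flat:.5f}  Beta(2,2): {laplace:.5f}  difference: {flat - laplace:.5f}")
```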

On question 3: I don’t think so. You’re using the standard measurement error model and combining in the natural way. And @jsocolar had a great suggestion here to just multiply the normals analytically. You wouldn’t be able to do that with a logistic prior, but at the same time, this model should be very easy to sample by brute force and you’ll still get the right answer. So you might want to think more about how to write the model so that it’s easy to generalize going forward.
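A rough sketch of the brute-force point, in Python rather than Stan and assuming NumPy: random-walk Metropolis for the mobile-banking log-odds under a logistic(0, 1) prior (equivalent to beta(1, 1) on the probability), compared with the analytic precision-weighted combination. The step size, iteration count, warm-up length, and seed are arbitrary choices, not tuned values.

```python
import numpy as np

# Log-odds estimates and delta-method SEs for the two mobile-banking surveys
ps = np.array([0.61, 0.58])
ses = np.array([0.01408, 0.01692])
y = np.log(ps / (1 - ps))
sd = ses / (ps * (1 - ps))

def log_post(alpha):
    """Unnormalized log posterior: logistic(0, 1) prior plus normal measurement model."""
    log_prior = -alpha - 2 * np.log1p(np.exp(-alpha))
    log_lik = -0.5 * np.sum(((y - alpha) / sd) ** 2)
    return log_prior + log_lik

# Random-walk Metropolis: crude, but plenty for a one-dimensional posterior
rng = np.random.default_rng(7)
n_iter, step = 20_000, 0.1
alpha, lp = 0.0, log_post(0.0)
draws = np.empty(n_iter)
for i in range(n_iter):
    proposal = alpha + step * rng.normal()
    lp_prop = log_post(proposal)
    if np.log(rng.uniform()) < lp_prop - lp:
        alpha, lp = proposal, lp_prop
    draws[i] = alpha
draws = draws[2_000:]  # discard warm-up

# Analytic answer ignoring the (weak) prior: precision-weighted mean
analytic = np.sum(y / sd**2) / np.sum(1 / sd**2)
print(f"Metropolis mean {draws.mean():.4f} vs analytic (flat prior) {analytic:.4f}")
```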