I want to estimate (something like) the proportion of U.S. adults who use mobile banking. Suppose I decompose it like this:
P(mobile banking) = P(smartphone owner) x P(mobile banking | smartphone owner)
I have two surveys for each component. Each reports a proportion and a standard error (SE). I convert both to the log-odds scale via the delta method so that the estimates stay inside (0, 1) when transformed back:
- Log-odds: l_hat = log(p / (1 - p))
- SE of log-odds: SE(l_hat) = SE(p) / (p * (1 - p))
The SEs come from the margins of error (MoE) the surveys reported, which I use because they account for weighting; I divided each MoE by 1.96 to get the SE.
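To make the conversion concrete, here is a minimal sketch of the two steps above, applied to Survey A's numbers. The function names `moe_to_se` and `logodds_with_se` are made up for illustration, and dividing by 1.96 assumes the reported MoE is a 95% margin.

```python
import math

Z_95 = 1.96  # assumes the reported MoE is a 95% margin


def moe_to_se(moe, z=Z_95):
    """Standard error implied by a reported margin of error."""
    return moe / z


def logodds_with_se(p, se_p):
    """Delta-method log-odds and its approximate SE."""
    logodds = math.log(p / (1.0 - p))
    se_logodds = se_p / (p * (1.0 - p))
    return logodds, se_logodds


# Survey A: p = 0.61, SE = 0.01408
l_hat, se_l = logodds_with_se(0.61, 0.01408)
print(f"log-odds = {l_hat:.4f}, SE(log-odds) = {se_l:.4f}")
```

The delta-method SE is a first-order approximation, so it should be most trustworthy when p is not close to 0 or 1 and SE(p) is small, which is the case for all four surveys here.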
Data:
# Surveys of smartphone owners asking whether they use mobile banking
mobile_banking = [
    {"name": "Survey A (Mar 2024)", "n": 1200, "p": 0.61, "se": 0.01408},
    {"name": "Survey B (Sep 2024)", "n": 850, "p": 0.58, "se": 0.01692},
]
# Surveys of the general population asking whether respondents own a smartphone
smartphone_usage = [
    {"name": "Poll C (Jan 2024)", "n": 600, "p": 0.53, "se": 0.01939},
    {"name": "Poll D (Jun 2024)", "n": 5200, "p": 0.512, "se": 0.01071},
]
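For reference, this is roughly how the survey lists could be turned into the input for the Stan model. It is a sketch: the helper `to_logodds` and the variable `stan_data` are illustrative names, the data are repeated only so the snippet runs on its own, and the dictionary keys mirror the names in the model's data block.

```python
import math

mobile_banking = [
    {"name": "Survey A (Mar 2024)", "n": 1200, "p": 0.61, "se": 0.01408},
    {"name": "Survey B (Sep 2024)", "n": 850, "p": 0.58, "se": 0.01692},
]
smartphone_usage = [
    {"name": "Poll C (Jan 2024)", "n": 600, "p": 0.53, "se": 0.01939},
    {"name": "Poll D (Jun 2024)", "n": 5200, "p": 0.512, "se": 0.01071},
]


def to_logodds(surveys):
    """Return (log-odds, delta-method SEs) for a list of survey dicts."""
    logodds = [math.log(s["p"] / (1.0 - s["p"])) for s in surveys]
    se = [s["se"] / (s["p"] * (1.0 - s["p"])) for s in surveys]
    return logodds, se


behavior_lo, behavior_se = to_logodds(mobile_banking)
smartphone_lo, smartphone_se = to_logodds(smartphone_usage)

# Keys match the names declared in the Stan data block.
stan_data = {
    "N_behavior": len(mobile_banking),
    "behavior_logodds": behavior_lo,
    "behavior_logodds_se": behavior_se,
    "N_smartphone": len(smartphone_usage),
    "smartphone_logodds": smartphone_lo,
    "smartphone_logodds_se": smartphone_se,
}
```

Note that `n` is carried along but not used: the model only sees each survey through its log-odds and SE.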
Model:
data {
  int<lower=1> N_behavior;
  vector[N_behavior] behavior_logodds;
  vector<lower=0>[N_behavior] behavior_logodds_se;
  int<lower=1> N_smartphone;
  vector[N_smartphone] smartphone_logodds;
  vector<lower=0>[N_smartphone] smartphone_logodds_se;
}
parameters {
  real<lower=0, upper=1> theta_behavior;
  real<lower=0, upper=1> theta_smartphone;
}
transformed parameters {
  real alpha_behavior = logit(theta_behavior);
  real alpha_smartphone = logit(theta_smartphone);
}
model {
  theta_behavior ~ beta(2, 2);
  theta_smartphone ~ beta(2, 2);
  behavior_logodds ~ normal(alpha_behavior, behavior_logodds_se);
  smartphone_logodds ~ normal(alpha_smartphone, smartphone_logodds_se);
}
generated quantities {
  real<lower=0, upper=1> p_population = theta_smartphone * theta_behavior;
}
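theta_smartphone and theta_behavior come from separate surveys measuring a marginal and a conditional probability, so in theory multiplying them follows from the law of total probability, assuming people without a smartphone do essentially no mobile banking (so the non-owner term drops out).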
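As a rough sanity check on the propagation, independent of Stan, the product can be simulated directly. This is only a sketch under stated assumptions: it pools each component with a precision-weighted mean on the log-odds scale, treats the two components as independent, and ignores the Beta(2,2) priors, so it is not the same posterior as the model above, just something the model's p_population should land close to. The helper names `inv_logit` and `pool_logodds` are illustrative.

```python
import math
import random


def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logodds(surveys):
    """Precision-weighted mean of delta-method log-odds, and its SE."""
    total_weight, weighted_sum = 0.0, 0.0
    for p, se in surveys:
        logodds = math.log(p / (1.0 - p))
        se_logodds = se / (p * (1.0 - p))
        w = 1.0 / se_logodds ** 2
        total_weight += w
        weighted_sum += w * logodds
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


# (p, se) pairs taken from the four surveys
behavior = [(0.61, 0.01408), (0.58, 0.01692)]
smartphone = [(0.53, 0.01939), (0.512, 0.01071)]

random.seed(1)
mu_b, sd_b = pool_logodds(behavior)
mu_s, sd_s = pool_logodds(smartphone)

# Draw each component on the log-odds scale, transform back, multiply.
n_draws = 20000
draws = sorted(
    inv_logit(random.gauss(mu_b, sd_b)) * inv_logit(random.gauss(mu_s, sd_s))
    for _ in range(n_draws)
)
mean_p = sum(draws) / n_draws
lo, hi = draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]
print(f"mean = {mean_p:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

If the Stan output for p_population differs noticeably from this, the priors or the pooling across surveys would be the first places to look.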
Questions:
- Is this a valid way to propagate uncertainty from both components into the final estimate?
- Is there anything wrong with placing a Beta(2,2) prior on the bounded parameter and transforming to log-odds, rather than placing an unconstrained prior on the log-odds and transforming back?
- Is there a better way to do this kind of “conditional” analysis?
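To make the second question more concrete, here is a small sketch of what a Beta(2,2) prior on theta implies on the log-odds scale. It compares a simulation against the closed form for the variance of the logit of a Beta(a, b) variable, trigamma(a) + trigamma(b), using trigamma(2) = pi^2/6 - 1. Variable names are illustrative.

```python
import math
import random

random.seed(1)
n_draws = 50000

# Draw theta ~ Beta(2, 2) and move each draw to the log-odds scale.
logits = []
for _ in range(n_draws):
    theta = random.betavariate(2, 2)
    logits.append(math.log(theta / (1.0 - theta)))

mean_sim = sum(logits) / n_draws
sd_sim = math.sqrt(sum((x - mean_sim) ** 2 for x in logits) / (n_draws - 1))

# Closed form: Var[logit(theta)] = trigamma(2) + trigamma(2),
# with trigamma(2) = pi^2 / 6 - 1.
sd_exact = math.sqrt(2.0 * (math.pi ** 2 / 6.0 - 1.0))

print(f"simulated mean = {mean_sim:.3f}, simulated sd = {sd_sim:.3f}")
print(f"closed-form sd = {sd_exact:.3f}")
```

So the Beta(2,2) prior corresponds to a symmetric prior on the log-odds centered at 0 with a standard deviation a little above 1, which gives a rough sense of which unconstrained prior on the log-odds it would be comparable to.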
