Meta-Analysis Model for Aggregating Survey Estimates with Uncertainty?

Hello,

I’ve been working on a simple meta-analysis model for aggregating subjective probability estimates from a survey. While this seems like a simple model, consistent with the Stan documentation and the treatment in BDA3 chapter 5.6, I haven’t seen any examples of this model structure applied to this domain, or to data modeled as a beta-distributed random variable. A sanity check would be greatly appreciated!

Data

Each respondent gave three estimates for the probability of a certain event:

  • 5th percentile,
  • Best guess (median),
  • 95th percentile.

The goal is to aggregate these responses in a way that captures respondents’ uncertainty in a principled way.

Model Overview

I think it’s just a basic meta-analysis model: I treat each respondent’s “best guess” as a beta_proportion()-distributed variable. I then use Python to pre-compute a beta_proportion() kappa (concentration) parameter for each person, based on their 5th and 95th percentiles, to reflect their subjective uncertainty, and pass those kappa values to the model as data. The parameters of inferential interest are the mu and kappa of the ‘aggregation distribution’.
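
Roughly, the pre-computation looks like this (a simplified sketch; the helper name fit_kappa, the least-squares objective, and the example triples are just for illustration, not my exact code):

import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize

def fit_kappa(q05, q95):
    # Find (mu, kappa) of a beta_proportion distribution whose 0.05 and
    # 0.95 quantiles match the respondent's answers, then keep only kappa.
    def loss(x):
        mu, kappa = x[0], np.exp(x[1])
        a, b = mu * kappa, (1 - mu) * kappa   # beta_proportion -> beta(a, b)
        lo, hi = beta.ppf([0.05, 0.95], a, b)
        return (lo - q05) ** 2 + (hi - q95) ** 2
    res = minimize(loss, x0=[0.5, np.log(10.0)],
                   bounds=[(1e-3, 1 - 1e-3), (None, None)])
    return np.exp(res.x[1])   # the optimized mu is discarded (see question 1)

responses = [(0.10, 0.30, 0.70), (0.20, 0.50, 0.80)]   # made-up (q05, q50, q95)
kappa_respondent = [fit_kappa(q05, q95) for (q05, q50, q95) in responses]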

Current Stan Model

data {
  int<lower=1> N;                                     // number of respondents
  array[N] real<lower=0, upper=1> median_respondent;  // best guesses (0.50 quantiles)
  array[N] real<lower=0> kappa_respondent;            // pre-computed concentrations
}
parameters {
  array[N] real<lower=0, upper=1> mu_respondent;      // latent mean per respondent
  real<lower=0, upper=1> mu_aggregation;              // aggregation mean
  real<lower=0> kappa_aggregation;                    // aggregation concentration
}
model {
  kappa_aggregation ~ gamma(2, 0.1);
  median_respondent ~ beta_proportion(mu_respondent, kappa_respondent);
  mu_respondent ~ beta_proportion(mu_aggregation, kappa_aggregation);
}

Questions

I’m confused about whether this is conceptually appropriate for the task at hand: to get a picture of respondents’ best-guesses in a way that accurately takes into account their subjective uncertainty. Specifically, I’m wondering:

  1. While the kappa parameters I pass to the model do come from a distribution found by optimization to capture each respondent’s 5th and 95th percentiles, I’m unsure whether that kappa still encodes the respondent’s uncertainty once it is separated from the mu of the optimized distribution. Is it a face-validity issue if the distribution with mu = best_guess and kappa = kappa_respondent does not have the same median and 90% interval as what the respondent gave? Does this problem also exist in meta-analysis with the classic normal model?
  2. Might it be better to model the 5th and 95th percentiles directly, rather than using them to compute the concentration parameter? I’ve never seen a model of this kind, but I’d be curious if there’s anything out there.

Thank you in advance for any advice!

Hi, @alex.b.r, and sorry this didn’t get answered sooner.

In case you haven’t found it, here’s the reference: Stan User’s Guide: meta-analysis section

Here’s the eight schools model, @andrewgelman’s go-to example. It’s a normal-distribution meta-analysis, but we can copy it line for line to see where your example went slightly wrong. Each of the eight schools reports a test-coaching effect on test scores as a per-school mean and standard deviation. The model looks like this (centered posteriordb code, which is not a good way to implement this, but is a good way to understand the model):

data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effects
  vector<lower=0>[J] sigma;  // std dev of estimated effect
}
parameters {
  vector[J] theta;           // treatment effect in school j
  real mu;                   // hyperparameter: mean
  real<lower=0> tau;         // hyperparameter: std dev
}
model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

I’m not sure what you did, but it’s not generally possible to find a beta distribution that matches arbitrary 0.05, 0.50, and 0.95 quantiles, because the beta distribution has only two degrees of freedom. So I’m not sure how you derived \kappa. But rather than worry about that problem now, let’s assume that instead of a median and two quantile boundaries, you are given \mu_n \in (0, 1) and \kappa_n \in (0, \infty) for each respondent n. Then we can translate the eight schools code line for line, switching to mean-parameterized beta distributions.

data {
  int<lower=0> N;                       // number of respondents
  vector<lower=0, upper=1>[N] theta_r;  // respondent medians
  vector<lower=0>[N] kappa_r;           // respondent concentrations
}
parameters {
  vector<lower=0, upper=1>[N] theta;    // effect for respondent n
  real<lower=0, upper=1> mu;            // hyperparameter for mean
  real<lower=2> kappa;                  // hyperparameter for concentration
}
model {
  mu ~ beta(2, 2);
  kappa ~ pareto(2, 1.5);
  theta ~ beta_proportion(mu, kappa);
  theta_r ~ beta_proportion(theta, kappa_r);
}

It’s subtly different from yours in that it reverses the kappa variables. In the code I wrote, you draw the true theta values from their prior, then draw each respondent’s reported value around the true value given their reported noise kappa_r. This is because the respondent’s concentration value determines the noise with which their median response is observed.

I also added a prior for mu and chose a power-law prior for kappa. The latter is @andrewgelman’s suggestion in chapter 5 of BDA3, though he doesn’t call it a Pareto distribution; he just derives it using an implicit change-of-variables formula.
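
Concretely, Stan’s pareto(y_min, alpha) has density

p(\kappa) = \frac{\alpha \, y_{\min}^{\alpha}}{\kappa^{\alpha + 1}} \quad \text{for } \kappa > y_{\min},

so kappa ~ pareto(2, 1.5) gives p(\kappa) \propto \kappa^{-5/2} on (2, \infty), the same power law as the BDA3 hyperprior p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2} once you write \kappa = \alpha + \beta.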

Alternatively, you can transform the (0, 1) values to the log-odds scale with the logit transform, work on the unconstrained scale, and transform back. There, you could work with a more flexible distribution to match all three quantiles, though I’m not entirely sure of the best way to do that. You can always patch together two half normals around the median, but I don’t think a skew normal is flexible enough.
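
Here’s a rough sketch of the patching idea, in Python (my own illustration, untested on real data): two half normals glued at the median on the logit scale, with scales chosen to hit the stated 0.05 and 0.95 quantiles.

import numpy as np
from scipy.special import logit, expit
from scipy.stats import norm

def sample_split_logit_normal(q05, q50, q95, size, rng):
    # Two half normals glued at the median on the logit scale.
    z = norm.ppf(0.95)                    # ~1.645
    m = logit(q50)
    s_lo = (m - logit(q05)) / z           # left half-normal scale
    s_hi = (logit(q95) - m) / z           # right half-normal scale
    side = rng.uniform(size=size) < 0.5   # each half carries mass 1/2
    draws = np.where(side,
                     m - np.abs(rng.normal(0.0, s_lo, size)),
                     m + np.abs(rng.normal(0.0, s_hi, size)))
    return expit(draws)                   # back to (0, 1)

rng = np.random.default_rng(1)
x = sample_split_logit_normal(0.10, 0.30, 0.70, 100_000, rng)
print(np.quantile(x, [0.05, 0.50, 0.95]))   # ~ (0.10, 0.30, 0.70)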

Hi @Bob_Carpenter, thanks so much for your reply.

To clarify, for each respondent I found a beta distribution to match only their 0.05 and 0.95 quantiles, then used the \kappa from that distribution as kappa_r for that respondent. Does this make it directly analogous to the eight schools example in the way you illustrated, despite the fact that we’re not using normal distributions?

More generally, my main confusion with this model is a seeming lack of face validity: we’re passing the model a \kappa that, together with a particular \mu, accurately encodes a person’s subjective uncertainty (0.05 and 0.95 quantiles). But then we throw away that \mu, passing only \kappa to the model. So the prior predictive distribution for that person doesn’t match their stated quantiles, and even after conditioning on the data (the person’s 0.50 quantile), the posterior predictive distribution for that person doesn’t have the right quantiles either.

I get that a person’s \kappa will generally capture their degree of uncertainty, so that less confident responses get less weight when estimating the \mu and \kappa of the aggregation distribution. But is there a way to quantify the degree to which this weighting accurately captures each person’s actual stated uncertainty?
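
The most concrete version of the check I can think of is to compare each person’s stated quantiles with the quantiles implied by the distribution the model actually uses for them. A sketch (the respondent’s numbers and the kappa value are made up):

import numpy as np
from scipy.stats import beta

def implied_quantiles(q50, kappa_r):
    # Quantiles of beta_proportion(mu = stated median, kappa = fitted kappa_r).
    a, b = q50 * kappa_r, (1 - q50) * kappa_r
    return beta.ppf([0.05, 0.50, 0.95], a, b)

# Hypothetical respondent who stated (0.10, 0.30, 0.70); kappa_r is made up.
print(implied_quantiles(0.30, 8.0))   # compare against (0.10, 0.30, 0.70)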

It was actually @betanalpha who first suggested I try this approach in this domain, so maybe he has thoughts as well.

The problem isn’t normal distributions; it’s that your respondents provided 0.05, 0.5, and 0.95 quantiles. That’s three degrees of freedom in their response, and a beta distribution has only two parameters. So it’s almost certain that the 0.5 quantile they gave you will not be consistent with the 0.05 and 0.95 quantiles they gave you.

To mash this into analogy with the normal case and use the code I provided, you have to throw away one degree of freedom and use only two of the quantiles they gave you. To check robustness, you can fit three ways: with the 0.05 and 0.5 quantiles, with the 0.05 and 0.95 quantiles, and with the 0.5 and 0.95 quantiles. Each pair determines a unique beta distribution.
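
Backing out the beta for each pair is a small root-finding problem. Here’s a sketch (the respondent’s numbers are made up):

import numpy as np
from scipy.stats import beta
from scipy.optimize import root

def beta_from_quantile_pair(probs, values):
    # Solve for (a, b) so the beta quantile function hits both constraints.
    def resid(x):
        a, b = np.exp(x)   # log parameterization keeps a, b positive
        return beta.ppf(probs, a, b) - values
    return np.exp(root(resid, x0=np.log([2.0, 2.0])).x)

stated = {0.05: 0.10, 0.50: 0.30, 0.95: 0.70}   # made-up respondent
for pair in [(0.05, 0.50), (0.05, 0.95), (0.50, 0.95)]:
    a, b = beta_from_quantile_pair(np.array(pair),
                                   np.array([stated[p] for p in pair]))
    print(pair, "-> mu =", a / (a + b), ", kappa =", a + b)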

I think what you were suggesting is to use (0.05, 0.95) to fit kappa (with some unknown median), then use that kappa along with the provided median to model the respondent. That’s another alternative that could work, though I start worrying about methods that require a lot of footnotes to explain.

The alternative, if you want to use all three degrees of freedom, is to find a richer family of distributions over (0, 1) that can model those three degrees of freedom. A logistic pasted-together-half-normal distribution would be flexible enough to fit three quantiles, but it won’t be differentiable at the median, which might cause a problem for Stan if there’s too much asymmetry. The skew normal is a bit constrained in its quantiles, even though it has three parameters.

Thanks as ever Bob.

In my model above, I’ve been treating the given 0.50 quantile as \mu_n, and then, I think, doing exactly what you mentioned in your line-by-line conversion of the eight schools example. The relevant line in your model is theta_r ~ beta_proportion(theta, kappa_r); I’ve been treating each person’s 0.50 quantile as their theta_r.

Is it a bad idea to treat a person’s 0.50 quantile as analogous to a school-specific point estimate in this way?

I don’t know, as I’m not sure exactly what you’re modeling or why you collected three quantiles. As I mentioned above, taking three quantiles from a respondent is almost certainly going to result in values that cannot be reproduced by a beta distribution; the only way to fix that is to move to richer distributions.
