Meta-regression model for proportions with standard errors

It is quite common for providers of official statistics (at least in Sweden) to provide standard errors (or more often margin of errors) for proportions. I like to play around with that kind of data e.g. estimating time trends and such. But I want to discuss how to model data of this kind in a good way (mainly with regards to range of domain of proportions).

Let say I have a small dataset like this:

tibble::tribble(~year, ~proportion,                ~se,
                2009L,       0.249, 0.0168367346938776,
                2011L,       0.306, 0.0178571428571429,
                2013L,       0.369, 0.0168367346938776,
                2015L,       0.418, 0.0178571428571429,
                2017L,       0.495, 0.0214285714285714)

(standard errors calculated from the margin of error by dividing by 1.96)

What I have had in mind recently is a Gaussian family with a logit link, which is not supported by default in brms, so I have to resort to a non-linear formula:

brm(data = data_object,
    formula = bf(proportion|se(se) ~ inv_logit(eta),
                 eta ~ 1 + s(year, k = 4), 
                 nl = TRUE), 
    control = list(adapt_delta=0.99),
    family = gaussian()) -> 
  meta_proportion_model

This approach seems to work with regards to the domain of proportion (between 0 and 1), since fitted values far outside of the data range is between 0 and 1 (albeit not predictions, of course) but I fear it is a bit sub-optimal in comparison to a Beta regression model. I don’t know the Beta distribution parametrization used in brms by heart, but I have modeled data of this kind before by deriving the dispersion parameter and used an offset to fix it. I am also a bit unsure whether this is the same thing as using a logit link, but I suspect it is.

How would you approach “meta-data” like this?

EDIT: The discussion if of course extendable to other domain restricted statistics with standard errors.

1 Like

Hi Staffan!

Short answer: I don’t know. Longer answer: I don’t know, but let me try something:

Let’s say you were given a proportion p=0.25 and a standard error \text{s.e.}=0.015. The standard error is calculated by \text{s.e.}=\sqrt{\frac{p(1-p)}{n}}, which is 0.015^2=0.1875/n, and we are left with n=0.1875/0.000225\approx833.

Now let \alpha=0.25\times833=208.25 and \beta=833-208.25=624.75 and then (roughly) P \sim \text{Beta}(\alpha,\beta), and \text{E}(P)=p=0.25 and \text{sd}(P)=\text{s.e.}=0.015.

> alpha <- 208.25
> beta <- 624.75
> draws <- rbeta(1e6,alpha,beta)
> mean(draws)
[1] 0.249989
> sd(draws)
[1] 0.01498363

Now, I don’t know about brms but this is fairly easy to implement in Stan. You could even use Stan’s build in reparameterization of the Beta, i.e. beta_proportion, which is parameterized by a mean \mu and a scale \kappa. I think \kappa=\alpha+\beta=n, but you’d have to verify this. A regression model in Stan could then use

...
model{
   // linear predictor
  vector[N] mu = inv_logit(X*b);

  // p is the proportion given as data
  // kappa is alpha + beta = n calculated from the s.e.
  p ~ beta_proportion(mu, kappa);
}

Note that n, or \kappa, does not have to be an integer here. I just rounded down out of convenience.

I hope this helps! Please feel free to ask questions.
Cheers!
Max

2 Likes

I have tried approaches like that before, and I can write out some relationships for clarity.
\mu = \frac{\alpha}{\alpha+\beta}
\kappa=\phi=\alpha+\beta=n

\phi is the parameter name for \kappa in brms, although brms currently uses the ordinary beta distribution parametrization \alpha = \mu\kappa, \beta=(1-\mu)\phi. Maybe beta_proportion is a later addition to the Stan language?

An additional useful relationship is \kappa=\frac{p(1-p)}{(\text{s.e.})^2}.

When comparing the fitted values from two analogous models (fixed standard deviation logit-link normal and fixed phi logit-link beta) I get essentially the same results. Would be interesting to try to find cases where they are very different. Maybe close to 0/1 estimates?

The beta distribution approach might be a bit more useful since we can do predictions within domain. Although I have a bit difficult to see the value of that, but there is certain some value of it that I don’t know! For e.g. temporal pooling of estimates I don’t really think it matters which approach one uses.

1 Like

I think so, yes. I don’t remember when it was added.

With the derived counts (or call them pseudo-counts or whatever) you could probably also use a Binomial model, right? This might be a bit easier to implement in brms.

I agree, if you are interested in expectations and the modeling is not too complex, it is often easier to just use a Normal model. Perhaps even without link function.

1 Like

Careful with binomial—if the pseudo-sample size is small I think you’ll end up with the wrong mean any time the observed proportion is between integer values. That makes the beta appealing at least at first glance, I’m thinking of values near 0 or 1 with larger standard errors where the normal may not be as great

1 Like

It all depends on what a large standard error is for p close to 0 or 1. A large standard error near 0 or 1 would require the sample size to be small. For a fixed n \text{s.e}=\sqrt{\frac{p(1-p)}{n}} is largest at p = 0.5, and approaches 0 when p goes towards 0 or 1. And we haven’t even considered the possible values of p given the sample size (if it is known) since x \in 1, ..., n-1 (the standard error collapses at the edges 0 and 1) if p=\frac{x}{n}.

The case of edge proportions is also interesting, but without known sample sizes there is not much to do I guess?

Hey, @StaffanBetner so I’m not following exactly so let me try to explain my own thoughts and see where that leads us. I actually have been thinking about this very same problem myself. I started with almost exactly what @Max_Mantei proposed except with the binomial instead of the beta distribution. I got some very skeptical responses on that idea and I do think it requires some careful thought, but I for now I think it does make sense.

I think your comment on possible values of p given sample size is important, its what I was trying to point to about the binomial—that creates a problem because the census estimates are going to be between those values.

I hope I’m not on the wrong track with this response but think of this: I collect a bunch of survey responses for you and I record a proportion of successes x and calculate its standard error se, but I forget to write down the sample size and then I disappear. Now if you want to find out what the sample size was, you should be able to plug in x and se to the formula for the standard error and solve for n exactly, just reversing the calculation I did basically.

Of course census data is not based on just sample data, they use modeling and a bunch of other information they have. So you’re right, we can never know the sample size and it also doesn’t really ‘exist’ either. For what its worth I think just using a normal distribution [0,1] is not bad; I think whatever you do it may feel imperfect but be far better than ignoring the uncertainty. But I don’t know about the smoothing over time, maybe a bigger question there.

I’m with you on what you mean!

Regarding using a normal logit-link approach or a beta approach I don’t think it would really matter much which one that one chooses, since the mean of the normal distribution is restricted by the logit-link. One thing I realized though is that we assume that the provided standard error is calculated with the normal approximation of a binomial distribution, which gives us slight overconfidence in our derived beta distribution.

If we instead want to match the moments of a beta distribution to the known moments (i.e. the mean and the standard deviation), we can derive that \kappa = \frac{p(1-p)}{\text{s.e}^2} - 1, i.e. a pseudo-sample size smaller with 1 compared to the normal approxmation approach.

When \kappa is one smaller, then we get a slightly higher standard deviation. The resulting beta distribution standard deviation from the normal approximation is smaller by a factor of \sqrt{\frac{\kappa+1}{\kappa+2}}, but as \kappa tends to infinity it goes to 1.

The standard deviation of a beta distribution parametrizied by \mu and \kappa is \sqrt{\frac{(1-\mu)\mu}{\kappa + 1}} (we can get the formula above by setting it equal to \text{s.e} and solve for \kappa)

I suppose that when we have no idea on how the standard error was calculated this is a slightly better one (although the difference would be very small).

One other thing is that we don’t have to interpret \kappa as a sample size, making my own argument about the restrictions of p given the (pseudo-)sample size a bit irrelevant and if there would be some kind of weighting in the estimation of p in the survey the pseudo-sample size could very well be a non-integer. We also have the problem with rounding of the numbers…

I think we can agree on that an binomial approach is the worst of the suggested approaches, but I am not sure whether it is bad in an absolute sense.

An interesting extension to this discussion is how to handle the situation when we have several proportions with standard errors (or maybe confidence intervals) which sums to one. I suppose that the dirichlet distribution would be involved in a solution to that.

Hah, yeah I think you’re hitting a bunch of questions that will be useful to answer because I found myself with the same questions, re. proportions that need to sum to one. I was looking at measures of residential segregation.

@StaffanBetner:
One thing I realized though is that we assume that the provided standard error is calculated with the normal approximation of a binomial distribution, which gives us slight overconfidence in our derived beta distribution.

I don’t know about other countries but the US Census Bureau uses variance replicate methods. I still have to better familiarize myself with this stuff but the general idea is the following:

Replicate methods for variance estimation are commonly used in large sample surveys with many variables. The procedure uses estimators computed on subsets of the sample, where subsets are selected in a way that reflects the sampling variability. Replication variance estimation is an appealing alternative to Taylor linearization variance estimation for nonlinear functions. Replicate methods have the advantage of transferring the complexity of variance estimation from data set end users to the statistician working on creating the output data set. By providing weights for each subset of the sample, called replication weights, end users can estimate the variance of a large variety of nonlinear estimators using standard weighted sums. Jackknife, balanced half-samples, and bootstrap methods are three main replication variance methods used in sample surveys.

I’m thinking in reality its like a sausage factory. There’s no probability density function in use here I don’t think, but they are presented to the public with the suggestion that we assign a Gaussian p.d.f to the sampling errors with given s.e. as \sigma (obviously I don’t see any reason why we should (here or anywhere) take the Gaussian to be the default answer). Since the variance replicate tables are available for select variables, it might be interesting to look at a histogram of the variance replicates for some variable(s) of interest…though I’m not sure if anything could be taken away from that exercise besides a little bit of perspective.

I’m finding this discussion and the paper to be relevant:

https://statmodeling.stat.columbia.edu/2020/05/29/in-bayesian-priors-why-do-we-use-soft-rather-than-hard-constraints/

http://www.stat.columbia.edu/~gelman/research/unpublished/specificity.pdf