Bayesian Quantile Matching Estimation (using Order Statistics)

StaffanBetner · November 10, 2020, 9:41am

Some interesting work I found recently; the Stan code is in the appendix.

Related GitHub repos:

bertschi · November 20, 2020, 9:11pm

Thanks for promoting our work. I hope you find it useful, and I would be happy to hear about interesting applications or to answer any questions.

StaffanBetner · November 20, 2020, 9:47pm

One thing I have been thinking about since I read the paper is that it would be useful to extend it to include the sample mean, and I have considered several possible ways to do that. A couple of days ago I realized that the sample mean is just the sum of all n order statistics (the observed sample quantiles) divided by n, but I haven’t gotten any further. The paper certainly piqued my curiosity!

The reason I was looking into this problem in the first place, and found your paper, is that I was working with Swedish wage statistics (from unions), which are typically reported as means, some quantiles and an interval for the sample size (e.g. a sample size of 5-15), aggregated by sector, occupation, age group etc. For groups with smaller sample sizes, many of the cells in the tables were empty (for privacy reasons?). I was a bit disappointed by that and thought that one could clearly do something far more informative with the published aggregated data (say, regression models with pooled effects to predict the distribution for each group), and this paper is an important result in the direction of doing something interesting with that data.

I also found some quantiles for wages in Swedish municipalities from Statistics Sweden, but no information on the sample size. I think I will email them and ask whether they could provide that information. Could be an interesting case study to model the pooled municipality effect on the mean and variance/dispersion (depending on distribution and parametrization) as correlated.

bertschi · November 21, 2020, 4:54pm

That’s an interesting problem. Basically, you would need to know the sampling distribution of the mean (or of any other statistic that you want to consider alongside the quantiles). For the mean, one could probably invoke the central limit theorem – at least if the sample size is large enough – and assume a normal distribution. Unfortunately, that would ignore dependencies between the sample mean and the quantiles, but it could work as a first approximation. I’m not aware of any computations giving the joint distribution of different statistics, e.g. quantiles, mean and variance.
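
As a rough illustration of the first approximation described above, here is a minimal Python sketch of a CLT-based log-likelihood term for a reported sample mean. The function name `mean_clt_loglik` and its parameters are hypothetical, and the sketch assumes the population mean `mu` and standard deviation `sigma` follow from whatever parametric model is being fitted. Adding this term to a quantile likelihood treats the mean as independent of the quantiles, which is exactly the simplification mentioned above.

```python
import math

def mean_clt_loglik(xbar, n, mu, sigma):
    """Approximate log-density of a sample mean xbar from n observations.

    Assumes (via the central limit theorem) xbar ~ Normal(mu, sigma / sqrt(n)).
    This is an approximation; it ignores any dependence on other statistics.
    """
    se = sigma / math.sqrt(n)          # standard error of the mean
    z = (xbar - mu) / se               # standardized reported mean
    return -0.5 * math.log(2.0 * math.pi) - math.log(se) - 0.5 * z * z

# Hypothetical usage: reported mean 10.2 from 25 observations,
# evaluated under a candidate model with mu = 10 and sigma = 2.
print(mean_clt_loglik(10.2, 25, 10.0, 2.0))
```
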
The nice thing about quantiles is that their distribution – even jointly – is known and furthermore, can be expressed for any type of data distribution, i.e. by transforming into uniform random variables and then using their order statistics.
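
To make the order-statistics argument concrete, here is a minimal Python sketch of the standard joint log-density of a few order statistics from a sample of size n. It is not the Stan code from the paper; the function name `order_stat_loglik` is hypothetical, a normal model is used only as an example, and mapping each reported quantile to an integer rank k is an assumption the user has to make.

```python
import math
from statistics import NormalDist

def order_stat_loglik(xs, ks, n, dist):
    """Joint log-density of order statistics X_(k) = x for ranks ks, sample size n.

    xs   : observed values, increasing
    ks   : their integer ranks (1-based), strictly increasing
    dist : any object with .cdf(x) and .pdf(x), e.g. statistics.NormalDist
    """
    assert len(xs) == len(ks)
    assert 1 <= ks[0] and ks[-1] <= n
    assert all(a < b for a, b in zip(ks, ks[1:]))
    # Transform to uniforms and pad with the boundaries 0 and 1.
    us = [0.0] + [dist.cdf(x) for x in xs] + [1.0]
    kk = [0] + list(ks) + [n + 1]
    ll = math.lgamma(n + 1)                      # log n!
    for j in range(len(kk) - 1):
        gap = kk[j + 1] - kk[j] - 1              # unobserved points in between
        du = us[j + 1] - us[j]                   # probability mass of that gap
        if du <= 0.0:
            return -math.inf                     # values not increasing
        ll += gap * math.log(du) - math.lgamma(gap + 1)
    # Density of the data distribution at each observed order statistic.
    ll += sum(math.log(dist.pdf(x)) for x in xs)
    return ll

# Hypothetical usage: quartiles of 99 observations taken as ranks 25, 50, 75.
print(order_stat_loglik([-0.7, 0.0, 0.7], [25, 50, 75], 99, NormalDist(0.0, 1.0)))
```

In a Bayesian model this term would be evaluated with `dist` built from the current parameter draw, so the same function works for any distribution that exposes a CDF and a density.
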

StaffanBetner:

The reason I was looking into this problem in the first place, and found your paper, is that I was working with Swedish wage statistics (from unions), which are typically reported as means, some quantiles and an interval for the sample size (e.g. a sample size of 5-15), aggregated by sector, occupation, age group etc. For groups with smaller sample sizes, many of the cells in the tables were empty (for privacy reasons?). I was a bit disappointed by that and thought that one could clearly do something far more informative with the published aggregated data (say, regression models with pooled effects to predict the distribution for each group), and this paper is an important result in the direction of doing something interesting with that data.

I also found some quantiles for wages in Swedish municipalities from Statistics Sweden, but no information on the sample size. I think I will email them and ask whether they could provide that information. Could be an interesting case study to model the pooled municipality effect on the mean and variance/dispersion (depending on distribution and parametrization) as correlated.

Well, information on the sample size is often missing. Yet, it provides valuable information which could/should be used for inference. Hope you can obtain that information …

StaffanBetner · March 17, 2021, 1:35pm

I have realized that this could also be approached by treating the quantile data as censored observations, with the likelihood weighted by the number of observations in each “bin”. I think the two approaches should be equivalent?
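
A minimal Python sketch of the censored-observation idea, under the assumption that the reported quantiles are used as bin edges and that the number of observations per bin is known or can be derived from the sample size. The function name `binned_loglik` is hypothetical and the normal model is only an example. Compared with the joint density of order statistics, this version has no density factor at the reported quantile values themselves, so the two likelihoods look closely related but are probably not identical in general.

```python
import math
from statistics import NormalDist

def binned_loglik(edges, counts, dist):
    """Interval-censored log-likelihood weighted by the count in each bin.

    edges  : increasing bin edges (here: the reported quantile values)
    counts : observations per bin, len(counts) == len(edges) + 1
             (the first and last bins are open-ended)
    dist   : any object with .cdf(x), e.g. statistics.NormalDist
    """
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    assert len(counts) == len(cdf) - 1
    ll = 0.0
    for j, c in enumerate(counts):
        if c == 0:
            continue                              # empty bins contribute nothing
        p = cdf[j + 1] - cdf[j]                   # probability mass of bin j
        if p <= 0.0:
            return -math.inf
        ll += c * math.log(p)
    return ll

# Hypothetical usage: quartiles as edges, 25 observations assumed per bin.
edges = [-0.6745, 0.0, 0.6745]
print(binned_loglik(edges, [25, 25, 25, 25], NormalDist(0.0, 1.0)))
```
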
