Bayesian Quantile Matching Estimation (using Order Statistics)

Some interesting work I found recently; the Stan code is in the appendix.

Related GitHub repos:


Thanks for promoting our work. I hope you find it useful, and I would be happy to hear about interesting applications or to answer any questions.


One thing I have been thinking about since I read the paper is that it would be useful to extend the approach to include the sample mean, and I have been considering several possible ways to do that. A couple of days ago I realized that the sample mean is just the sum of all n order statistics (the observed values in sorted order) divided by n, but I haven’t gotten any further. The paper certainly tickled my curiosity!

The reason I was looking into this problem in the first place, and found your paper, is that I was studying Swedish wage statistics (from unions), which are typically reported as means, some quantiles, and an interval for the sample size (e.g. a sample size of 5-15), aggregated by sector, occupation, age group, etc. For groups with smaller sample sizes, many of the cells in the tables were empty (for privacy reasons?). A bit disappointed by that, I thought that one could clearly do something far more informative with the published aggregated data (say, regression models with pooled effects to predict the distribution for each group), and your paper is an important result in that direction.

I also found some quantiles for wages in Swedish municipalities from Statistics Sweden, but no information on the sample size. I think I will email them and ask whether they could provide that information. Could be an interesting case study to model the pooled municipality effect on the mean and variance/dispersion (depending on distribution and parametrization) as correlated.

That’s an interesting problem. Basically, you would need to know the sampling distribution of the mean (or of any other statistic that you want to consider alongside the quantiles). For the mean, one could probably invoke the central limit theorem, at least if the sample size is large enough, and assume a normal distribution. Unfortunately, that would ignore the dependence between the sample mean and the sample quantiles, but it could work as a first approximation. I’m not aware of any results giving the joint distribution of different statistics, e.g. quantiles, mean and variance.
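A minimal sketch of that first approximation, assuming a reported sample mean `xbar`, a known sample size `n`, and model parameters `mu` and `sigma` for the population mean and standard deviation. The function name `mean_clt_loglik` is hypothetical and not from the paper. It only gives the log-likelihood contribution of the mean under the CLT; simply adding it to a quantile likelihood would treat the mean and the quantiles as independent, which is exactly the simplification mentioned above.

```python
import math
from statistics import NormalDist


def mean_clt_loglik(xbar, n, mu, sigma):
    """Log-likelihood of a reported sample mean under the CLT approximation.

    Assumes xbar ~ Normal(mu, sigma / sqrt(n)). This is only an approximation
    for non-normal data and ignores any dependence on reported quantiles.
    """
    standard_error = sigma / math.sqrt(n)
    return math.log(NormalDist(mu, standard_error).pdf(xbar))


# Example: contribution of a reported mean of 31.2 from n = 12 observations,
# evaluated at candidate parameters mu = 30, sigma = 5.
contribution = mean_clt_loglik(31.2, 12, 30.0, 5.0)
```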
The nice thing about quantiles is that their distribution, even the joint one, is known and, furthermore, can be expressed for any type of data distribution, i.e. by transforming the data into uniform random variables and then using the order statistics of those uniforms.
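To make the last point concrete, here is a rough sketch of the standard joint density of a subset of order statistics, written in plain Python rather than Stan. It is not the code from the paper's appendix. The function name `order_stat_loglik` and its arguments are hypothetical, and a normal model (`statistics.NormalDist`) is used only as an example of a data distribution with a CDF and a density.

```python
import math
from statistics import NormalDist


def order_stat_loglik(xs, ks, n, dist):
    """Joint log-density of the order statistics X_(k_1) < ... < X_(k_m).

    xs   : observed quantile values, strictly increasing
    ks   : their ranks in the sorted sample (1-based, strictly increasing)
    n    : sample size
    dist : object with .cdf(x) and .pdf(x), e.g. NormalDist(mu, sigma)
    """
    # Transform to the uniform scale and pad with the boundaries 0 and 1.
    us = [0.0] + [dist.cdf(x) for x in xs] + [1.0]
    ranks = [0] + list(ks) + [n + 1]

    loglik = math.lgamma(n + 1)  # log n!
    for j in range(len(ranks) - 1):
        gap = ranks[j + 1] - ranks[j] - 1  # observations strictly in between
        loglik += gap * math.log(us[j + 1] - us[j]) - math.lgamma(gap + 1)
    # Density factor for each observed quantile (Jacobian of the transform).
    loglik += sum(math.log(dist.pdf(x)) for x in xs)
    return loglik


# Example: quartiles of a sample of size 11 correspond to ranks 3, 6 and 9.
ll = order_stat_loglik([-0.6, 0.1, 0.8], [3, 6, 9], 11, NormalDist(0.0, 1.0))
```

When every rank is observed (`ks = 1..n`), the expression reduces to log n! plus the sum of the log densities, which is a quick sanity check on the formula.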

Well, information on the sample size is often missing. Yet, it provides valuable information which could/should be used for inference. Hope you can obtain that information …

I have realized that this could be approached by treating the quantile data as censored observations, with the likelihood weighted by the number of observations in each “bin”. I think the two approaches should be equivalent?
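One way to probe that question is to write the binned (interval-censored) likelihood down explicitly. The sketch below assumes the reported quantiles are treated as fixed bin edges and that the number of observations in each bin is known. The name `binned_loglik` is hypothetical, and a normal model is again used only as an example.

```python
import math
from statistics import NormalDist


def binned_loglik(edges, counts, dist):
    """Interval-censored log-likelihood with the quantiles as bin edges.

    edges  : increasing bin edges (the reported quantile values)
    counts : number of observations per bin, len(counts) == len(edges) + 1
    dist   : object with .cdf(x), e.g. NormalDist(mu, sigma)
    """
    us = [0.0] + [dist.cdf(q) for q in edges] + [1.0]
    # Each bin contributes count * log(probability mass of that bin).
    return sum(
        c * math.log(us[j + 1] - us[j])
        for j, c in enumerate(counts)
        if c > 0
    )


# Example: a median reported for n = 10, treated as 5 observations per side.
ll_binned = binned_loglik([0.2], [5, 5], NormalDist(0.0, 1.0))
```

Compared with the joint density of the order statistics, this expression has the same kind of "probability mass between neighbouring quantiles" factors, but it lacks the density factor evaluated at each reported quantile, and the exponents are bin counts rather than the number of observations strictly between two ranks. So the two likelihoods are not identical term by term, although they may lead to similar inferences when each bin contains many observations.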