How to summarize posterior distribution of rankings?

I’m working on a method to use available data from a number of sources to help land managers triage sites to manage. This method involves estimating a few parameters that, along with site-specific data, get fed into a calculation of an index, and the ultimate goal is to rank the site-specific values for the index. The values of the index alone are not informative but comparing them across sites should help triage the management of sites. Thus, the decisions are made based on the ranking of this generated quantity (the index) calculated as a function of parameters and site-level data.

The rub is that I’m not sure how best to report uncertainty in the rankings. Each site’s index value has a posterior distribution, and many sites will likely have overlapping distributions, so the site ranking varies from one posterior draw to the next. At first, I considered taking the mode of each site’s rank distribution and reporting the probability of that mode (operationalized as the proportion of draws in which the site’s rank equals the mode). However, when the posteriors for the index are highly variable, the per-site rank distribution tends to have more than one mode. This makes me think there might be a better way to summarize the posterior distribution of the site rankings.
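For concreteness, the mode-based summary described above could be sketched roughly as follows. This is only a minimal illustration in plain Python, assuming the posterior draws of the index are stored as one list per draw with one value per site, and assuming a larger index means higher priority (rank 1). The function names and the toy data are hypothetical, not part of the actual analysis.

```python
from collections import Counter


def ranks_in_draw(values, higher_is_better=True):
    """Return 1-based ranks for one posterior draw (ties broken by site order)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, site in enumerate(order, start=1):
        ranks[site] = position
    return ranks


def rank_modes(index_draws, higher_is_better=True):
    """For each site, return (list of modal ranks, proportion of draws at a modal rank)."""
    rank_draws = [ranks_in_draw(d, higher_is_better) for d in index_draws]
    n_draws = len(rank_draws)
    results = []
    # zip(*rank_draws) regroups the ranks so each tuple holds one site's ranks.
    for site_ranks in zip(*rank_draws):
        counts = Counter(site_ranks)
        top = max(counts.values())
        modes = sorted(r for r, c in counts.items() if c == top)
        results.append((modes, top / n_draws))
    return results


# Hypothetical toy data: 4 posterior draws of the index for 3 sites.
toy_draws = [
    [3.0, 1.0, 2.0],
    [1.0, 3.0, 2.0],
    [3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0],
]
for site, (modes, prob) in enumerate(rank_modes(toy_draws)):
    print(f"site {site}: modal rank(s) {modes}, proportion {prob:.2f}")
```

In this toy example the first site is ranked first in half of the draws and last in the other half, so it has two modal ranks, which is the difficulty described above.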

Still further, I wonder if I’m going down a wrong path. I suppose that others might summarize the posterior distributions of the indexes, but I’m hesitant to do that based solely on my general desire to summarize at the very end of all of the calculations. Maybe still, there is a larger flaw in my approach?

Do you have insights into how decisions get made based on the ranking?

I think your desire to focus on the posterior distribution of ranks rather than indexes is reasonable. If, for example, the decision is something like “pick k sites to prioritize”, then the posterior quantity P(r_i \leq k), where r_i is the rank of site i, can be useful. This, along with E[r_i] and quantiles of r_i, helps convey the gist.
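As a rough sketch of how these summaries could be computed from posterior draws: the code below ranks the sites within each draw and then summarizes each site’s ranks across draws. It assumes the draws are stored as one list per draw with one index value per site, and that a larger index means higher priority (rank 1). The function names, the nearest-rank quantile helper, and the 5%/95% levels are illustrative choices, not anything prescribed.

```python
from statistics import mean


def ranks_in_draw(values, higher_is_better=True):
    """Return 1-based ranks for one posterior draw (ties broken by site order)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, site in enumerate(order, start=1):
        ranks[site] = position
    return ranks


def empirical_quantile(sorted_values, q):
    """Crude nearest-rank quantile of an already sorted list."""
    n = len(sorted_values)
    return sorted_values[min(n - 1, int(q * n))]


def summarize_ranks(index_draws, k, higher_is_better=True):
    """Per-site estimates of P(r_i <= k), E[r_i], and a 90% rank interval."""
    rank_draws = [ranks_in_draw(d, higher_is_better) for d in index_draws]
    n_draws = len(rank_draws)
    summaries = []
    # zip(*rank_draws) regroups the ranks so each tuple holds one site's ranks.
    for site_ranks in zip(*rank_draws):
        ordered = sorted(site_ranks)
        summaries.append({
            "p_top_k": sum(r <= k for r in site_ranks) / n_draws,
            "mean_rank": mean(site_ranks),
            "rank_interval_90": (empirical_quantile(ordered, 0.05),
                                 empirical_quantile(ordered, 0.95)),
        })
    return summaries
```

Calling `summarize_ranks(index_draws, k=5)` would then give, for each site, the proportion of draws in which it lands among the top five, which maps directly onto a “pick k sites” decision.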

In general, ranks are not as straightforward to summarize as other posterior quantities, so the closer your summary is to your actual decision process, the more useful it will be.


I know this isn’t exactly what you’re looking for, but making decisions based on rank summaries has been discussed a lot in the field of network meta-analysis. Would some of the ideas in this paper help?