Posterior median or simulations?

Is there an easy way to grab the posterior median?

For example, I had a fitted stan model called “fit” with a vector parameter called “p”, and I grabbed its median as follows:

sims <- extract(fit)
p_median <- apply(sims$p, 2, median)

But this is awkward because I would have to do it differently if p were a scalar or a 2-d array.

I’d like a function that would operate like this:

medians <- extract(fit, operation=“median”)
p_median <- medians$p

Similarly, I’d like to be able to grab a single posterior simulation, something like this:

sims <- extract(fit, operation=“simulations”)
p_first_draw <- sims[[1]]
p_second_draw <- sims[[2]]
. . .
p_4000th_draw <- sims[[4000]]

These sorts of things come up all the time, and I find it very awkward to have to sift through the parameters.

You might find it easier to get a vector medians from summary rather than extract with

medians <- summary(fit)$summary[ , "50%"]

but if you want the realization of particular iterations, then you should be working with it as a matrix

sims <- as.matrix(fit)

Ben:

I know about sims <- as.matrix(fit) but my problem with that is that it creates a combined list, e.g., a[1], a[2], …, a[50], b[1], b[2], …, b[50], c[1,1], … c[4,5], rho, sigma. Or whatever. But I’d like to access these (the medians or a single simulation draw) as a, b, c, rho, sigma, each with its correct dimensions.

That is not so easy the way things get stored now, but it is one of the things to be implemented for rstan3 once it becomes possible on the R side to figure out what type and dimensions the Stan declarations were.

But it can be done now, sort of. If you relist the medians (or anything else),

medians <- summary(fit)$summary[ , "50%"]
relist(medians, skeleton = get_inits(fit)[[1]])

using the list of initial values to provide the structure.

Just to be clear on the specs:

I will use the term “parameter” to refer to a named parameter. For example, if “a” is a vector of length 10, then “a” is one parameter. I would not refer to “a[3]” as a parameter but rather as an element of the parameter “a”. I think the integrity of the named parameter is a key part of Stan being a probabilistic programming language.

Suppose we fit a Stan model with 3 parameters:

parameters {
vector[10] a;
real b[3,5];
real<lower=0> sigma;
}

We fit the model and get a stanfit object, which I’ll call “fit”.

So here’s what I’d like:

We’d have a function called “extract” (or something like that) which would take 2 arguments: the stanfit object and an instruction of what to pull out. Thus, extract <- function(stanfit, operation, …), where … are arguments to operation.

For example:

foo <- extract(fit, operation=sample, 44)

would return a list of length 3. The first element of the list would be named “a” and would be a vector of length 10, with the values of the 44th draw of “a” from the posterior. The second element of the list would be named “b” and would be an array of dimension [3,5], etc.

For another example,

foo <- extract(fit, operation=“median”)

would return a list of length 3. The first element of the list would be named “a” and would be a vector of length 10, corresponding to the elementwise posterior median of this parameter. The second element of the list would be named “b” and would be an array of dimension [3,5], etc.

Or:

foo <- extract(fit, operation=“quantile”, 0.80)

would return a list where each item is the elementwise 80% quantile from the posterior simulation.

I’m no expert on what is good R practice. Perhaps it would make more sense to have two functions, one for posterior draws and another for posterior summaries. It could go either way.

One thing that seems clear, though, is that everything would be done in an elementwise way. This is the “probabilistic programming” thing where, after fitting the Stan model, we can directly grab items that are the same size and shape as the “a”, “b”, and “sigma” in the model.

This has been coming up in just about every example I’ve been doing.

It looks like from Ben’s last reply that there might be a way to hack together this function using relists of some sort? If so, I think it would be a good idea for us to create the function as then we can all use it while we await cleaner future versions.