Summarize draws from specific chains

Hi,

When summarizing model output (e.g., using “print” on an rstan object), is there a way to only summarize draws from specific chains / exclude specific chains from the summary & calculation of n_eff and r-hat?

I don’t think it’s natively supported in rstan, but you can do it via the subset_draws function in the posterior package: https://github.com/stan-dev/posterior

library(posterior)

# Convert rstan object to array of samples
fit_draws = as_draws(fit)

# Extract first two chains
draws_12 = subset_draws(fit_draws,chain=1:2)

# Summarise first two chains
summarise_draws(draws_12)

Or in one line:

summarise_draws(subset_draws(as_draws(fit),chain=1:2))

@jonah Is this the most efficient way of summarising a subset of chains from a stanfit object with posterior? I'm still getting used to the package.

Thanks!

Is there a way to quickly extract specific parameters? E.g., in rstan you can simply specify the “pars” argument.

Yep, use the variable argument of subset_draws():

library(posterior)

# Convert rstan object to array of samples
fit_draws = as_draws(fit)

# Extract first two chains, and only the "mu" & "tau" parameters
draws_12 = subset_draws(fit_draws,chain=1:2,variable=c("mu","tau"))

# Summarise first two chains
summarise_draws(draws_12)
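
Since the original question was about n_eff and R-hat specifically, here is a small sketch that restricts the summary to those diagnostics. It uses the eight-schools example draws bundled with posterior as a stand-in for as_draws(fit), so the object names here (x, x_12, summary_12) are illustrative only and not from a real model fit:

```r
library(posterior)

# Stand-in for as_draws(fit): example draws shipped with posterior
x <- example_draws("eight_schools")

# Keep chains 1 and 2, and only the "mu" and "tau" parameters
x_12 <- subset_draws(x, chain = 1:2, variable = c("mu", "tau"))

# Request only the summaries of interest by name
summary_12 <- summarise_draws(x_12, "mean", "sd", "rhat", "ess_bulk", "ess_tail")
print(summary_12)
```

The diagnostics are computed from the retained chains only, so dropping a chain changes both R-hat and the effective sample sizes.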

Sorry for the slow reply! Yeah I think using subset_draws() is the way to go. Alternatively you could first just get the subset of chains from rstan using as.array(fit)[,1:2,] and then pass that to posterior, which saves you having to call subset_draws(). But subset_draws() shouldn’t be super inefficient or anything.
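
A rough sketch of that array route, for anyone who wants to try it. To keep it runnable without a fitted model, a simulated array stands in for as.array(fit); it assumes the same iterations x chains x parameters layout, and the names arr, arr_12 and draws_12 are just for illustration. The drop = FALSE is my addition so the array stays 3-D even if only one chain is kept:

```r
library(posterior)

# Simulated stand-in for as.array(fit): 100 iterations x 4 chains x 2 parameters
set.seed(1)
arr <- array(
  rnorm(100 * 4 * 2),
  dim = c(100, 4, 2),
  dimnames = list(NULL, NULL, c("mu", "tau"))
)

# Keep chains 1 and 2 by indexing the chain dimension
arr_12 <- arr[, 1:2, , drop = FALSE]

# Hand the subset to posterior and summarise
draws_12 <- as_draws_array(arr_12)
summarise_draws(draws_12)
```

With a real stanfit object, the first step would be replaced by as.array(fit), as described above.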

I am trying to focus on a subset of chains, as the OP was doing, and I am getting this error message:

> stan_mcmc_fpp_123 <- subset_draws( as_draws(stan_data_fpp), chain=1:3)
Error: All variables in all chains must have the same length.

(Chain 4 did not get anywhere close to mixing with others, it’s just a straight line on traceplots, and judging from the slope, it would take it at least 20,000 draws to get there; all draws were divergent after the warmup, so maybe rstan::sampling() did not really save that much. The successful chains took ~4 hours each to run, so at the moment I kinda want to make do with chains 1:3.)

I’d second the opinion that it would be worth adding a chains=numeric_vector argument to methods like stanfit.summary, though I understand it would be a lot of refactoring.