N_eff BDA3 vs. Stan

syclik · January 22, 2018, 1:43pm

It looks like RStan doesn’t: rstan/rstan/src/chains.cpp. I didn’t realize
it wasn’t using Stan’s structure. I think we should consolidate code if we
can.

ariddell · January 22, 2018, 3:58pm

Thanks for your work on chains.hpp, the changes look great.

Getting the interfaces to use the same code is or should be a top priority.

I tried at one point to have PyStan use chains.hpp but it turned out to
be really slow.

The main difficulty is exposing some sort of function in the services
namespace which is easy and computationally convenient for the
interfaces to call. The function signature should only involve std data
structures, no Eigen data structures. You can, however, assume that any
interface interested in calling the chains functions will be able to
arrange the samples into contiguous memory. (The RStan people can
correct me if I’m wrong about this.)

avehtari · January 22, 2018, 4:32pm

Should I make pull requests for rstan/rstan/src/chains.cpp and pystan/pystan/_chains.pyx ?

bgoodri · January 22, 2018, 4:33pm

Yeah, although it may require more hacking to get it to work.

avehtari · January 26, 2018, 8:44pm

Why are there both effective_sample_size and effective_sample_size2? I can read the comment but I don’t understand it. Is effective_sample_size2 really used?

sharan · January 27, 2018, 12:36am

Hey @avehtari,

I am trying to implement this computation in PyMC3.

I have a question for you. In your tests, what exactly is dimensions? Is it the length of the chains, the number of chains, or something else?

Thanks in advance!
Sharan

avehtari · January 27, 2018, 8:28pm

I don’t know which dimensions you are referring to. There is no such variable in the parts of the code I modified.

avehtari · January 28, 2018, 3:54pm

The “more hacking” seems to refer to the fact that it’s really difficult to build rstan. There aren’t many instructions, and even @bgoodri keeps guessing what I should do to get it built…

The PyStan build was easier (only a couple of hours to figure out how to install and run in place), but I haven’t been able to figure out how to run the tests or compare results with CmdStan (the same random seeds don’t seem to give the same answer). I’m a novice in Python, so I’m now stuck, and would appreciate help with loading the values of chains from CSV files (one CSV file per chain) and running the ess function on those chains. @ariddell, can you help?
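A minimal sketch of the loading step, assuming the files follow the usual Stan CSV layout (comment lines starting with `#`, one header row, then one row per draw). The helper names `read_stan_csv` and `load_chains` and the parameter name `d` are made up for illustration; this is not PyStan's own loader, and it only gets the draws into plain Python lists so that an ESS routine can be applied to them afterwards.

```python
import csv


def read_stan_csv(path):
    """Read one Stan CSV file into {column_name: [draws]} (hypothetical helper)."""
    with open(path, newline="") as f:
        # Skip blank lines and '#' comment lines, which can appear anywhere
        # in the file (before the header, after warmup, and at the end).
        rows = csv.reader(
            line for line in f if line.strip() and not line.startswith("#")
        )
        header = next(rows)
        columns = {name: [] for name in header}
        for row in rows:
            for name, value in zip(header, row):
                columns[name].append(float(value))
    return columns


def load_chains(paths, parameter):
    """Return one list of draws per file for a single parameter."""
    return [read_stan_csv(path)[parameter] for path in paths]


# Usage, assuming these files exist and contain a column named "d":
# chains = load_chains(["blocker.1.csv", "blocker.2.csv"], "d")
```

The result is a list with one inner list per chain, which is the shape most hand-written ESS or R-hat functions expect.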

bgoodri · January 28, 2018, 5:19pm

If everything you need upstream has been merged into develop, try

source("https://raw.githubusercontent.com/stan-dev/rstan/develop/install_StanHeaders.R", echo = TRUE)

and then installing the modified rstan.

avehtari · January 28, 2018, 5:50pm

Where does this install those headers?

bgoodri · January 28, 2018, 6:43pm

.libPaths()[1]

avehtari · January 29, 2018, 9:31pm

Now the CmdStan, RStan and PyStan N_eff computations all use Geyer’s initial monotone sequence, and the results match to at least 6 digits when using exactly the same chains (blocker.1.csv and blocker.2.csv).

Based on this experience, it would be great if this kind of computational code were shared…
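For readers who want to see the idea in code, here is a rough pure-Python sketch of a multi-chain ESS estimate that truncates the autocorrelation sum with Geyer's initial positive sequence and then enforces monotonicity with a running minimum. The function names `autocovariance` and `ess_geyer` are invented for this sketch. It does not split chains, it uses a slow O(N²) autocovariance rather than an FFT, and Stan's own handling of the monotone step and of the final lag differs in detail, so it should land close to the interfaces' N_eff without matching it digit for digit.

```python
import math
import random


def autocovariance(x):
    """Autocovariance at every lag, dividing by N rather than N - k."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[i] * d[i + k] for i in range(n - k)) / n for k in range(n)]


def ess_geyer(chains):
    """ESS for a list of equal-length chains (sketch, not Stan's code)."""
    m, n = len(chains), len(chains[0])
    if n < 4 or any(len(c) != n for c in chains):
        raise ValueError("need equal-length chains with at least 4 draws")
    acov = [autocovariance(c) for c in chains]
    means = [sum(c) / n for c in chains]
    # Within-chain variance W: mean of the unbiased per-chain variances.
    w = sum(a[0] * n / (n - 1) for a in acov) / m
    var_plus = w * (n - 1) / n
    if m > 1:
        grand = sum(means) / m
        var_plus += sum((mu - grand) ** 2 for mu in means) / (m - 1)  # B / N
    if var_plus == 0.0:
        raise ValueError("chains are constant")

    def rho(t):
        # Combined lag-t autocorrelation across chains; lag 0 is fixed at 1.
        if t == 0:
            return 1.0
        return 1.0 - (w - sum(a[t] for a in acov) / m) / var_plus

    pair_sums = []
    t = 0
    while t + 1 < n:
        p = rho(t) + rho(t + 1)
        if p <= 0.0:
            break  # initial positive sequence: stop at the first non-positive pair
        if pair_sums:
            p = min(p, pair_sums[-1])  # initial monotone sequence: never increase
        pair_sums.append(p)
        t += 2
    tau = -1.0 + 2.0 * sum(pair_sums)
    total = m * n
    # Bounding tau from below caps the result at total * log10(total).
    tau = max(tau, 1.0 / math.log10(total))
    return total / tau


# Small demonstration on independent draws (two chains of 200).
random.seed(0)
demo = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(2)]
print(ess_geyer(demo))
```

For independent draws the estimate should sit near the total number of draws; for strongly autocorrelated chains it should be much smaller.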

maydin · July 20, 2022, 5:57pm

Hi, CmdStan user on the Matlab platform here. Sorry to get back to this after all this time. I’m trying to replicate the N_eff and R_hat calculations of stansummary with my own code. There are several practical reasons for wanting to do this, e.g. to easily display these stats on figures. R_hat is pretty straightforward. For N_eff, it is difficult to match the value from stansummary exactly without knowing exactly how the truncation works. I know that an exact match is not necessarily all that important for inference, but for publication purposes this kind of stuff matters. Is the code, or the formulas, to calculate the stats available somewhere? Perhaps I’m missing something, but I did not have much luck with the user guide. I read through the above discussion, but it is not clear to me what you all settled on. Thanks in advance.
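For context on the R_hat half of the question, a sketch of the textbook split R-hat (each chain cut in half, then the ratio of the pooled variance estimate to the within-chain variance) could look like the following. The function name `split_rhat` is made up here, no rank normalization is applied, and whether stansummary splits chains in exactly this way is not stated in this thread, so treat it as an assumption.

```python
import math


def split_rhat(chains):
    """Split R-hat without rank normalization (sketch, equal-length chains)."""
    n = len(chains[0])
    half = n // 2
    if half < 2 or any(len(c) != n for c in chains):
        raise ValueError("need equal-length chains with at least 4 draws")
    # Split every chain into its first and second half; if n is odd,
    # the middle draw is dropped so both halves have the same length.
    halves = []
    for c in chains:
        halves.append(c[:half])
        halves.append(c[n - half:])
    m = len(halves)
    means = [sum(h) / half for h in halves]
    variances = [
        sum((v - mu) ** 2 for v in h) / (half - 1)
        for h, mu in zip(halves, means)
    ]
    w = sum(variances) / m  # within-chain variance W
    grand = sum(means) / m
    b_over_n = sum((mu - grand) ** 2 for mu in means) / (m - 1)  # B / N
    var_plus = w * (half - 1) / half + b_over_n
    return math.sqrt(var_plus / w)
```

Values close to 1 indicate that the half-chains agree with each other; chains stuck in different regions push the value well above 1.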

avehtari · July 20, 2022, 7:10pm

Equations are in the paper “Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion)” (stansummary doesn’t use the rank normalization, but otherwise the details are the same).

You can also look at the code for the exact implementation:

https://github.com/stan-dev/stan/blob/7b291dac8d7c9509bf11db540f5efde7a649d410/src/stan/analyze/mcmc/compute_effective_sample_size.hpp

#ifndef STAN_ANALYZE_MCMC_COMPUTE_EFFECTIVE_SAMPLE_SIZE_HPP
#define STAN_ANALYZE_MCMC_COMPUTE_EFFECTIVE_SAMPLE_SIZE_HPP

#include <stan/math/prim/fun/Eigen.hpp>
#include <stan/analyze/mcmc/autocovariance.hpp>
#include <stan/analyze/mcmc/split_chains.hpp>
#include <algorithm>
#include <cmath>
#include <vector>
#include <limits>

namespace stan {
namespace analyze {
/**
 * Computes the effective sample size (ESS) for the specified
 * parameter across all kept samples.  The value returned is the
 * minimum of ESS and the number_total_draws *
 * log10(number_total_draws).
 *
 * See more details in Stan reference manual section "Effective

This file has been truncated.

and

https://github.com/stan-dev/stan/blob/7b291dac8d7c9509bf11db540f5efde7a649d410/src/stan/analyze/mcmc/autocovariance.hpp

#ifndef STAN_ANALYZE_MCMC_AUTOCOVARIANCE_HPP
#define STAN_ANALYZE_MCMC_AUTOCOVARIANCE_HPP

#include <stan/math/prim.hpp>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/variance.hpp>
#include <unsupported/Eigen/FFT>
#include <complex>
#include <vector>

namespace stan {
namespace analyze {

/**
 * Write autocorrelation estimates for every lag for the specified
 * input sequence into the specified result using the specified FFT
 * engine. Normalizes lag-k autocorrelation estimators by N instead
 * of (N - k), yielding biased but more stable estimators as
 * discussed in Geyer (1992); see

This file has been truncated.

Or see the R version, the ess_basic() function in convergence.R in the stan-dev/posterior repository on GitHub (master branch).
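As a compact summary of the estimator without rank normalization, here it is as I understand it from the Stan reference manual, so check it against the paper before relying on it. With $M$ chains of $N$ draws each, within-chain variance $W$, between-chain variance $B$, per-chain sample variance $s_m^2$ and per-chain lag-$t$ autocorrelation $\hat\rho_{t,m}$:

$$
\widehat{\mathrm{var}}^{+} = \frac{N-1}{N}\,W + \frac{1}{N}\,B,
\qquad
\hat\rho_t = 1 - \frac{W - \frac{1}{M}\sum_{m=1}^{M} s_m^2\,\hat\rho_{t,m}}{\widehat{\mathrm{var}}^{+}}
$$

$$
\hat P_{t'} = \hat\rho_{2t'} + \hat\rho_{2t'+1},
\qquad
\hat\tau = -1 + 2\sum_{t'=0}^{k} \hat P_{t'},
\qquad
N_{\mathrm{eff}} = \frac{MN}{\hat\tau}
$$

Here $k$ is the last index before $\hat P_{t'}$ first becomes non-positive (the initial positive sequence), and the $\hat P_{t'}$ are additionally forced to be non-increasing in $t'$ (the initial monotone sequence). This is where the truncation enters.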

maydin · July 20, 2022, 8:09pm

Thank you.

Edouard2laire · July 20, 2022, 10:35pm

Also, you can look at the following Matlab implementation by Beth Baribault: ess.m in the baribault/matstanlib repository on GitHub (master branch).

maydin · July 21, 2022, 2:27am

I looked into this code and ran into problems:

You have to manually change the method inside the code. The version on GitHub is set to ‘vehtari’, and it throws an error when run that way. I tried the ‘BDA3’ and ‘BDA2’ methods and got wildly different results: BDA3 returned a very small number, 7.2e-4, and BDA2 returned an unexpectedly high one, 8.4e3.

The data has 2000 iterations on 4 chains. When I use stansummary, it calculates n_eff to be 4.1e3. With my own code, in which I use the first four positive lags (after that, the autocorrelations look monotonic and alternate between positive and negative), I also get 4.1e3, although not an exact match to stansummary.

In short, I don’t understand the results from this Matlab code.

ahartikainen · July 23, 2022, 1:54am

Maybe run it against the R/Python implementations side by side and compare the results at each step.

maydin · July 26, 2022, 5:19pm

I want to do that, but I do not have them installed on my computer right now and I’m not fully literate on either platform. For now, I’m sticking to comparison with the CmdStan stansummary. Thanks.
