Effective sample size and sample size

Bobby · April 22, 2022, 9:30pm

Hi all,

I recently ran an IRT model on simulated data sets with 100 and 500 data points. To my surprise, the effective sample sizes for key parameters were larger in the conditions with 100 simulated data points than in those with 500. Does anyone know what might cause this? Is it something I should be concerned about?

Best,
Bo

sakrejda · April 23, 2022, 11:00am

You shouldn’t expect the two to be related. Effective sample size tells you approximately how many (uncorrelated) samples you’ve drawn from your posterior based on the (correlated) samples you did draw. Does that help?
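To make the distinction concrete, here is a minimal sketch in Python/NumPy. It is not Stan's estimator (Stan uses a more careful multi-chain version); `simple_ess` and `ar1_chain` are hypothetical helpers written only for this illustration. Both chains have the same number of draws, but the autocorrelated one carries far fewer effectively independent draws.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D array at a positive lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

def simple_ess(draws):
    """Crude single-chain ESS: n / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive autocorrelation.
    This is an illustrative simplification, not Stan's estimator.
    """
    n = len(draws)
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorr(draws, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def ar1_chain(n, phi, rng):
    """Stand-in for correlated MCMC output: an AR(1) series with unit variance."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi**2) * rng.normal()
    return x

rng = np.random.default_rng(0)
n = 4000
ess_iid = simple_ess(rng.normal(size=n))        # nearly independent draws
ess_corr = simple_ess(ar1_chain(n, 0.9, rng))   # strongly autocorrelated draws

print(f"draws: {n}, ESS (independent): {ess_iid:.0f}, ESS (correlated): {ess_corr:.0f}")
```

Note that nothing in this calculation involves the number of data points in the model; ESS depends only on how correlated the posterior draws are.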

Bobby · April 23, 2022, 4:15pm

Thank you very much, @sakrejda! That makes sense. But I am still wondering why a larger sample size comes with a smaller effective sample size. Is it because a larger sample size may require more iterations? For now, I used 1000 iterations for both sample sizes.

caesoma · April 23, 2022, 6:49pm

I wouldn’t think there’s a monotonic relationship, and any relationship is probably model-specific. I think it’s a common misconception that a larger number of data points necessarily improves all aspects of inference – with more data you also have a more constrained problem, as well as increased computational demand. This may slow things down as a function of both time and iterations.

Finding out what is actually causing the “problem” is probably not trivial, but the bottom line is changing the data in any manner changes the likelihood in ways that may not be obvious.

Bobby · April 23, 2022, 7:10pm

@caesoma Thank you for the clarification!

sakrejda · April 26, 2022, 1:43am

Rather than trying to answer this question directly you might want to look at the computation that actually goes into HMC because it’ll help you get an intuition for the kinds of things that make it harder for the algorithm to traverse the posterior (and therefore often lower ESS).

Bobby · April 28, 2022, 12:31am

Thank you, @sakrejda! Could you please elaborate on what you mean by “look at the computation that actually goes into HMC”? I want to figure out why this happens, but I am not sure where to start.

sakrejda · April 29, 2022, 3:55am

Well, for starters, here’s a viz of the underlying simulation that happens within each iteration for some specific densities. It lets you vary the simulation step size and the number of steps (both are parameters that NUTS tunes). The likelihood (and its gradient) is calculated roughly once per step of the simulation.

https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,banana
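For a rough idea of what that simulation does, here is a minimal sketch of a leapfrog trajectory in Python/NumPy. It assumes a standard normal target and a fixed number of steps (plain HMC rather than NUTS, which chooses the trajectory length adaptively); the function names are hypothetical and this is not Stan's implementation. It counts gradient evaluations to show where the per-step cost comes from.

```python
import numpy as np

def grad_neg_log_density(q):
    """Gradient of the negative log density; a standard normal is assumed here."""
    return q

def hamiltonian(q, p):
    """Potential energy (negative log density) plus kinetic energy."""
    return 0.5 * np.dot(q, q) + 0.5 * np.dot(p, p)

def leapfrog(q, p, step_size, n_steps):
    """Simulate one trajectory; returns the end point and the gradient count."""
    q, p = q.copy(), p.copy()
    n_grad = 0
    # Initial half step for the momentum.
    p -= 0.5 * step_size * grad_neg_log_density(q)
    n_grad += 1
    for i in range(n_steps):
        q += step_size * p                      # full step for the position
        g = grad_neg_log_density(q)             # one gradient per leapfrog step
        n_grad += 1
        # Full momentum step, except a half step at the very end.
        p -= (step_size if i < n_steps - 1 else 0.5 * step_size) * g
    return q, p, n_grad

q0 = np.array([1.0, 0.0])
p0 = np.array([0.0, 1.0])
step_size, n_steps = 0.1, 20

q1, p1, n_grad = leapfrog(q0, p0, step_size, n_steps)
energy_error = hamiltonian(q1, p1) - hamiltonian(q0, p0)

print(f"gradient evaluations: {n_grad}, energy error: {energy_error:.5f}")
```

In a real model each gradient evaluation touches every data point, so more data makes each step more expensive, and a narrower or more awkwardly shaped posterior typically forces a smaller step size and more steps per iteration.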
