I recently ran an IRT model with 100 and 500 simulated data points. To my surprise, the effective sample sizes for key parameters were actually larger in the conditions with 100 simulated data points than in those with 500. Does anyone know what might cause this? Is it something I should be concerned about?
You shouldn’t expect the two to be related. Effective sample size tells you approximately how many (uncorrelated) samples you’ve drawn from your posterior based on the (correlated) samples you did draw. Does that help?
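To make the idea of "uncorrelated-equivalent" draws concrete, here is a minimal sketch that estimates ESS from the autocorrelation of a chain. It is an illustration only, not the estimator Stan uses: the truncation rule (stop at the first negative autocorrelation) is a simplification, and `simple_ess` and `ar1_chain` are hypothetical helper names. The AR(1) chains stand in for MCMC output with weak and strong autocorrelation.

```python
import numpy as np

def autocorr(x):
    """Sample autocorrelation of a 1-D chain at lags 0..n-1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n
    return acov / acov[0]

def simple_ess(x):
    """Crude ESS: n / (1 + 2 * sum of leading non-negative autocorrelations)."""
    rho = autocorr(x)
    tau = 1.0
    for r in rho[1:]:
        if r < 0:  # simplified truncation rule (an assumption, not Stan's)
            break
        tau += 2.0 * r
    return len(x) / tau

def ar1_chain(phi, n, rng):
    """Stationary AR(1) series with unit variance and lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.normal()
    return x

rng = np.random.default_rng(0)
n = 1000  # same number of draws in both cases
ess_weak = simple_ess(ar1_chain(0.1, n, rng))    # weakly correlated draws
ess_strong = simple_ess(ar1_chain(0.9, n, rng))  # strongly correlated draws
print(f"ESS with phi=0.1: {ess_weak:.0f}, with phi=0.9: {ess_strong:.0f}")
```

Both chains have the same number of draws, but the strongly correlated one is worth far fewer independent draws. The size of the data set does not appear anywhere in this calculation; it only matters through how it changes the autocorrelation of the sampler's output.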
Thank you very much, @sakrejda! That makes sense. But I am still wondering why a larger sample size comes with a smaller effective sample size. Is it because a larger sample size may require more iterations? For now, I have used 1000 iterations for both sample sizes.
I wouldn’t think there’s a monotonic relationship, and any relationship is probably model-specific. I think it’s a common misconception that a larger number of data points necessarily improves all aspects of inference – with more data you also have a more constrained problem, as well as increased computational demand. This may slow things down as a function of both time and iterations.
Finding out what is actually causing the “problem” is probably not trivial, but the bottom line is changing the data in any manner changes the likelihood in ways that may not be obvious.
@caesoma Thank you for the clarification!
Rather than trying to answer this question directly, you might want to look at the computation that actually goes into HMC, because it’ll help you get an intuition for the kinds of things that make it harder for the algorithm to traverse the posterior (and therefore often lower the ESS).
Thank you, @sakrejda! Could you please elaborate on what you mean by “look at the computation that actually goes into HMC”? I want to figure out why, but I am not sure where to start.
Well, for starters here’s a viz of the underlying simulation that happens within each iteration for some specific densities. It lets you vary the simulation stepsize and the number of steps (both are parameters that NUTS tunes). The likelihood (and its gradient) are calculated roughly once per step of the simulation.
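For anyone who wants to poke at that simulation in code, here is a minimal sketch of a leapfrog integrator on a one-dimensional Gaussian target. It is not Stan's implementation: the function names are made up for illustration, the stepsize is held fixed (NUTS would adapt it), and using a narrower Gaussian as a stand-in for a more tightly constrained posterior is an assumption.

```python
def leapfrog(q, p, grad_u, stepsize, n_steps):
    """Run n_steps leapfrog steps; return final (q, p) and the gradient-call count."""
    q, p = float(q), float(p)
    n_grad = 0
    p -= 0.5 * stepsize * grad_u(q)  # initial half step for the momentum
    n_grad += 1
    for i in range(n_steps):
        q += stepsize * p            # full step for the position
        g = grad_u(q)
        n_grad += 1
        # full momentum step between position updates, half step at the very end
        p -= (stepsize if i < n_steps - 1 else 0.5 * stepsize) * g
    return q, p, n_grad

def energy_error(sigma, stepsize, n_steps):
    """Change in the Hamiltonian along one trajectory on a N(0, sigma^2) target."""
    u = lambda q: 0.5 * (q / sigma) ** 2  # negative log density, up to a constant
    grad_u = lambda q: q / sigma**2
    q0, p0 = sigma, 1.0                   # start one standard deviation out
    q1, p1, n_grad = leapfrog(q0, p0, grad_u, stepsize, n_steps)
    h0 = u(q0) + 0.5 * p0**2
    h1 = u(q1) + 0.5 * p1**2
    return h1 - h0, n_grad

stepsize, n_steps = 0.1, 20
err_wide, n_grad_wide = energy_error(1.0, stepsize, n_steps)       # wide target
err_narrow, n_grad_narrow = energy_error(0.04, stepsize, n_steps)  # narrow target
print(f"wide:   energy error {err_wide:.2e} after {n_grad_wide} gradient calls")
print(f"narrow: energy error {err_narrow:.2e} after {n_grad_narrow} gradient calls")
```

With the same stepsize, the trajectory on the wide target nearly conserves energy, while on the narrow target the integrator is unstable and the energy error blows up. A sampler facing the narrow target would need a smaller stepsize, and hence more gradient evaluations to travel the same distance, which is one way that changing the posterior's shape can change sampling efficiency.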