`sample` object in the `transition` function

yizhang · February 25, 2022, 8:51pm

I wonder if we can avoid creating a new `sample` object every time `transition` is called, since this involves copying the `Eigen::VectorXd` in the `sample` constructor. The sampler already has the updated sample in its `z_.q` component, and all it needs for adaptation is the `accept_stat`.

Instead, can `transition` overwrite the input `init_sample`? @betanalpha Am I missing something here?
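A minimal C++ sketch of the two calling conventions being compared. It assumes a simplified sample type: `Sample`, `ToySampler`, `transition_in_place`, and the placeholder update are hypothetical, not Stan's actual classes, and `std::vector<double>` stands in for `Eigen::VectorXd` so the example has no external dependencies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a sampler's output sample.
// std::vector<double> replaces Eigen::VectorXd here.
struct Sample {
  std::vector<double> q;      // continuous parameters
  double log_prob = 0.0;
  double accept_stat = 0.0;
};

// Hypothetical sampler that keeps its own current position,
// playing the role of z_.q in the post above.
struct ToySampler {
  std::vector<double> z_q;
  double log_prob = 0.0;
  double accept_stat = 0.0;

  // Placeholder update standing in for a real Hamiltonian trajectory.
  void step() {
    for (double& x : z_q) x += 1.0;
    accept_stat = 0.5;
  }

  // Current style: leave the input untouched and return a new Sample,
  // which copies z_q into a freshly allocated vector on every call.
  Sample transition(const Sample& init) {
    z_q = init.q;
    step();
    return Sample{z_q, log_prob, accept_stat};
  }

  // Proposed style: overwrite the caller's Sample. Copy-assigning into a
  // vector of the same size typically reuses its existing buffer.
  void transition_in_place(Sample& s) {
    z_q = s.q;
    step();
    s.q = z_q;
    s.log_prob = log_prob;
    s.accept_stat = accept_stat;
  }
};
```

The trade-off is visible in the signatures: after `transition_in_place` the caller no longer has the previous state.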

betanalpha · February 25, 2022, 9:04pm

The code was written to be as general as possible, taking in one state and outputting another, independently of how those states are used. In particular, it intentionally does not assume that the previous state won't be used later on, for example by in-memory summaries or diagnostics.

Currently, once we write out a sampler state through the `mcmc_writer` and pass it to the `transition` function, we never use that state again, so that memory could be reused (either by modifying the sample in place or perhaps with an rvalue pattern). But then the code would be limited to that particular context.

Overall, however, this is a pretty small memory hit, especially compared to the memory burden of the `transition` function's internals, which is being discussed in a few other threads.
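The rvalue option could look roughly like the following C++ sketch. `Sample` and the free function `transition` are hypothetical simplifications, not Stan's API, with `std::vector<double>` in place of `Eigen::VectorXd`. Callers that are done with a state hand it over with `std::move` so its heap buffer is recycled; callers that still need the old state use the `const&` overload and pay for one copy, which keeps the general contract intact.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical simplified sample; std::vector<double> replaces Eigen::VectorXd.
struct Sample {
  std::vector<double> q;
  double log_prob = 0.0;
  double accept_stat = 0.0;
};

// Consumes its input: the output steals the input's heap buffer,
// so no new allocation happens for q.
Sample transition(Sample&& init) {
  Sample out = std::move(init);
  for (double& x : out.q) x *= 0.5;  // placeholder update
  out.accept_stat = 0.9;
  return out;  // NRVO or implicit move; still no allocation
}

// For callers that still need their input afterwards: copy once,
// then forward the temporary to the consuming overload.
Sample transition(const Sample& init) {
  return transition(Sample(init));
}
```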

yizhang · February 25, 2022, 9:24pm

Can you point me to them? Thanks.

Bob_Carpenter · March 9, 2022, 9:34pm

It’s small in size, but it could be big in terms of memory pressure (how many times we have to call malloc) and non-locality (on a cache miss, fetching from RAM instead of cache is over 100 times slower than arithmetic).
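To make "how many times we have to malloc" concrete, a minimal C++ sketch can count heap allocations by replacing the global `operator new`, a standard if blunt technique. The two functions are hypothetical stand-ins: one copies the state into a new vector each iteration, as constructing a fresh sample would, and the other overwrites one preallocated buffer. The `volatile` sink only keeps the compiler from eliding the allocations being counted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Global allocation counter, incremented by the replaced operator new.
static std::size_t g_allocs = 0;
// Storing buffer addresses here makes them observable, so the
// allocations cannot be optimized away.
static const double* volatile g_sink = nullptr;

void* operator new(std::size_t n) {
  ++g_allocs;
  if (void* p = std::malloc(n ? n : 1)) return p;
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Copy per iteration: one heap allocation each time, like a new sample.
std::size_t allocs_fresh(const std::vector<double>& state, int iters) {
  std::size_t before = g_allocs;
  for (int i = 0; i < iters; ++i) {
    std::vector<double> out(state);
    g_sink = out.data();
  }
  return g_allocs - before;
}

// Reuse: allocate once, then overwrite in place. assign() with the same
// size does not reallocate in mainstream standard libraries.
std::size_t allocs_reused(const std::vector<double>& state, int iters) {
  std::size_t before = g_allocs;
  std::vector<double> out(state.size());
  for (int i = 0; i < iters; ++i) {
    out.assign(state.begin(), state.end());
    g_sink = out.data();
  }
  return g_allocs - before;
}
```

Allocation counts only capture the malloc side of the argument. The locality cost needs timing or a profiler.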

betanalpha · March 15, 2022, 8:06pm

No disagreement on the potential for memory issues. Back in the day I ran into no end of memory-related performance issues when experimenting with higher-order autodiff implementations that used arena allocators modeled on Stan's. At the same time, when I added the additional termination checks to the Hamiltonian Monte Carlo implementation in the last big PR, which required three or four new state vectors for each active subtree, there was no appreciable effect on performance over a range of target distributions.

I’m not saying that memory can’t be an issue in certain cases; I’m just asking for empirical demonstrations that it’s actually becoming an issue in realistic problems.

Bob_Carpenter · March 28, 2022, 10:48pm

That’s also what I’d want to see. But I don’t know how to do this. To demonstrate it, I’d probably try to build something faster by being more careful with memory reallocation and show it’s faster. I’m curious how you might show this without a better alternative—I don’t know much about diagnosing/tracing memory issues like this.
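One way to run the experiment described here is a micro-benchmark that times the same toy update with and without per-iteration allocation. This is a sketch under stated assumptions: `toy_update`, `run_fresh`, and `run_reused` are hypothetical, the update is a placeholder for a real transition, and timings from a loop this simple will not transfer directly to Stan's sampler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy "transition": writes the updated state into `out`.
void toy_update(const std::vector<double>& in, std::vector<double>& out) {
  out.resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = 0.5 * in[i] + 1.0;
}

struct BenchResult {
  double checksum;  // first coordinate of the final state
  double seconds;   // wall-clock time of the loop
};

// Variant A: a fresh heap buffer every iteration, old one freed.
BenchResult run_fresh(std::size_t dim, int iters) {
  std::vector<double> state(dim, 1.0);
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    std::vector<double> next;
    toy_update(state, next);
    state = std::move(next);
  }
  auto t1 = std::chrono::steady_clock::now();
  return {state[0], std::chrono::duration<double>(t1 - t0).count()};
}

// Variant B: ping-pong between two preallocated buffers.
BenchResult run_reused(std::size_t dim, int iters) {
  std::vector<double> state(dim, 1.0), next(dim);
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    toy_update(state, next);
    state.swap(next);
  }
  auto t1 = std::chrono::steady_clock::now();
  return {state[0], std::chrono::duration<double>(t1 - t0).count()};
}
```

The checksums should match exactly because both variants perform identical arithmetic. Sweeping `dim` from sizes that fit in cache to sizes that don't is one way to look for a scaling effect.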

betanalpha · April 26, 2022, 9:02pm

I’m drawing on my time experimenting with higher-order autodiff frameworks based on Stan’s autodiff framework. The performance differences were drastic once the memory burden became too large for the caches. Since then I’ve always kept an eye on scaling with system size, and I’ve never been able to see a significant effect in the sampler code.

Dedicated tools like valgrind and their ilk definitely provide a more complete picture of what’s going on.
