MCMC expectation values and errors

caphindsight · March 23, 2020, 2:54pm

I want to use Stan for long running, resource heavy MCMC simulations (QFT on the lattice).

After playing around with it for a few hours, I managed to run a trivial Monte-Carlo simulation (computing the first 5 powers of x where x is uniform within [0 \dots 1]).

Here’s the code:

data {
  int N;
}

parameters {
  real x;
}

model {
  x ~ uniform(0, 1);
}

generated quantities {
  real pow[N];
  for (i in 1:N)
    pow[i] = x^i;
}

When I run this simulation by running ./simulation sample data file=input.json, it produces the CSV output containing 1000 samples.

Instead, I would like it to display the expected values of the powers of x (\left< x^n \right> = \frac{1}{n + 1}) and estimated errors.

Now I can always get that information from the sample, in a post-processing phase. But if I am to run simulations in parallel for days, they would produce an enormous number of samples, for which I don’t have enough storage.

Can I somehow instruct Stan to not keep the samples in memory and not write them to disk, but only keep the expectation values and estimated errors?

ahartikainen · March 23, 2020, 3:37pm

Short answer, no.

Are you doing any inference or just simulation?

If not, then you can skip the model block and define random numbers directly in generated quantities.

x = uniform_rng(0,1);

If you only do simulation, then you could simulate n draws, calculate descriptive stats (with some online method), delete samples and rerun for n draws. You might lose (non-linear) correlation information.

caphindsight · March 23, 2020, 3:40pm

Just simulation, I’m only interested in expectation values and error estimates.

Thanks for the suggestion.

My understanding is – if I skip the model block completely and define everything in generated quantities, I’m not using Stan as it was intended to be used, and I’m probably not going to get much out of it. At this point I may as well be writing plain C++.

Do you agree with this assessment?

ahartikainen · March 23, 2020, 3:55pm

generated quantities is meant for simulation and is much faster at it than model block.

You still have Stan distributions, so not the same as plain C++.

You can use Stan in plain C++, so it is up to the user to select what interface to use.

Topic		Replies	Views
Specifying the number of samples for rng Modeling	6	2409	November 3, 2017
Running stan in R with only generated quantities RStan	7	2463	May 27, 2021
Posterior Predictive Checks After Sampling Modeling	3	807	October 23, 2022
Pattern for single Stan-file for simulate & inference Modeling	6	849	November 15, 2017
Generated quantites as a mechanism to define model-specific proposals Developers	3	518	June 5, 2020

MCMC expectation values and errors

Related topics