Is it possible to access the iteration/step number inside a Stan program?

That seems like an avenue worth exploring. Do you have a link to any similar C++ hacks for Stan? I’ve never successfully attempted one.

https://cran.r-project.org/package=rstan/vignettes/external.html

2 Likes

Just tried it out (am working with some external C++ right now). What @bgoodri proposed works fine.

Check the directions here for interfacing C++ with Rstan: https://cran.r-project.org/web/packages/rstan/vignettes/external.html

The C++ code is:

static int asdf = 0;

int test_func(std::ostream* pstream__) {
  asdf += 1;
  return asdf;
}

Which interfaces with Stan with this signature:

functions {
  int test_func();
}

And I just call:

print(test_func());

In the model block and it produces an incrementing counter. (edit: spelling)

1 Like

Very helpful. Thanks Ben, Ben.

Not from the Stan language. We intentionally designed the log density to be stateless.

I do understand your motivation. The only option is what Ben is suggesting.
RStan has its own mechanism for adding C++ directly to the build process after declaring the function you’re going to use in the function block. It won’t be guaranteed to work everywhere. And the sharing will be per-chain, not cross-chain. So you won’t be able to get the chain ID, which is another thing people ask for and which we intentionally made stateless so that all chains would have the same behavior.

1 Like

@bbbales2 @bgoodri: I need the iteration count for a different application (Two step Bayesian procedure to incorporate inverse probability weights into a regression model) where I would need the number of the mcmc iteration, which is not what test_func returns (which counts up to numbers much larger than the number of iterations).

Do you know how i could get the number of the current mcmc iteration?

Not from the inside.

I think the stan program must have some variable storing the iteration number in order to do the warmup and to stop sampling.

Are you saying that this variable (storing the iteration number) is not accessible?
[I tried searching the stan, math, and rstan repositories on github before submitting my question, but I had no luck finding such a variable].
Or am I mistaken and such a variable does not exist?

It is not in scope for the model block (or any other block).

OK, so there is a variable storing the iteration number, it is just not accessible, even not with a custom c++ function similar to the one ben posted above.

One hack would be to use the option sample_file to write the samples to a file and let the custom c++ function check the number of lines in that file (I am assuming that this file produces output similar to cmdstan, where the number of iterations is something like number_lines_sample_file - 37).

I realize that this would slow down sampling a lot, but my model is slow anyhow, and this hack might not slow it down by a lot (I’m not sure though)

Are you doing some sort of warmup with this? And then are gonna run a regular, MCMC chain afterwards? I had a quick read of the abstract of the paper you linked, but this isn’t something I’m familiar with.

It’s not about warm-up.
it is that I want to do a weighted regression model (which here means multiplying the …_lpdf with a weight before adding it to that target) whereby for each sample the weights are generated from then posterior predictions of the selection model. The selection model is 2nd regression model, which I run before the weighted regression and which estimates the probability that a person is included in the study sample. For reasons explained in the above linked paper, one should not put the selection model and the weighted regression model used to estimate treatment effects into the same Bayesian model).

I need the iteration number, so that I can calculate a finite number of weights from the selection model, which I then submit as data to the weighted regression model. When this model is fit, I want to use the iteration number to select which weights to use.

If this is a good way to get unbiased estimates of causal effects in a Bayesian model is matter of current research and debate. But my reading of the above linked paper is that this is an approach that works reasonably well.

No, you can’t get it through the Stan program.

It’s obviously in the C++, but it’s not something that gets passed to the model. As I wrote above, that’s intentional.

Correct.

I only briefly skimmed that paper that you mentioned, but I didn’t see anything that jumped out at me as requiring the iteration number. Where in the paper do you think it does that?

@jjramsey see the 3rd paragraph here (begins with “This procedure parallels that of multiple imputation”).

The author does not mention the iteration number directly, but I understand the paragraph to mean that one generates weights from the posterior predictive distribution (ppd) of the selection model and plugs in a new set of weights from the ppd for each mcmc iteration of the “causal” model.
I want to use the iteration number to determine which set of weights to use in each iteration (after I submit all to be used weights as data). Do you another way to generate an integer one can use to select sets of weights?

One problem I see is that changing weights from mcmc iteration to mcm iteration means that the data used will not be constant. I remember reading posts where @betanalpha recommended against using sub-sampling schemes to achieve faster sampling with very large data sets because non-constant data makes adaptation of HMC parameters difficult. So adaptation and sampling could suffer when I implement the approach.

I’ll soon know if/how this works, because I figured out how to use an external C++ function to count the lines in the sample_file written to the disk. [Not a great solution, but it gets me the integer numbers I need].

1 Like

Hi, can you share your C++ function that you used? I am trying to get the mcmc iteration number, but I have failed to do it.

@bbbales2 I am sorry for asking a question for the post more than 2 years ago. I am dealing with a similar problem with this. So I ran your C++ code, but the numbers that are shown are not from 1 to the number of iterations. They are way large numbers as @Guido_Biele mentioned. FYI, I attach parts of my code

functions{
vector some_function(vector lambda);
int test_func(); // C++ code
}
transformed parameters{
vector<lower=0>[p] lambda_upt;
if (test_func() == 1) {
lambda_upt = lambda;
// for the first run, use the given lambda
} else {
lambda_upt = lambda_upt - some_function(lambda_upt);
// for the rest of run, used the updated lambda
}
}

I don’t know how to do this.

You could probably do it with cmdstanr: https://mc-stan.org/cmdstanr/articles/cmdstanr.html

You can pass in the timestep and metric as arguments to sample there, so if you can get an estimate of those from a previous run and then pass them in to cmdstanr with a bunch of one sample calls.

It would be slow and awkward but might work. If this is of interest to you, let me know, and I can find some example code on how to do it. This is not something Stan was designed for though, so it’ll be a really really rough solution.

The function is here. Note that the last line says return iter_count-24;. If the header section of the sample file has changed, 24 might no longer be the correct number here.

If you have it in the same folder as your original stan file, you just need to include

functions { 
   int get_iter(int id);
}

id is just a number used to identify the correct sample file, in case there are multiple in the folder. (in rstan i would use the option sample_file = paste0('tmp',id,'csv') )

When I used it to incorporate uncertainty about weights into a regression model, convergence was good, there were no divergent iterations, but there were BMFI warnings.

I should also say that I myself don’t think this was a 100% good way to do things, but my model was so large (high N) and I already needed to fit each model 50 times for 50 imputed data set, and so I decided against also fitting one model per set of weights.

All this just to say that in case this is about using weights, there is a way to do this without accessing iteration numbers, which I think is preferable when computationally feasible.

I also found myself wanting to access the mcmc iteration number. I found a simple way to do it, which I share here just in case it is useful to someone. What bbbales2 suggested almost works. The key idea is to call the C++ function in the generated quantities block since such block is executed only once per mcmc iteration. We can define the following two C++ functions:

static int itct = 1;
inline void add_iter(std::ostream* pstream__) {
    itct += 1;
}
inline int get_iter(std::ostream* pstream__) {
    return itct;
}

Then within Stan:

functions{
    void add_iter();
    int get_iter();
}
model{
    print(get_iter()); // this will return the mcmc iteration number
}
generated quantities{
    add_iter(); // increment the counter after each iteration
}

You place the “add_iter()” function within the generated quantities block. Then, you can call the “get_iter()” function in any of the other blocks whenever needed and this will return the iteration number.

If you’re running multiple chains and you would like to discriminate among chains, then you could use something like the C++ function “getpid()” to get the PID for each chain/process.

2 Likes