Is it possible to access the iteration/step number inside a Stan program?

Hi all –

I think I’d like to access the step/iteration number within a Stan program. Is that possible?

The problem is that I run a fixed point iteration at each HMC step. It can be time-consuming,
and using the previous step’s solution as the starting value for the next step would save
a lot of time. Yet I only know how to use the same initial value for each step.

That is, at each step I do something like the following

transformed parameters {
    vector[N] mstar;
    mstar = find_fixed_point(starting_values, model_parameters);

    /* what I want to do is 
    if(iteration == 1) {
        mstar = find_fixed_point(starting_values, model_parameters);
    } else {
        mstar = find_fixed_point(previous_values_of_mstar_from_last_step, model_parameters);
    }
    */
}

Any tips?

1 Like

I think the starting values have to be known data whereas the function returns a vector of vars. So, this wouldn’t work even if you could cache, which you can’t.

1 Like

Maybe you could do something like this with a custom C++ function Allocate a vector with the static keyword to keep the memory from being garbage collected and use value_of(solution) to pull out the double values and assign them to that vector before you return the solution vector in vars.

1 Like

That seems like an avenue worth exploring. Do you have a link to any similar C++ hacks for Stan? I’ve never successfully attempted one.

https://cran.r-project.org/package=rstan/vignettes/external.html

2 Likes

Just tried it out (am working with some external C++ right now). What @bgoodri proposed works fine.

Check the directions here for interfacing C++ with Rstan: https://cran.r-project.org/web/packages/rstan/vignettes/external.html

The C++ code is:

static int asdf = 0;

int test_func(std::ostream* pstream__) {
  asdf += 1;
  return asdf;
}

Which interfaces with Stan with this signature:

functions {
  int test_func();
}

And I just call:

print(test_func());

In the model block and it produces an incrementing counter. (edit: spelling)

1 Like

Very helpful. Thanks Ben, Ben.

Not from the Stan language. We intentionally designed the log density to be stateless.

I do understand your motivation. The only option is what Ben is suggesting.
RStan has its own mechanism for adding C++ directly to the build process after declaring the function you’re going to use in the function block. It won’t be guaranteed to work everywhere. And the sharing will be per-chain, not cross-chain. So you won’t be able to get the chain ID, which is another thing people ask for and which we intentionally made stateless so that all chains would have the same behavior.

1 Like

@bbbales2 @bgoodri: I need the iteration count for a different application (Two step Bayesian procedure to incorporate inverse probability weights into a regression model) where I would need the number of the mcmc iteration, which is not what test_func returns (which counts up to numbers much larger than the number of iterations).

Do you know how i could get the number of the current mcmc iteration?

Not from the inside.

I think the stan program must have some variable storing the iteration number in order to do the warmup and to stop sampling.

Are you saying that this variable (storing the iteration number) is not accessible?
[I tried searching the stan, math, and rstan repositories on github before submitting my question, but I had no luck finding such a variable].
Or am I mistaken and such a variable does not exist?

It is not in scope for the model block (or any other block).

OK, so there is a variable storing the iteration number, it is just not accessible, even not with a custom c++ function similar to the one ben posted above.

One hack would be to use the option sample_file to write the samples to a file and let the custom c++ function check the number of lines in that file (I am assuming that this file produces output similar to cmdstan, where the number of iterations is something like number_lines_sample_file - 37).

I realize that this would slow down sampling a lot, but my model is slow anyhow, and this hack might not slow it down by a lot (I’m not sure though)

Are you doing some sort of warmup with this? And then are gonna run a regular, MCMC chain afterwards? I had a quick read of the abstract of the paper you linked, but this isn’t something I’m familiar with.

It’s not about warm-up.
it is that I want to do a weighted regression model (which here means multiplying the …_lpdf with a weight before adding it to that target) whereby for each sample the weights are generated from then posterior predictions of the selection model. The selection model is 2nd regression model, which I run before the weighted regression and which estimates the probability that a person is included in the study sample. For reasons explained in the above linked paper, one should not put the selection model and the weighted regression model used to estimate treatment effects into the same Bayesian model).

I need the iteration number, so that I can calculate a finite number of weights from the selection model, which I then submit as data to the weighted regression model. When this model is fit, I want to use the iteration number to select which weights to use.

If this is a good way to get unbiased estimates of causal effects in a Bayesian model is matter of current research and debate. But my reading of the above linked paper is that this is an approach that works reasonably well.

No, you can’t get it through the Stan program.

It’s obviously in the C++, but it’s not something that gets passed to the model. As I wrote above, that’s intentional.

Correct.

I only briefly skimmed that paper that you mentioned, but I didn’t see anything that jumped out at me as requiring the iteration number. Where in the paper do you think it does that?

@jjramsey see the 3rd paragraph here (begins with “This procedure parallels that of multiple imputation”).

The author does not mention the iteration number directly, but I understand the paragraph to mean that one generates weights from the posterior predictive distribution (ppd) of the selection model and plugs in a new set of weights from the ppd for each mcmc iteration of the “causal” model.
I want to use the iteration number to determine which set of weights to use in each iteration (after I submit all to be used weights as data). Do you another way to generate an integer one can use to select sets of weights?

One problem I see is that changing weights from mcmc iteration to mcm iteration means that the data used will not be constant. I remember reading posts where @betanalpha recommended against using sub-sampling schemes to achieve faster sampling with very large data sets because non-constant data makes adaptation of HMC parameters difficult. So adaptation and sampling could suffer when I implement the approach.

I’ll soon know if/how this works, because I figured out how to use an external C++ function to count the lines in the sample_file written to the disk. [Not a great solution, but it gets me the integer numbers I need].

1 Like

Hi, can you share your C++ function that you used? I am trying to get the mcmc iteration number, but I have failed to do it.

@bbbales2 I am sorry for asking a question for the post more than 2 years ago. I am dealing with a similar problem with this. So I ran your C++ code, but the numbers that are shown are not from 1 to the number of iterations. They are way large numbers as @Guido_Biele mentioned. FYI, I attach parts of my code

functions{
vector some_function(vector lambda);
int test_func(); // C++ code
}
transformed parameters{
vector<lower=0>[p] lambda_upt;
if (test_func() == 1) {
lambda_upt = lambda;
// for the first run, use the given lambda
} else {
lambda_upt = lambda_upt - some_function(lambda_upt);
// for the rest of run, used the updated lambda
}
}