Computational complexity of sampling method in Stan

I am trying to compute the complexity of my model using rStan. I have two nested integrals to compute which are \int_{m} P(m|x) (\int_{x} P(y|m,x) P(x)) dm.

There are 2 for loops in my Stan model that are related to model complexity. In addition to that, Stan computes the parameters where they have a complexity that I don’t know how to compute. Here is my model:

model_code <- "
functions {
    real get_p_y_given_m_rng(real m, real y, real mu_x, real sigma_x, real b_x_y, real b_m_y, real sigma_y){
        // Estimate inner integral
        real x_sample;
        real lp_x_sample;
        real lp_y_x_m_sample;
        real p_y_m_over_x;
        p_y_m_over_x = 0;
        for(i in 1:100){ // This is a number of samples for the inner integral. Complexity depends on this number.
            x_sample = normal_rng(mu_x, sigma_x);
            lp_x_sample = normal_lpdf(x_sample| mu_x, sigma_x);
            lp_y_x_m_sample = normal_lpdf(y| b_x_y * x_sample  + b_m_y * m, sigma_y);
            p_y_m_over_x = p_y_m_over_x + exp(lp_x_sample + lp_y_x_m_sample);
        return p_y_m_over_x;
data {
    vector[100] X;
    real x; // intervention value
    real y; // target value of Y in P(Y=y|do(X=x))
    vector[100] M;
    vector[100] Y;
parameters {
    real mu_x;
    real<lower=0> sigma_x;
    real b_x_m;
    real<lower=0> sigma_m;
    real b_m_y;
    real b_x_y;
    real<lower=0> sigma_y;
model {
    X ~ normal(mu_x, sigma_x);
    M ~ normal(b_x_m * X, sigma_m);
    Y ~ normal(b_x_y * X + b_m_y * M, sigma_y);
generated quantities{
    real p_m_given_x;
    real p_y_given_m;
    real p_y_given_do_x;
    real m;
    p_y_given_do_x = 0;
    for (j in 1:100){ // This is a number of samples for outer integral. Complexity depends on this number.
        m = normal_rng(b_x_m * x, sigma_m);
        p_m_given_x = exp(normal_lpdf(m|b_x_m * x, sigma_m ));
        p_y_given_m = get_p_y_given_m_rng(m, y, mu_x, sigma_x, b_x_y, b_m_y, sigma_y);
        p_y_given_do_x = p_y_given_do_x + p_y_given_m * p_m_given_x;
compiled_model <- stan_model(model_code = model_code)
samples <- sampling(compiled_model, data = c(.data, list(x = 2.0, y = 3.0)))

Where .data is just a tibble that contains values for X, M, and Y. Any help would be greatly appreciated.

Can you describe what is not working and in what way you know it is not working (so it’s easier to give advice)?

Everything is working here. My goal is to understand the complexity of this Stan model. I have 7 parameters and I am using Stan’s HMC algorithm to train these parameters. I have two for loops inside my model that also impact the complexity, but I can’t figure out what is the Stan’s HMC algorithm complexity and overall the complexity of my whole model.

Oh, I’m not sure how to write that.

In its default configuration with a diagonal inverse metric, most of the work in Stan is in gradient evaluations, and for each draw you make, there are between 1 and 2^max_treedepth gradient evaluations. Gradient evaluations are basically evaluations of the model block.

So if you have 1000 warmup draws, 1000 post warmup draws, max_treedepth = 10 (the default max), then you’ll do between 2000 and 2000 * 2^10 gradient evaluations. There’s a lot of variation in there how long something would run, and in some models you would want more than 2000 total draws or a treedepth higher than 10.

That help at all?

I think in terms of writing down big O notation I’d separate the work done by MCMC from the complexity of the model block.

With a dense inverse metric, if P is the number of parameters, there’s some O(P^3) linear algebra internally. For a diagonal inverse metric it is O§.

1 Like

So the complexity of the Stan MCMC model is: num_iterations * 2 ^ max_treedepth + nun_params^3? And I have to calculate the complexity of model block separately. Is that correct?

Well you need to evaluate the model block between num_iterations and num_iterations * 2^max_treedepth times with default settings.

If you’re using a dense inverse metric then for each of those iterations there is an O(P^3) thing (where P is the number of parameters), but really we should replace it with an O(P^2) thing, but if you’re using the defaults the operation is an O§ thing.

If you’re using the defaults and your model block complexity is at least O§ (dunno how it wouldn’t be), then just ignore this.

What is correct presumably depends on your audience. I would separate them myself if I knew nothing else.

1 Like

What is a O§ ?

Based on what you said can I conclude that if I am not using a dense inverse metric (using Stan’s default settings) then for each of those iterations there is’nt an O(P^3) thing ?(where P is the number of parameters)
Does this mean that the model complexity does not depend on the number of parameters? No matter if we have 1 parameter to learn or 20 parameters to learn, the complexity would be the same?

It means that doing leapfrog steps as part of HMC require some O(P) math. I’m assuming the gradient evaluations (defined by your model) are at least O(P), which would make them more important.

O§ -> O(P) my bad looks like some automatic Discourse formatting.


Thank you. Now I have a better idea of the complexity of my model even though I can’t compute it exactly.

In case it’s helpful, here’s a recent post that’s on a similar topic:


Thank you. this was helpful. It seems like there is not a fix big O complexity formula for HMC NUTS in terms of #iterations, #parameters, etc.

1 Like