Computational complexity of sampling method in Stan

I am trying to compute the complexity of my model using RStan. I have two nested integrals to compute: \int_{m} P(m|x) \left( \int_{x} P(y|m,x) P(x) \, dx \right) dm.

There are two for loops in my Stan model that contribute to its complexity. On top of that, Stan samples the parameters, and I don't know how to work out the complexity of that step. Here is my model:

model_code <- "
functions {
    real get_p_y_given_m_rng(real m, real y, real mu_x, real sigma_x, real b_x_y, real b_m_y, real sigma_y){
        // Estimate inner integral
        real x_sample;
        real lp_y_x_m_sample;
        real p_y_m_over_x;
        p_y_m_over_x = 0;
        for(i in 1:100){ // Number of samples for the inner integral. Complexity depends on this number.
            // x_sample is drawn from P(x) itself, so the Monte Carlo estimate is the plain
            // average of P(y|m,x); weighting each term by exp(normal_lpdf(x_sample | mu_x, sigma_x))
            // would count P(x) twice.
            x_sample = normal_rng(mu_x, sigma_x);
            lp_y_x_m_sample = normal_lpdf(y | b_x_y * x_sample + b_m_y * m, sigma_y);
            p_y_m_over_x = p_y_m_over_x + exp(lp_y_x_m_sample);
        }
        return p_y_m_over_x / 100;
    }
}
data {
    vector[100] X;
    real x; // intervention value
    real y; // target value of Y in P(Y=y|do(X=x))
    vector[100] M;
    vector[100] Y;
}
parameters {
    real mu_x;
    real<lower=0> sigma_x;
    real b_x_m;
    real<lower=0> sigma_m;
    real b_m_y;
    real b_x_y;
    real<lower=0> sigma_y;
}
model {
    X ~ normal(mu_x, sigma_x);
    M ~ normal(b_x_m * X, sigma_m);
    Y ~ normal(b_x_y * X + b_m_y * M, sigma_y);
}
generated quantities{
    real p_y_given_m;
    real p_y_given_do_x;
    real m;
    p_y_given_do_x = 0;
    for (j in 1:100){ // Number of samples for the outer integral. Complexity depends on this number.
        // m is drawn from P(m|x) itself, so average P(y|m) directly; weighting each term
        // by exp(normal_lpdf(m | b_x_m * x, sigma_m)) would count P(m|x) twice.
        m = normal_rng(b_x_m * x, sigma_m);
        p_y_given_m = get_p_y_given_m_rng(m, y, mu_x, sigma_x, b_x_y, b_m_y, sigma_y);
        p_y_given_do_x = p_y_given_do_x + p_y_given_m;
    }
    p_y_given_do_x = p_y_given_do_x / 100;
}
"
library(rstan)
compiled_model <- stan_model(model_code = model_code)
samples <- sampling(compiled_model, data = c(.data, list(x = 2.0, y = 3.0)))

Where .data is just a tibble that contains values for X, M, and Y. Any help would be greatly appreciated.
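For reference, the two loops in generated quantities amount to a nested Monte Carlo estimate whose per-draw cost is O(n_outer * n_inner) density evaluations. Here is a plain-Python sketch of that computation (the parameter values at the bottom are made up, just to make it runnable):

```python
import math
import random

def normal_pdf(v, mu, sigma):
    # Density of Normal(mu, sigma) evaluated at v
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def p_y_given_do_x(x, y, mu_x, sigma_x, b_x_m, sigma_m, b_x_y, b_m_y, sigma_y,
                   n_outer=100, n_inner=100):
    total = 0.0
    for _ in range(n_outer):              # outer integral: m ~ P(m|x)
        m = random.gauss(b_x_m * x, sigma_m)
        inner = 0.0
        for _ in range(n_inner):          # inner integral: x' ~ P(x)
            x_s = random.gauss(mu_x, sigma_x)
            inner += normal_pdf(y, b_x_y * x_s + b_m_y * m, sigma_y)
        total += inner / n_inner
    return total / n_outer                # cost: O(n_outer * n_inner) per evaluation

# Made-up parameter values, only to exercise the sketch:
estimate = p_y_given_do_x(2.0, 3.0, mu_x=0.0, sigma_x=1.0, b_x_m=0.5, sigma_m=1.0,
                          b_x_y=0.5, b_m_y=0.5, sigma_y=1.0)
```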

Can you describe what is not working, and how you know it is not working (so it's easier to give advice)?

Everything is working here. My goal is to understand the complexity of this Stan model. I have 7 parameters, and I am using Stan's HMC algorithm to fit them. The two for loops inside my model also affect the complexity, but I can't figure out the complexity of Stan's HMC algorithm, or of the model as a whole.

Oh, I’m not sure how to write that.

In its default configuration with a diagonal inverse metric, most of the work in Stan is in gradient evaluations, and for each draw you make, there are between 1 and 2^max_treedepth gradient evaluations. Gradient evaluations are basically evaluations of the model block.

So if you have 1000 warmup draws, 1000 post-warmup draws, and max_treedepth = 10 (the default max), then you'll do between 2000 and 2000 * 2^10 gradient evaluations. That leaves a lot of variation in how long something will run, and in some models you would want more than 2000 total draws or a treedepth higher than 10.
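As a back-of-the-envelope calculation with those defaults (a sketch of the bounds described above, not an exact accounting of what Stan does):

```python
# Rough bounds on the number of gradient evaluations (i.e. model-block
# evaluations) for a NUTS run with default settings.
warmup_draws = 1000
sampling_draws = 1000
max_treedepth = 10

total_draws = warmup_draws + sampling_draws
min_grad_evals = total_draws * 1                   # at least 1 gradient eval per draw
max_grad_evals = total_draws * 2 ** max_treedepth  # at most 2^max_treedepth per draw

print(min_grad_evals, max_grad_evals)  # 2000 2048000
```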

That help at all?

I think in terms of writing down big O notation I’d separate the work done by MCMC from the complexity of the model block.

With a dense inverse metric, if P is the number of parameters, there’s some O(P^3) linear algebra internally. For a diagonal inverse metric it is O§.


So the complexity of Stan's MCMC is num_iterations * 2^max_treedepth + num_params^3? And I have to calculate the complexity of the model block separately. Is that correct?

Well you need to evaluate the model block between num_iterations and num_iterations * 2^max_treedepth times with default settings.

If you're using a dense inverse metric, then for each of those iterations there is an O(P^3) thing (where P is the number of parameters), though really that should be replaced with an O(P^2) thing; if you're using the defaults, the operation is an O§ thing.

If you’re using the defaults and your model block complexity is at least O§ (dunno how it wouldn’t be), then just ignore this.

What is correct presumably depends on your audience. I would separate them myself if I knew nothing else.


What is a O§ ?

Based on what you said, can I conclude that if I am not using a dense inverse metric (i.e., I am using Stan's default settings), then for each of those iterations there isn't an O(P^3) thing (where P is the number of parameters)?
Does this mean that the model complexity does not depend on the number of parameters? Whether we have 1 parameter to learn or 20, the complexity would be the same?

It means that doing leapfrog steps as part of HMC requires some O(P) math. I'm assuming the gradient evaluations (defined by your model) are at least O(P), which would make them more important.

O§ -> O(P). My bad, looks like some automatic Discourse formatting.
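Putting the pieces together, a rough cost model for a whole run might look like this. This is my own illustrative accounting, not an official Stan formula; `grad_cost` stands for the cost of one model-block gradient evaluation, which here would include your two 100-iteration loops only if they lived in the model block (generated quantities runs once per draw, not once per leapfrog step):

```python
# Sketch of total HMC/NUTS cost: each leapfrog step pays one gradient
# evaluation plus the metric math, which is O(P) for a diagonal inverse
# metric and O(P^2) (matrix-vector products) for a dense one.
def total_cost(n_draws, leapfrog_steps_per_draw, grad_cost, P, dense_metric=False):
    metric_cost = P ** 2 if dense_metric else P
    return n_draws * leapfrog_steps_per_draw * (grad_cost + metric_cost)

# Worst case with the defaults discussed above: 2000 draws, up to 2^10
# leapfrog steps each, with a made-up grad_cost of 100 and P = 7 parameters.
worst = total_cost(2000, 2 ** 10, grad_cost=100, P=7)
```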


Thank you. Now I have a better idea of the complexity of my model even though I can’t compute it exactly.

In case it’s helpful, here’s a recent post that’s on a similar topic:


Thank you, this was helpful. It seems there is no fixed big-O complexity formula for HMC/NUTS in terms of #iterations, #parameters, etc.
