I am trying to compute the complexity of my model using rStan. I have two nested integrals to compute which are \int_{m} P(m|x) (\int_{x} P(y|m,x) P(x)) dm.
There are 2 for loops in my Stan model that are related to model complexity. In addition to that, Stan computes the parameters where they have a complexity that I don’t know how to compute. Here is my model:
model_code <- "
functions {
real get_p_y_given_m_rng(real m, real y, real mu_x, real sigma_x, real b_x_y, real b_m_y, real sigma_y){
// Estimate inner integral
real x_sample;
real lp_x_sample;
real lp_y_x_m_sample;
real p_y_m_over_x;
p_y_m_over_x = 0;
for(i in 1:100){ // This is a number of samples for the inner integral. Complexity depends on this number.
x_sample = normal_rng(mu_x, sigma_x);
lp_x_sample = normal_lpdf(x_sample| mu_x, sigma_x);
lp_y_x_m_sample = normal_lpdf(y| b_x_y * x_sample + b_m_y * m, sigma_y);
p_y_m_over_x = p_y_m_over_x + exp(lp_x_sample + lp_y_x_m_sample);
}
return p_y_m_over_x;
}
}
data {
vector[100] X;
real x; // intervention value
real y; // target value of Y in P(Y=y|do(X=x))
vector[100] M;
vector[100] Y;
}
parameters {
real mu_x;
real<lower=0> sigma_x;
real b_x_m;
real<lower=0> sigma_m;
real b_m_y;
real b_x_y;
real<lower=0> sigma_y;
}
model {
X ~ normal(mu_x, sigma_x);
M ~ normal(b_x_m * X, sigma_m);
Y ~ normal(b_x_y * X + b_m_y * M, sigma_y);
}
generated quantities{
real p_m_given_x;
real p_y_given_m;
real p_y_given_do_x;
real m;
p_y_given_do_x = 0;
for (j in 1:100){ // This is a number of samples for outer integral. Complexity depends on this number.
m = normal_rng(b_x_m * x, sigma_m);
p_m_given_x = exp(normal_lpdf(m|b_x_m * x, sigma_m ));
p_y_given_m = get_p_y_given_m_rng(m, y, mu_x, sigma_x, b_x_y, b_m_y, sigma_y);
p_y_given_do_x = p_y_given_do_x + p_y_given_m * p_m_given_x;
}
}
"
compiled_model <- stan_model(model_code = model_code)
samples <- sampling(compiled_model, data = c(.data, list(x = 2.0, y = 3.0)))
Where .data is just a tibble that contains values for X, M, and Y. Any help would be greatly appreciated.
Everything is working here. My goal is to understand the complexity of this Stan model. I have 7 parameters and I am using Stan’s HMC algorithm to train these parameters. I have two for loops inside my model that also impact the complexity, but I can’t figure out what is the Stan’s HMC algorithm complexity and overall the complexity of my whole model.
In its default configuration with a diagonal inverse metric, most of the work in Stan is in gradient evaluations, and for each draw you make, there are between 1 and 2^max_treedepth gradient evaluations. Gradient evaluations are basically evaluations of the model block.
So if you have 1000 warmup draws, 1000 post warmup draws, max_treedepth = 10 (the default max), then you’ll do between 2000 and 2000 * 2^10 gradient evaluations. There’s a lot of variation in there how long something would run, and in some models you would want more than 2000 total draws or a treedepth higher than 10.
So the complexity of the Stan MCMC model is: num_iterations * 2 ^ max_treedepth + nun_params^3? And I have to calculate the complexity of model block separately. Is that correct?
Well you need to evaluate the model block between num_iterations and num_iterations * 2^max_treedepth times with default settings.
If you’re using a dense inverse metric then for each of those iterations there is an O(P^3) thing (where P is the number of parameters), but really we should replace it with an O(P^2) thing, but if you’re using the defaults the operation is an O(P) thing.
If you’re using the defaults and your model block complexity is at least O(P) (dunno how it wouldn’t be), then just ignore this.
What is correct presumably depends on your audience. I would separate them myself if I knew nothing else.
Based on what you said can I conclude that if I am not using a dense inverse metric (using Stan’s default settings) then for each of those iterations there is’nt an O(P^3) thing ?(where P is the number of parameters)
Does this mean that the model complexity does not depend on the number of parameters? No matter if we have 1 parameter to learn or 20 parameters to learn, the complexity would be the same?
It means that doing leapfrog steps as part of HMC require some O(P) math. I’m assuming the gradient evaluations (defined by your model) are at least O(P), which would make them more important.
O(P) → O(P) my bad looks like some automatic Discourse formatting.