Hi!
I’m fairly new to Stan. I have a large Supply-Use Table balancing problem with ~700k parameters. For small data, Stan works fine, but when I scale to 5–10% of the real data, sampling becomes extremely slow (8+ hours for 500 warmup iterations) and the chains appear stuck. I tried reduce_sum, raising adapt_delta, etc. Code is below. Suggestions on reparameterizations, tuning, or alternatives are welcome!
Some background: I want to balance an economic Supply-Use Table (SUT). SUTs consist of activities (e.g. pizza production) and products (e.g. pizza, cheese, …). Each activity has one (or several) output products (e.g. pizza) and several product inputs (e.g. cheese, tomatoes, dough). All flows of products from/to activities are recorded in tonnes.
The goal of my model is to make sure that:
- The sum of all inputs into an activity equals the sum of all outputs (activity balance)
- The sum of all products supplied equals the sum of all products used (product balance)
- (… other constraints that don’t matter for now)
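In equations (writing use_{p,a} for the flow of product p into activity a and sup_{p,a} for the output of product p by activity a; regions omitted for brevity), the two balances are

$$\sum_{p} \mathrm{use}_{p,a} = \sum_{p} \mathrm{sup}_{p,a} \quad \text{for every activity } a, \qquad \sum_{a} \mathrm{sup}_{p,a} = \sum_{a} \mathrm{use}_{p,a} \quad \text{for every product } p.$$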
Here’s my Stan code:

```stan
functions {
  // penalize relative difference in supply vs. use
  real apply_rel_diff_constraint(vector v1, vector v2, real epsilon, real rel_tol) {
    int N = num_elements(v1);
    vector[N] mean_val = (v1 + v2) / 2 + epsilon;
    vector[N] rel_diff = (v1 - v2) ./ mean_val;
    return normal_lpdf(rel_diff | 0, rel_tol);
  }
}
data {
  int<lower=1> N_sup;
  int<lower=1> N_use;
  int<lower=1> N_pr; // Number of unique product-region combinations
  int<lower=1> N_ar; // Number of unique activity-region combinations
  // "Observed" means and stdevs on log-scale
  vector[N_sup] mu_log_sup;
  vector<lower=0>[N_sup] sigma_log_sup;
  vector[N_use] mu_log_use;
  vector<lower=0>[N_use] sigma_log_use;
  // Sparse matrix data for use
  vector[N_use] w_pr_use;
  array[N_use] int v_pr_use;
  array[N_pr + 1] int u_pr_use;
  vector[N_use] w_ar_use;
  array[N_use] int v_ar_use;
  array[N_ar + 1] int u_ar_use;
  // Sparse matrix data for supply
  vector[N_sup] w_pr_sup;
  array[N_sup] int v_pr_sup;
  array[N_pr + 1] int u_pr_sup;
  vector[N_sup] w_ar_sup;
  array[N_sup] int v_ar_sup;
  array[N_ar + 1] int u_ar_sup;
  // Tolerances and other constants
  real<lower=0> rel_tol_product;
  real<lower=0> rel_tol_activity;
  int<lower=0> grainsize;
  real<lower=0> epsilon;
  // Indices of constraints (as they do not apply to all products/activities)
  int<lower=1> N_p_constr;
  array[N_p_constr] int<lower=1, upper=N_pr> idx_p_constr;
  int<lower=1> N_a_constr;
  array[N_a_constr] int<lower=1, upper=N_ar> idx_a_constr;
}
parameters {
  // parameterize supply & use in log-space
  vector[N_sup] x_log_sup;
  vector[N_use] x_log_use;
}
transformed parameters {
  // Exponentiate to get supply/use on the original absolute scale
  vector<lower=0>[N_sup] x_sup = exp(x_log_sup);
  vector<lower=0>[N_use] x_use = exp(x_log_use);
  // Compute total use by product-region
  vector[N_pr] total_use_pr;
  total_use_pr = csr_matrix_times_vector(N_pr, N_use, w_pr_use, v_pr_use, u_pr_use, x_use);
  // Compute total supply by product-region
  vector[N_pr] total_sup_pr;
  total_sup_pr = csr_matrix_times_vector(N_pr, N_sup, w_pr_sup, v_pr_sup, u_pr_sup, x_sup);
  // Compute total input by activity-region
  vector[N_ar] total_input_ar;
  total_input_ar = csr_matrix_times_vector(N_ar, N_use, w_ar_use, v_ar_use, u_ar_use, x_use);
  // Compute total output by activity-region
  vector[N_ar] total_output_ar;
  total_output_ar = csr_matrix_times_vector(N_ar, N_sup, w_ar_sup, v_ar_sup, u_ar_sup, x_sup);
}
model {
  // 1) Supply log-likelihood
  target += normal_lpdf(x_log_sup | mu_log_sup, sigma_log_sup);
  // 2) Use log-likelihood
  target += normal_lpdf(x_log_use | mu_log_use, sigma_log_use);
  // 3) Product-balance constraints (on the absolute scale of x_sup, x_use)
  vector[N_p_constr] sup_sub = total_sup_pr[idx_p_constr];
  vector[N_p_constr] use_sub = total_use_pr[idx_p_constr];
  target += apply_rel_diff_constraint(sup_sub, use_sub, epsilon, rel_tol_product);
  // 4) Activity-balance constraints
  vector[N_a_constr] output_sub = total_output_ar[idx_a_constr];
  vector[N_a_constr] input_sub = total_input_ar[idx_a_constr];
  target += apply_rel_diff_constraint(output_sub, input_sub, epsilon, rel_tol_activity);
}
```
In my case, products and activities have a location (e.g. “pizza production” in Italy uses “cheese” from Switzerland). In the code, product-region pairs are denoted *_pr and activity-region pairs *_ar. I model the two constraints as “soft constraints” with a relative tolerance.
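Written out, the soft product balance for each constrained product-region index i (with sup_i and use_i the totals total_sup_pr[i] and total_use_pr[i] from the transformed parameters) is

$$\frac{\mathrm{sup}_i - \mathrm{use}_i}{(\mathrm{sup}_i + \mathrm{use}_i)/2 + \epsilon} \sim \mathcal{N}(0,\ \texttt{rel\_tol\_product}),$$

and analogously for the activity balance with rel_tol_activity; epsilon only keeps the denominator away from zero.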
My full data set currently has the following dimensions: N_sup = 5E4 and N_use = 7E5, and it is supposed to grow even larger in the end.
I run the model with cmdstanr on a cluster with 4 cores. For the most recent runs I set the following configuration:
- max_treedepth = 15 (as I got warnings about max_treedepth being reached)
- adapt_delta = 0.95
- iter_warmup = 1000
- iter_sampling = 500
- rel_tol_product = 0.01, rel_tol_activity = 0.1 (as the activities show a much higher prior imbalance)
- initial values set close to the priors/observations (with some random noise)
The good thing: the model does what it is supposed to do for small toy data (around 1% of the full data). However, when I run the model with larger data (i.e. more “parameters” in my case), say 5-10% of the full data, I notice the following issues:
- chains get stuck (they more or less stay at their initial state)
- sampling takes very long (8 h for 500 warm-up iterations); most of the time is spent on the “use log-likelihood”, followed by the “compute total use/inputs …” steps
My questions:
- Does anyone have an idea how to reformulate/reparameterize the model to increase sampling speed and/or obtain a more “favourable geometry”? (I tried threading the “use log-likelihood” with reduce_sum, see the sketch below, but did not observe any speed gains.)
- What are the most promising configuration settings I could try to “tune” the sampling algorithm? (further increase warmup/adapt_delta/max_treedepth? anything else?)
- Or is my problem just “too big” for Stan/MCMC? If so, do you have any suggestions for alternatives to try out, e.g. Pathfinder, ADVI, etc.? (I haven’t looked into those yet.)
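For reference, here is roughly the reduce_sum setup I mean for the “use log-likelihood”, stripped down to a minimal self-contained sketch (the partial-sum function name and the slicing over an array copy of x_log_use are just for illustration, not my exact code):

```stan
functions {
  // log-likelihood contribution of one slice of the use parameters
  real partial_sum_use(array[] real x_log_use_slice, int start, int end,
                       vector mu_log_use, vector sigma_log_use) {
    return normal_lpdf(to_vector(x_log_use_slice) |
                       mu_log_use[start:end], sigma_log_use[start:end]);
  }
}
data {
  int<lower=1> N_use;
  vector[N_use] mu_log_use;
  vector<lower=0>[N_use] sigma_log_use;
  int<lower=1> grainsize;
}
parameters {
  vector[N_use] x_log_use;
}
model {
  // threaded replacement for:
  //   target += normal_lpdf(x_log_use | mu_log_use, sigma_log_use);
  target += reduce_sum(partial_sum_use, to_array_1d(x_log_use), grainsize,
                       mu_log_use, sigma_log_use);
}
```

(grainsize here is the same data entry as in the full model above.)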
Please let me know if you need more information on background, diagnostics, etc.! Thanks for your help!