# Improve code efficiency for Bayesian calibration

Dear all,

I am currently trying to run a simulation calibration model in Stan. The model runs fine but its very slow. Therefore, I wanted to seek some advice here as to how the code can be made to run more efficiently. I have attached the R and Stan codes. The 2 data files can be found here

Any help would be very much appreciated.

Regards,

main.R (1.9 KB)
bayesCalib.stan (4.4 KB)

For everyone else’s convenience, I’ll paste the code here:

``````data {
int<lower=1> n; // number of field observations
int<lower=1> m; // number of computer simulation
int<lower=1> n_pred; // number of predictions
int<lower=1> p; // number of input factors x
int<lower=1> q; // number of calibration parameters t
matrix[n, p] xf; // corresponding input factors for field observations
// xc and tc are design points corresponding to eta
matrix[m, p] xc; // corresponding input factors for computer simulations
matrix[m, q] tc; // corresponding calibration parameters for computer simulations
// x_pred: new design points to make predictions
matrix[n_pred, p] x_pred; // corresponding input factors for predictions
vector[n] y; // observations
vector[m] eta; // simulation output
}

transformed data {
int<lower = 1> N;
vector[n+m] y_eta;
vector[n+m+n_pred] mu; // mean vector
matrix[n+n_pred, p] X; // X=[xf, x_pred]

N = n + m + n_pred;
// set mean vector to zero
for (i in 1:N) {
mu[i] = 0;
}
X = append_row(xf, x_pred);
y_eta = append_row(y, eta);
}

parameters {
// tf: calibration parameters
// rho_eta: reparameterization of beta_eta
// rho_delta: reparameterization of beta_delta
// lambda_eta: precision parameter for eta
// lambda_delta: precision parameter for bias term
// lambda_e: precision parameter of observation error
// y_pred: predictions
row_vector<lower=0, upper=1>[q] tf; //calibration parameters
row_vector<lower=0, upper=1>[p + q] rho_eta; // correlation parameter for eta
row_vector<lower=0, upper=1>[p] rho_delta; // correlation parameter for bias term
real<lower=0> lambda_eta; // precision parameter for eta
real<lower=0> lambda_delta; // precision parameter for bias term
real<lower=0> lambda_e; // precision parameter for observation error
vector<lower=-3, upper=3>[n_pred] y_pred; // predictions
}

transformed parameters {
row_vector[p+q] beta_eta;
row_vector[p] beta_delta;
// reparameterization so that beta_eta and beta_delta is between 0 and 1
beta_eta = -4.0 * log(rho_eta);
beta_delta = -4.0 * log(rho_delta);
}

model {
// declare variables
matrix[N, p+q] xt; // xt = [[xf,tf],[xc,tc],[x_pred,tf]]
matrix[N, N] sigma_eta;
matrix[n+n_pred, n+n_pred] sigma_delta;
matrix[n, n] sigma_y;
matrix[N, N] sigma_z; // covariance matrix
matrix[N, N] L; // cholesky decomposition of covariance matrix
vector[N] z; // z = [y, eta, y_pred]
row_vector[p] temp_delta;
row_vector[p+q] temp_eta;

xt[1:n, 1:p] = xf;
xt[1:n, (p+1):(p+q)] = rep_matrix(tf, n);
xt[(n+1):(n+m), 1:p] = xc;
xt[(n+1):(n+m), (p+1):(p+q)] = tc;
xt[(n+m+1):N, 1:p] = x_pred;
xt[(n+m+1):N, (p+1):(p+q)] = rep_matrix(tf, n_pred);

// diagonal elements of sigma_eta
sigma_eta = diag_matrix(rep_vector((1 / lambda_eta), N));

// off-diagonal elements of sigma_eta
for (i in 1:(N-1)) {
for (j in (i+1):N) {
temp_eta = xt[i, 1:(p + q)] - xt[j, 1:(p + q)];
sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';
sigma_eta[i, j] = exp(-sigma_eta[i, j]) / lambda_eta;
sigma_eta[j, i] = sigma_eta[i, j];
}
}

// diagonal elements of sigma_delta
sigma_delta = diag_matrix(rep_vector((1 / lambda_delta), n+n_pred));
// off-diagonal elements of sigma_delta
for (i in 1:(n+n_pred-1)) {
for (j in (i+1):(n+n_pred)) {
temp_delta = X[i, 1:p] - X[j, 1:p];
sigma_delta[i, j] = beta_delta .* temp_delta * temp_delta';
sigma_delta[i, j] = exp(-sigma_delta[i, j]) / lambda_delta;
sigma_delta[j, i] = sigma_delta[i, j];
}
}

// observation errors sigma_y
sigma_y = diag_matrix(rep_vector((1 / lambda_e), n));

// computation of covariance matrix sigma_z
sigma_z = sigma_eta;
sigma_z[1:n, 1:n] = sigma_eta[1:n, 1:n] + sigma_delta[1:n, 1:n] + sigma_y;
sigma_z[1:n, (n+m+1):N] = sigma_eta[1:n, (n + m + 1):N] +
sigma_delta[1:n, (n+1):(n+n_pred)];
sigma_z[(n+m+1):N, 1:n] = sigma_eta[(n+m+1):N, 1:n] +
sigma_delta[(n+1):(n+n_pred),1:n];
sigma_z[(n+m+1):N, (n+m+1):N] = sigma_eta[(n+m+1):N, (n+m+1):N] +
sigma_delta[(n+1):(n+n_pred), (n+1):(n+n_pred)];

// Specify Priors
for (i in 1:(p+q)){
rho_eta[i] ~ beta(1.0, 0.5);
}
for (j in 1:p){
rho_delta[j] ~ beta(1.0, 0.4);
}
lambda_eta ~ gamma(10, 10); // gamma (shape, rate)
lambda_delta ~ gamma(10, 0.3); // gamma (shape, rate)
lambda_e ~ gamma(10, 0.03); // gamma (shape, rate)

z = append_row(y_eta, y_pred); // z = [y, eta, y_pred]
L = cholesky_decompose(sigma_z); // cholesky decomposition
z ~ multi_normal_cholesky(mu, L);

}
``````

Yikes! I’m not even sure where to start. To getin, you don’t need `sigma[1:n, 1:n]` if `sigma` is `n x n`. That’s just `sigma` and that will cut down on indirection.

Similarly, `temp_delta = X[i, 1:p] - X[j, 1:p];` should probably just be `temp_delta = X[i] - X[j]`. Given that this is a data-only expression, you should define a `temp_delta` vector in the transformed data block and reuse it. Everything that can be defined in transformed data should be for efficiency and everything else that can be pushed to generated quantities should be.

Why do your correlation parameters have `lower = 0` rather than `lower = -1`? Truncating near boundaries can be problematic.

You almost never want to create a `diag_matrix` as you do for `sigma_y`. You’re better off just looping to add to the diagonal of `sigma_z`.

Things like this

`````` for (i in 1:(p+q)){
rho_eta[i] ~ beta(1.0, 0.5);
``````

can be vectorized to just `rho_eta[1 : p + q] ~ beta(1, 0.5);`

This `lambda_e ~ gamma(10, 0.03);` hints that your `lambda_e` values are expected to be very large. It can help immensely to knock predictors, etc., down to unit scale. That way, Stan doesn’t have to do any adaptation.

This comment doesn’t match the code, so I’m not sure what you intended:

``````  // reparameterization so that beta_eta and beta_delta is between 0 and 1
beta_eta = -4.0 * log(rho_eta);
``````

since `rho_eta` is constrained to be between 0 and 1 itself. Usually it’s best to just define parameters on the sclae you want to use them on.

Arithmetic is left associative, so

``````      sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';
``

does the elementwise multiply before the multiply-self-tranpose (for which there's a special more efficient function built in, called `multiply_self_transpose`).

This kind of thing with interval constraints:

``````

vector<lower=-3, upper=3>[n_pred] y_pred; // predictions

``is almost never a good idea unless there are physical constraints forcing `y_pred` into that range (and I don't mean that you think that's where it'll be because of the data).``

Thank you Bob for going through my code and providing your inputs.

Cutting down on the indirections and vectorizing the priors did indeed help shave some time cost from running my stan model

temp_delta = X[i] - X[j];

and

rho_eta[1 : p + q] ~ beta(1, 0.5);

looping to add to the diagonal of `sigma_z` also improved the code runtime like this

``````for (i in 1:n){
sigma_z[i, i] =  sigma_z[i, i] + (1 / lambda_e);
}
``````

This makes total sense and you are right that I should not make assumptions based on the data. I have since removed these constraints.

Some clarifications with regards to other points. Please forgive me if they are trivial or obvious.

I can’t seem to find this function `multiply_self_transpose` in the Stan reference documentation (Version 2.17). I did find a `multiply_lower_tri_self_transpose`. Is this what you are referring to? If it is, how could it be used to replace

``````  sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';
``````

Going by this, does this mean that I should set upper = 2 instead of upper = 2 since the correlation parameters have a range of [0,1]. Just out of curiosity, why would it be problematic to set the bounds near the physical bounds? I thought the intention of lower and upper was to set the constraints that represent the actual constraints of these parameters. Therefore if the correlation parameters have a range of [0,1], wouldn’t it be more intuitive to set `<lower=0, upper=1>` rather than `<lower=-1, upper=2>`.

Thank you again for reading through all that code. Really appreciate it.

That’s because I apparently fabricated it out of whole cloth. I thought we had implemented it. So no way to make that more efficient.

I couldn’t quit efollow that. Standardly defined correlations range between -1 and 1, so I’d suggest `<lower = -1, upper = 1>`.