Improve code efficiency for Bayesian calibration

Dear all,

I am currently trying to run a simulation calibration model in Stan. The model runs fine, but it’s very slow, so I wanted to seek some advice here on how the code can be made to run more efficiently. I have attached the R and Stan code. The two data files can be found here.


Any help would be very much appreciated.

Thank you in advance.

Regards,
Adrian

main.R (1.9 KB)
bayesCalib.stan (4.4 KB)

For everyone else’s convenience, I’ll paste the code here:

data {
  int<lower=1> n; // number of field observations
  int<lower=1> m; // number of computer simulations
  int<lower=1> n_pred; // number of predictions
  int<lower=1> p; // number of input factors x
  int<lower=1> q; // number of calibration parameters t
  matrix[n, p] xf; // corresponding input factors for field observations
  // xc and tc are design points corresponding to eta
  matrix[m, p] xc; // corresponding input factors for computer simulations
  matrix[m, q] tc; // corresponding calibration parameters for computer simulations
  // x_pred: new design points to make predictions
  matrix[n_pred, p] x_pred; // corresponding input factors for predictions
  vector[n] y; // observations
  vector[m] eta; // simulation output
}

transformed data {
  int<lower = 1> N;
  vector[n+m] y_eta;
  vector[n+m+n_pred] mu; // mean vector
  matrix[n+n_pred, p] X; // X=[xf, x_pred]
  
  N = n + m + n_pred;
  // set mean vector to zero
  for (i in 1:N) {
    mu[i] = 0;
  }
  X = append_row(xf, x_pred);
  y_eta = append_row(y, eta);
}

parameters {
  // tf: calibration parameters
  // rho_eta: reparameterization of beta_eta
  // rho_delta: reparameterization of beta_delta
  // lambda_eta: precision parameter for eta
  // lambda_delta: precision parameter for bias term
  // lambda_e: precision parameter of observation error
  // y_pred: predictions
  row_vector<lower=0, upper=1>[q] tf; //calibration parameters
  row_vector<lower=0, upper=1>[p + q] rho_eta; // correlation parameter for eta
  row_vector<lower=0, upper=1>[p] rho_delta; // correlation parameter for bias term
  real<lower=0> lambda_eta; // precision parameter for eta
  real<lower=0> lambda_delta; // precision parameter for bias term
  real<lower=0> lambda_e; // precision parameter for observation error
  vector<lower=-3, upper=3>[n_pred] y_pred; // predictions
}

transformed parameters {
  row_vector[p+q] beta_eta;
  row_vector[p] beta_delta;
  // reparameterization so that beta_eta and beta_delta is between 0 and 1
  beta_eta = -4.0 * log(rho_eta);
  beta_delta = -4.0 * log(rho_delta);
}

model {
  // declare variables
  matrix[N, p+q] xt; // xt = [[xf,tf],[xc,tc],[x_pred,tf]]
  matrix[N, N] sigma_eta;
  matrix[n+n_pred, n+n_pred] sigma_delta;
  matrix[n, n] sigma_y;
  matrix[N, N] sigma_z; // covariance matrix
  matrix[N, N] L; // cholesky decomposition of covariance matrix 
  vector[N] z; // z = [y, eta, y_pred]
  row_vector[p] temp_delta;
  row_vector[p+q] temp_eta;

  xt[1:n, 1:p] = xf;
  xt[1:n, (p+1):(p+q)] = rep_matrix(tf, n);
  xt[(n+1):(n+m), 1:p] = xc;
  xt[(n+1):(n+m), (p+1):(p+q)] = tc;
  xt[(n+m+1):N, 1:p] = x_pred;
  xt[(n+m+1):N, (p+1):(p+q)] = rep_matrix(tf, n_pred);
  
  // diagonal elements of sigma_eta
  sigma_eta = diag_matrix(rep_vector((1 / lambda_eta), N));

  // off-diagonal elements of sigma_eta
  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      temp_eta = xt[i, 1:(p + q)] - xt[j, 1:(p + q)];
      sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';
      sigma_eta[i, j] = exp(-sigma_eta[i, j]) / lambda_eta;
      sigma_eta[j, i] = sigma_eta[i, j];
    }
  }

  // diagonal elements of sigma_delta
  sigma_delta = diag_matrix(rep_vector((1 / lambda_delta), n+n_pred));
  // off-diagonal elements of sigma_delta
  for (i in 1:(n+n_pred-1)) {
    for (j in (i+1):(n+n_pred)) {
      temp_delta = X[i, 1:p] - X[j, 1:p];
      sigma_delta[i, j] = beta_delta .* temp_delta * temp_delta';
      sigma_delta[i, j] = exp(-sigma_delta[i, j]) / lambda_delta;
      sigma_delta[j, i] = sigma_delta[i, j];
    }   
  }

  // observation errors sigma_y
  sigma_y = diag_matrix(rep_vector((1 / lambda_e), n));

  // computation of covariance matrix sigma_z 
  sigma_z = sigma_eta;
  sigma_z[1:n, 1:n] = sigma_eta[1:n, 1:n] + sigma_delta[1:n, 1:n] + sigma_y;
  sigma_z[1:n, (n+m+1):N] = sigma_eta[1:n, (n + m + 1):N] + 
    sigma_delta[1:n, (n+1):(n+n_pred)];
  sigma_z[(n+m+1):N, 1:n] = sigma_eta[(n+m+1):N, 1:n] + 
    sigma_delta[(n+1):(n+n_pred),1:n];
  sigma_z[(n+m+1):N, (n+m+1):N] = sigma_eta[(n+m+1):N, (n+m+1):N] + 
    sigma_delta[(n+1):(n+n_pred), (n+1):(n+n_pred)];

  // Specify Priors
  for (i in 1:(p+q)){
    rho_eta[i] ~ beta(1.0, 0.5);
  }
  for (j in 1:p){
    rho_delta[j] ~ beta(1.0, 0.4);
  }
  lambda_eta ~ gamma(10, 10); // gamma (shape, rate)
  lambda_delta ~ gamma(10, 0.3); // gamma (shape, rate)
  lambda_e ~ gamma(10, 0.03); // gamma (shape, rate)

  z = append_row(y_eta, y_pred); // z = [y, eta, y_pred]
  L = cholesky_decompose(sigma_z); // cholesky decomposition 
  z ~ multi_normal_cholesky(mu, L);

}

Yikes! I’m not even sure where to start. To begin, you don’t need sigma[1:n, 1:n] if sigma is n x n. That’s just sigma, and dropping the slice will cut down on indirection.
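For instance, xt in your model block has exactly p + q columns, so the column slices in the distance computation are redundant; this is the same line with them dropped:

temp_eta = xt[i] - xt[j]; // identical to xt[i, 1:(p + q)] - xt[j, 1:(p + q)]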

Similarly, temp_delta = X[i, 1:p] - X[j, 1:p]; should probably just be temp_delta = X[i] - X[j]. Given that this is a data-only expression, you should compute it once in the transformed data block and reuse it; see the sketch below. Everything that can be defined in transformed data should be, for efficiency, and everything else that can should be pushed to generated quantities.
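Concretely, here’s a sketch of what I mean (X_sq_diff is just a name I’m making up). The elementwise squared differences of the rows of X involve only data, so they can be computed once in transformed data:

row_vector[p] X_sq_diff[n + n_pred, n + n_pred];
for (i in 1:(n + n_pred))
  for (j in 1:(n + n_pred))
    X_sq_diff[i, j] = (X[i] - X[j]) .* (X[i] - X[j]);

Then the inner loop of the model block only needs a dot product with the parameter:

sigma_delta[i, j] = exp(-dot_product(beta_delta, X_sq_diff[i, j])) / lambda_delta;

The same idea applies to any other expression that involves only data.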

Why do your correlation parameters have lower = 0 rather than lower = -1? Truncating near boundaries can be problematic.

You almost never want to create a diag_matrix as you do for sigma_y. You’re better off just looping to add to the diagonal of sigma_z.
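For example, for sigma_eta (just a sketch; sigma_delta works the same way), you can drop the diag_matrix initialization and set the diagonal in a loop, since your double loop already fills every off-diagonal entry:

for (i in 1:N)
  sigma_eta[i, i] = 1 / lambda_eta;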

Things like this

 for (i in 1:(p+q)){
    rho_eta[i] ~ beta(1.0, 0.5);
  }

can be vectorized to just rho_eta[1 : p + q] ~ beta(1, 0.5);

This lambda_e ~ gamma(10, 0.03); hints that your lambda_e values are expected to be very large. It can help immensely to knock predictors, etc., down to unit scale. That way, Stan doesn’t have to do as much adaptation.
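For example, here is a sketch of one way to do that in transformed data (eta_mu, eta_sd, and y_eta_std are names I’m making up, and scaling by the simulator output’s mean and sd is just one reasonable choice):

real eta_mu;
real eta_sd;
vector[n + m] y_eta_std;
eta_mu = mean(eta);
eta_sd = sd(eta);
y_eta_std = (append_row(y, eta) - eta_mu) / eta_sd; // stacked outputs on unit scale

With the outputs on a known unit scale, the gamma priors on the precisions become much easier to reason about, and the sampler doesn’t have to adapt to wildly different scales across parameters.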

This comment doesn’t match the code, so I’m not sure what you intended:

  // reparameterization so that beta_eta and beta_delta is between 0 and 1
  beta_eta = -4.0 * log(rho_eta);

since rho_eta is constrained to be between 0 and 1 itself. Usually it’s best to just define parameters on the scale you want to use them on.
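For instance, here’s a sketch of the direct parameterization (not a drop-in replacement; you would want to choose priors directly on this scale rather than the beta priors on rho):

row_vector<lower=0>[p + q] beta_eta;
row_vector<lower=0>[p] beta_delta;

and then the rho_* parameters and the -4 * log() transform go away entirely.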

Arithmetic is left associative, so

      sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';

does the elementwise multiply before the multiply-self-transpose (for which there's a special, more efficient built-in function called `multiply_self_transpose`).
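In other words, what that line computes is equivalent to this explicitly grouped version, a scalar weighted squared distance:

sigma_eta[i, j] = (beta_eta .* temp_eta) * temp_eta';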

This kind of thing with interval constraints:

vector<lower=-3, upper=3>[n_pred] y_pred; // predictions

is almost never a good idea unless there are physical constraints forcing `y_pred` into that range (and I don't mean that you think that's where it'll be because of the data).
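That is, unless something physical forces those bounds, the plain declaration is usually the safer choice (just a sketch):

vector[n_pred] y_pred; // unconstrained; the posterior determines the range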

Thank you, Bob, for going through my code and providing your input.

Cutting down on the indirection and vectorizing the priors did indeed help shave some time off running my Stan model:

temp_delta = X[i] - X[j];

and

rho_eta[1 : p + q] ~ beta(1, 0.5);

Looping to add to the diagonal of sigma_z, like this, also improved the runtime:

for (i in 1:n){
  sigma_z[i, i] =  sigma_z[i, i] + (1 / lambda_e);
}

On the interval constraints for y_pred: this makes total sense, and you are right that I should not build assumptions derived from the data into the constraints. I have since removed them.

I have a couple of questions on the other points. Please forgive me if they are trivial or obvious.

I can’t seem to find the function multiply_self_transpose in the Stan reference documentation (version 2.17). I did find multiply_lower_tri_self_transpose. Is this what you are referring to? If it is, how could it be used to replace

  sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';

Going by this, does this mean that I should set upper = 2 instead of upper = 1, since the correlation parameters have a range of [0, 1]? Just out of curiosity, why would it be problematic to set the bounds near the physical bounds? I thought the intention of lower and upper was to encode the actual constraints on these parameters, so if the correlation parameters have a range of [0, 1], wouldn’t it be more intuitive to set <lower=0, upper=1> rather than <lower=-1, upper=2>?

Thank you again for reading through all that code. Really appreciate it.

That’s because I apparently fabricated it out of whole cloth; I thought we had implemented it. So there’s no way to make that more efficient.

I couldn’t quite follow that. Standardly defined correlations range between -1 and 1, so I’d suggest <lower = -1, upper = 1>.