Improve code efficiency for Bayesian calibration

Dear all,

I am currently trying to run a simulation calibration model in Stan. The model runs fine, but it's very slow, so I wanted to seek some advice here on how the code could be made to run more efficiently. I have attached the R and Stan code. The two data files can be found here

Any help would be very much appreciated.

Thank you in advance.


main.R (1.9 KB)
bayesCalib.stan (4.4 KB)

For everyone else’s convenience, I’ll paste the code here:

data {
  int<lower=1> n; // number of field observations
  int<lower=1> m; // number of computer simulation
  int<lower=1> n_pred; // number of predictions
  int<lower=1> p; // number of input factors x
  int<lower=1> q; // number of calibration parameters t
  matrix[n, p] xf; // corresponding input factors for field observations
  // xc and tc are design points corresponding to eta
  matrix[m, p] xc; // corresponding input factors for computer simulations
  matrix[m, q] tc; // corresponding calibration parameters for computer simulations
  // x_pred: new design points to make predictions
  matrix[n_pred, p] x_pred; // corresponding input factors for predictions
  vector[n] y; // observations
  vector[m] eta; // simulation output
}

transformed data {
  int<lower = 1> N;
  vector[n+m] y_eta;
  vector[n+m+n_pred] mu; // mean vector
  matrix[n+n_pred, p] X; // X=[xf, x_pred]
  N = n + m + n_pred;
  // set mean vector to zero
  for (i in 1:N) {
    mu[i] = 0;
  }
  X = append_row(xf, x_pred);
  y_eta = append_row(y, eta);
}

parameters {
  // tf: calibration parameters
  // rho_eta: reparameterization of beta_eta
  // rho_delta: reparameterization of beta_delta
  // lambda_eta: precision parameter for eta
  // lambda_delta: precision parameter for bias term
  // lambda_e: precision parameter of observation error
  // y_pred: predictions
  row_vector<lower=0, upper=1>[q] tf; //calibration parameters
  row_vector<lower=0, upper=1>[p + q] rho_eta; // correlation parameter for eta
  row_vector<lower=0, upper=1>[p] rho_delta; // correlation parameter for bias term
  real<lower=0> lambda_eta; // precision parameter for eta
  real<lower=0> lambda_delta; // precision parameter for bias term
  real<lower=0> lambda_e; // precision parameter for observation error
  vector<lower=-3, upper=3>[n_pred] y_pred; // predictions
}

transformed parameters {
  row_vector[p+q] beta_eta;
  row_vector[p] beta_delta;
  // reparameterization so that beta_eta and beta_delta is between 0 and 1
  beta_eta = -4.0 * log(rho_eta);
  beta_delta = -4.0 * log(rho_delta);
}

model {
  // declare variables
  matrix[N, p+q] xt; // xt = [[xf,tf],[xc,tc],[x_pred,tf]]
  matrix[N, N] sigma_eta;
  matrix[n+n_pred, n+n_pred] sigma_delta;
  matrix[n, n] sigma_y;
  matrix[N, N] sigma_z; // covariance matrix
  matrix[N, N] L; // cholesky decomposition of covariance matrix 
  vector[N] z; // z = [y, eta, y_pred]
  row_vector[p] temp_delta;
  row_vector[p+q] temp_eta;

  xt[1:n, 1:p] = xf;
  xt[1:n, (p+1):(p+q)] = rep_matrix(tf, n);
  xt[(n+1):(n+m), 1:p] = xc;
  xt[(n+1):(n+m), (p+1):(p+q)] = tc;
  xt[(n+m+1):N, 1:p] = x_pred;
  xt[(n+m+1):N, (p+1):(p+q)] = rep_matrix(tf, n_pred);
  // diagonal elements of sigma_eta
  sigma_eta = diag_matrix(rep_vector((1 / lambda_eta), N));

  // off-diagonal elements of sigma_eta
  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      temp_eta = xt[i, 1:(p + q)] - xt[j, 1:(p + q)];
      sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';
      sigma_eta[i, j] = exp(-sigma_eta[i, j]) / lambda_eta;
      sigma_eta[j, i] = sigma_eta[i, j];
    }
  }

  // diagonal elements of sigma_delta
  sigma_delta = diag_matrix(rep_vector((1 / lambda_delta), n+n_pred));
  // off-diagonal elements of sigma_delta
  for (i in 1:(n+n_pred-1)) {
    for (j in (i+1):(n+n_pred)) {
      temp_delta = X[i, 1:p] - X[j, 1:p];
      sigma_delta[i, j] = beta_delta .* temp_delta * temp_delta';
      sigma_delta[i, j] = exp(-sigma_delta[i, j]) / lambda_delta;
      sigma_delta[j, i] = sigma_delta[i, j];
    }
  }

  // observation errors sigma_y
  sigma_y = diag_matrix(rep_vector((1 / lambda_e), n));

  // computation of covariance matrix sigma_z 
  sigma_z = sigma_eta;
  sigma_z[1:n, 1:n] = sigma_eta[1:n, 1:n] + sigma_delta[1:n, 1:n] + sigma_y;
  sigma_z[1:n, (n+m+1):N] = sigma_eta[1:n, (n+m+1):N] + 
    sigma_delta[1:n, (n+1):(n+n_pred)];
  sigma_z[(n+m+1):N, 1:n] = sigma_eta[(n+m+1):N, 1:n] + 
    sigma_delta[(n+1):(n+n_pred), 1:n];
  sigma_z[(n+m+1):N, (n+m+1):N] = sigma_eta[(n+m+1):N, (n+m+1):N] + 
    sigma_delta[(n+1):(n+n_pred), (n+1):(n+n_pred)];

  // Specify Priors
  for (i in 1:(p+q)) {
    rho_eta[i] ~ beta(1.0, 0.5);
  }
  for (j in 1:p) {
    rho_delta[j] ~ beta(1.0, 0.4);
  }
  lambda_eta ~ gamma(10, 10); // gamma (shape, rate)
  lambda_delta ~ gamma(10, 0.3); // gamma (shape, rate)
  lambda_e ~ gamma(10, 0.03); // gamma (shape, rate)

  z = append_row(y_eta, y_pred); // z = [y, eta, y_pred]
  L = cholesky_decompose(sigma_z); // cholesky decomposition 
  z ~ multi_normal_cholesky(mu, L);
}


Yikes! I’m not even sure where to start. To begin, you don’t need sigma[1:n, 1:n] if sigma is n x n. That’s just sigma, and dropping the index will cut down on indirection.

Similarly, temp_delta = X[i, 1:p] - X[j, 1:p]; should probably just be temp_delta = X[i] - X[j];. Given that this is a data-only expression, you should define a temp_delta vector in the transformed data block and reuse it. Everything that can be computed in transformed data should be, for efficiency, and everything else that can be pushed to generated quantities should be as well.
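For example, the pairwise squared differences of X depend only on data, so they could be computed once up front. A sketch, assuming the names from the model above (`sq_diff` is a hypothetical helper array, not part of the original code):

```stan
transformed data {
  // squared pairwise differences of X, computed once since X is data
  matrix[n + n_pred, n + n_pred] sq_diff[p];
  for (k in 1:p)
    for (i in 1:(n + n_pred))
      for (j in 1:(n + n_pred))
        sq_diff[k][i, j] = square(X[i, k] - X[j, k]);
}
```

In the model block, the off-diagonal entry of sigma_delta then reduces to exp(-sum over k of beta_delta[k] * sq_diff[k][i, j]) / lambda_delta, with no subtraction happening inside the sampling loop.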

Why do your correlation parameters have lower = 0 rather than lower = -1? Truncating near boundaries can be problematic.

You almost never want to create a diag_matrix as you do for sigma_y. You’re better off just looping to add to the diagonal of sigma_z.

Things like this

 for (i in 1:(p+q)){
    rho_eta[i] ~ beta(1.0, 0.5);

can be vectorized to just rho_eta[1 : p + q] ~ beta(1, 0.5);
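The rho_delta loop vectorizes the same way, and since rho_eta has exactly p + q elements, the index range can be dropped altogether:

```stan
rho_eta ~ beta(1.0, 0.5);
rho_delta ~ beta(1.0, 0.4);
```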

This lambda_e ~ gamma(10, 0.03); hints that your lambda_e values are expected to be very large. It can help immensely to knock predictors, etc., down to unit scale. That way, Stan doesn’t have to do any adaptation.
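One way to do that here is to standardize the outputs once in transformed data. A sketch only; pooling y and eta for a common mean and scale is an assumption, not something from the original model:

```stan
transformed data {
  real mu_y;
  real sd_y;
  vector[n] y_std;
  vector[m] eta_std;
  // pooled location/scale of field data and simulator output (an assumption)
  mu_y = mean(append_row(y, eta));
  sd_y = sd(append_row(y, eta));
  y_std = (y - mu_y) / sd_y;     // field observations on unit scale
  eta_std = (eta - mu_y) / sd_y; // simulation output on unit scale
}
```

The rest of the model would then condition on y_std and eta_std, and the lambda priors could be put on a unit scale as well.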

This comment doesn’t match the code, so I’m not sure what you intended:

  // reparameterization so that beta_eta and beta_delta is between 0 and 1
  beta_eta = -4.0 * log(rho_eta);

since rho_eta is constrained to be between 0 and 1 itself. Usually it’s best to just define parameters on the scale you want to use them on.

Arithmetic is left associative, so

      sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';

does the elementwise multiply before the multiply-self-transpose (for which there's a special, more efficient function built in, called `multiply_self_transpose`).
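The same quadratic form can also be written explicitly as a scalar dot product, which makes the intended grouping unambiguous (equivalent, not necessarily faster; names as in the model above):

```stan
// (beta_eta .* temp_eta) * temp_eta' computed as a dot product
sigma_eta[i, j] = exp(-dot_product(beta_eta .* temp_eta, temp_eta)) / lambda_eta;
```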

This kind of thing with interval constraints:

vector<lower=-3, upper=3>[n_pred] y_pred; // predictions

is almost never a good idea unless there are physical constraints forcing `y_pred` into that range (and I don't mean that you think that's where it'll be because of the data).

Thank you, Bob, for going through my code and providing your inputs.

Cutting down on the indirection and vectorizing the priors did indeed help shave some time off running my Stan model:

temp_delta = X[i] - X[j];


rho_eta[1 : p + q] ~ beta(1, 0.5);

Looping to add to the diagonal of sigma_z also improved the runtime, like this:

for (i in 1:n) {
  sigma_z[i, i] = sigma_z[i, i] + (1 / lambda_e);
}
This makes total sense and you are right that I should not make assumptions based on the data. I have since removed these constraints.

Some clarifications with regards to other points. Please forgive me if they are trivial or obvious.

I can’t seem to find this function multiply_self_transpose in the Stan reference documentation (version 2.17). I did find multiply_lower_tri_self_transpose. Is this what you are referring to? If so, how could it be used to replace

  sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta';

Going by this, does this mean that I should widen the bounds, e.g. to <lower=-1, upper=2>, even though the correlation parameters have a range of [0, 1]? Just out of curiosity, why would it be problematic to set the bounds at the physical bounds? I thought the intention of lower and upper was to set constraints that represent the actual constraints of these parameters. Therefore, if the correlation parameters have a range of [0, 1], wouldn’t it be more intuitive to set <lower=0, upper=1> rather than <lower=-1, upper=2>?

Thank you again for reading through all that code. Really appreciate it.

That’s because I apparently fabricated it out of whole cloth. I thought we had implemented it. So there’s no way to make that line more efficient.

I couldn’t quite follow that. Standardly defined correlations range between -1 and 1, so I’d suggest <lower = -1, upper = 1>.