Hello, I’m working on a Bayesian calibration problem for a computer experiment. I’m using a Gaussian Process as an emulator for a computer model, and I’m training and calibrating the emulator at the same time, as per Higdon et al. (2004). I’m running the calibration on a High Performance Computing (HPC) facility as a shared-memory threaded parallel job, with 4 chains of 1000 iterations. For fewer than 600 training points (N), the model works fine with no issues (see the attached image of computation time (mins) vs N), as judged by Rhat and nEff > 10%. When I increased N by a further 140 points (728 in total), with 16 GB of RAM allocated per core, one chain took 22 hours to complete and none of the other chains finished their 1000 iterations – two of them were stuck at 250 iterations. I don’t understand what is happening, so any help would be greatly appreciated! Specifically:
- As the number of training points (N) increases, I expect computation time to grow as O(N^3) and memory as O(N^2). For N = 728, a single N x N covariance matrix of doubles is only about 728^2 * 8 bytes ≈ 4 MB, so I assumed 16 GB of RAM per core would suffice. Could I be running out of RAM, or is something else wrong? Does memory grow with the number of iterations as well as with N?
- I did not add a small constant (nugget/jitter) to the diagonal of the covariance matrix, as is often recommended, because I based my code on a publication by Chong et al. (2018) that didn’t include it. I’m more than happy to add it (see the sketch right after these questions), but do you think its absence could be the reason I’m facing these issues?
- Do you have any suggestions on how to speed up the calibration, regardless of the problem described above? I have seen a couple of suggestions in other topics, although I don’t know whether they apply to this specific problem; I’ve sketched one of them after the model code below.
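For completeness, this is the kind of nugget I have in mind, added just before the cholesky_decompose call in the model block below (the value 1e-9 is an arbitrary placeholder I picked, not something taken from Chong et al. (2018)):

// add a small jitter/nugget to the diagonal of sigma_z for numerical stability
// (1e-9 is a placeholder value to be tuned)
for (i in 1:N) {
  sigma_z[i, i] = sigma_z[i, i] + 1e-9;
}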
data {
int<lower=0> n; // number of field data
int<lower=0> m; // number of computer simulation
int<lower=0> p; // number of observable inputs x
int<lower=0> q; // number of calibration parameters t
vector[n] y; // field observations
vector[m] eta; // output of computer simulations
matrix[n, p] xf; // observable inputs corresponding to y
// (xc, tc): design points corresponding to eta
matrix[m, p] xc;
matrix[m, q] tc;
}
// Need to combine y (observations) and eta (simulations)
// into a single vector to establish statistical relation
// Chong & Menberg (2018), Hidgon et al. (2004)
transformed data {
int<lower=1> N;
vector[n+m] z; // z = [y, eta]
vector[n+m] mu; // mean vector
N = n + m;
// set mean vector to zero
for (i in 1:N) {
mu[i] = 0;
}
z = append_row(y, eta);
}
parameters {
// tf: calibration parameters
// rho_eta: reparameterization of beta_eta (the correlation parameters of the simulator)
// rho_delta: reparameterization of beta_delta (the correlation parameter of the model discrepancy)
// lambda_eta: precision parameter for eta
// lambda_delta: precision parameter for bias term
// lambda_e: precision parameter of observation error
row_vector<lower=0,upper=1>[q] tf;
row_vector<lower=0,upper=1>[p+q] rho_eta;
row_vector<lower=0,upper=1>[p] rho_delta;
real<lower=0> lambda_eta;
real<lower=0> lambda_delta;
real<lower=0> lambda_e;
}
transformed parameters {
// beta_delta: correlation parameter for bias term
// beta_eta: correlation parameter of the simulator
row_vector[p+q] beta_eta;
row_vector[p] beta_delta;
beta_eta = -4.0 * log(rho_eta);
beta_delta = -4.0 * log(rho_delta);
}
// Create GP model based on the definitions from above
model {
// declare variables
matrix[N, (p+q)] xt;
matrix[N, N] sigma_eta; // simulator covariance
matrix[n, n] sigma_delta; // bias term covariance
matrix[N, N] sigma_z; // covariance matrix
matrix[N, N] L; // cholesky decomposition of covariance matrix
row_vector[p] temp_delta;
row_vector[p+q] temp_eta;
// field observation (xf) and calibration variables (tf)
// are placed together in a matrix with the
// computer observations (xc) and calibration variables (tc)
// xt = [[xf,tf],[xc,tc]]
xt[1:n, 1:p] = xf; // field observations
xt[(n+1):N, 1:p] = xc; // computer observations (assumed to be the same as xf)
xt[1:n, (p+1):(p+q)] = rep_matrix(tf, n);
xt[(n+1):N, (p+1):(p+q)] = tc; // computer calibration variables
// diagonal elements of sigma_eta
sigma_eta = diag_matrix(rep_vector((1 / lambda_eta), N));
// off-diagonal elements of sigma_eta
// for the squared covariance (alpha = 2)
// xt[i] is row i and xt[j] is row j
for (i in 1:(N-1)) {
for (j in (i+1):N) {
temp_eta = xt[i] - xt[j]; // difference between rows i and j
sigma_eta[i, j] = beta_eta .* temp_eta * temp_eta'; // beta_eta-weighted squared distance
sigma_eta[i, j] = exp(-sigma_eta[i, j]) / lambda_eta;
sigma_eta[j, i] = sigma_eta[i, j];
}
}
// diagonal elements of sigma_delta
sigma_delta = diag_matrix(rep_vector((1 / lambda_delta), n));
// off-diagonal elements of sigma_delta
for (i in 1:(n-1)) {
for (j in (i+1):n) {
temp_delta = xf[i] - xf[j];
sigma_delta[i, j] = beta_delta .* temp_delta * temp_delta';
sigma_delta[i, j] = exp(-sigma_delta[i, j]) / lambda_delta;
sigma_delta[j, i] = sigma_delta[i, j];
}
}
// computation of covariance matrix sigma_z
sigma_z = sigma_eta;
sigma_z[1:n, 1:n] = sigma_eta[1:n, 1:n] + sigma_delta;
// add observation errors
for (i in 1:n) {
sigma_z[i, i] = sigma_z[i, i] + (1.0 / lambda_e);
}
//print(sigma_z)
// Specify hyperparameters here - based on Chong et al.(2018)
rho_eta[1:(p+q)] ~ beta(1.0, 0.3);
rho_delta[1:p] ~ beta(1.0, 0.3);
lambda_eta ~ gamma(10, 10); // gamma (shape, rate)
lambda_delta ~ gamma(10, 0.3);
lambda_e ~ gamma(10, 0.03);
// Specify priors here - these have a physical meaning in the computer software being calibrated
tf[1] ~ weibull(2.724,0.417);
tf[2] ~ uniform(1.737e-05,1);
tf[3] ~ normal(0.472,0.140);
tf[4] ~ gamma(6.90965275109803,19.158);
tf[5] ~ lognormal(-1.983,0.640);
L = cholesky_decompose(sigma_z); // cholesky decomposition
z ~ multi_normal_cholesky(mu, L);
}
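Regarding the speed-up question above, this is a sketch of one of the suggestions I’ve seen: move purely data-dependent work out of the model block. The squared input differences used by the discrepancy kernel only involve xf, so I think they could be precomputed in transformed data (d2_xf is a name I made up, and I don’t know how much this would actually save, since the Cholesky factorisation presumably dominates):

// in transformed data (declaration goes with the other declarations at the top of the block):
matrix[n, n] d2_xf[p]; // squared differences of the field inputs, per input dimension
for (k in 1:p)
  for (i in 1:n)
    for (j in 1:n)
      d2_xf[k][i, j] = square(xf[i, k] - xf[j, k]);

// in the model block, replacing the current sigma_delta loop:
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    real expnt = 0;
    for (k in 1:p)
      expnt += beta_delta[k] * d2_xf[k][i, j];
    sigma_delta[i, j] = exp(-expnt) / lambda_delta;
    sigma_delta[j, i] = sigma_delta[i, j];
  }
}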
Edit
One of my simulations (with 750 iterations) just finished, and I noticed that one of the chains did not mix at all, as can be seen in the plot below:
From the attached pairs plot, I’m thinking that there could be something funny going on with beta_eta1 and beta_eta6, although I might be wrong!
Any insights on how best to tackle this? Is it related to the model definition, and specifically to the priors used?