Simulation-Based Calibration for a Gaussian Process Model

Hi Stanimals,

I am having problems with a rather simple Gaussian Process model on synthetic (fake) data, so I decided to use SBC to check whether I have a bug/flaw in my Stan program and whether Stan is calibrated for this model (given my selected NUTS parameters and MCMC specs).

I focus here on the three parameters: \rho_1 and \alpha_1, the length-scale and marginal standard deviation of the squared exponential kernel, and \sigma_y, the standard deviation of the i.i.d. normal measurement noise (see the Stan model below).

The model/implementation is inspired by @betanalpha's case studies; in particular, I put an inverse-gamma prior on \rho_1. I use the prior parameters given by Michael and rescale and shift them to an interval of my choice (in Michael's case study the inverse-gamma parameters were chosen with the interval [2,10] in mind; here I work with [0.5,0.7], where I want the prior to satisfy \mathbb{P}[\rho_1 < 0.5] = \mathbb{P}[\rho_1 > 0.7] = 0.01).
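
As a quick sanity check of this shift-and-scale construction, here is a small base-R sketch; it only uses the fact that the reciprocal of an inverse-gamma variate is gamma distributed, so no extra packages are needed:

# Tail probabilities of the shifted/rescaled inverse-gamma prior for rho_1.
# If X ~ InvGamma(shape, scale) then 1/X ~ Gamma(shape, rate = scale),
# hence P[X < c] = P[1/X > 1/c].
rho1_shape  <- 8.91924
rho1_scale  <- 34.5805
rho1_lower  <- 0.5
rho1_upper  <- 0.7
rho1_factor <- (rho1_upper - rho1_lower) * 0.125  # maps [2, 10] onto [0.5, 0.7]
rho1_shift  <- (5 * rho1_lower - rho1_upper) * 0.25
p_invgamma <- function(q, shape, scale)
  pgamma(1 / q, shape, rate = scale, lower.tail = FALSE)
# map the interval endpoints back to the raw (unscaled) variable and check both tails
p_invgamma((rho1_lower - rho1_shift) / rho1_factor, rho1_shape, rho1_scale)      # should be ~ 0.01
1 - p_invgamma((rho1_upper - rho1_shift) / rho1_factor, rho1_shape, rho1_scale)  # should be ~ 0.01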

I use the SBC code from @Bob_Carpenter's SBC case study.

The results do not look calibrated, and I cannot figure out why. In fact, might this suggest a bias in the data-averaged posterior distribution relative to the prior?

The corresponding p-values, as defined in Bob’s case study, are:

[1] 0.02875681 0.37790959 0.00000000
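
I am not reproducing Bob's sbc() helper here; roughly, each p-value comes from a chi-squared test of uniformity of the binned rank statistics, along the lines of this sketch (result$rank is only my guess at where the ranks are stored):

# chi-squared uniformity test for one column of SBC rank statistics;
# ranks take integer values 0..n_draws (here n_draws = 999)
uniformity_pvalue <- function(ranks, n_draws = 999, n_bins = 20) {
  bins <- cut(ranks, breaks = seq(-0.5, n_draws + 0.5, length.out = n_bins + 1))
  chisq.test(table(bins))$p.value
}
# apply(result$rank, 2, uniformity_pvalue)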

and the stats for the thinning:

# table(result$thin)
  4   8  16 
768 221  11 

Since thinning is used, I would not attribute the above deviation from uniformity to MCMC-induced (temporal) correlations in the posterior sample sequence, right?

Below is the corresponding Stan and R code:

Stan model (below referred to as gp_sbc_bob.stan):

functions {
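  // draw from a normal(mu, sigma) truncated to [0, inf) via the inverse-CDF trick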
  real trunc_normal_rng(real mu, real sigma) {
    real u = uniform_rng(normal_cdf(0, mu, sigma), 1);
    real y = mu + sigma * inv_Phi(u);
    return y;
  }
}
data {
  int<lower=1> N;
  real t[N]; 
  real<lower=0> sigma_y_scale;
  real<lower=0> sigma_y_loc;
  real<lower=0> rho1_lower;
  real<lower=0> rho1_upper;
  real<lower=0> alpha1_loc;
  real<lower=0> alpha1_scale;
}
transformed data {
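  // SBC: draw "true" parameter values from the priors and simulate one data set y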
  int N_SBC_pars = 3;
  real<lower=0> rho1_shape;
  real<lower=0> rho1_scale;
  real rho1_shift;
  real rho1_factor;
  matrix[N,N] K1_;
  matrix[N,N] L1_;
  vector[N] y;
  real<lower=0> alpha1_;
  real<lower=0> rho1_raw_;
  real<lower=0> rho1_;
  real<lower=0> sigma_y_;
  rho1_shape = 8.91924;
  rho1_scale = 34.5805;
  rho1_factor = (rho1_upper - rho1_lower)*0.125;
  rho1_shift = (5*rho1_lower - rho1_upper )*0.25;
  sigma_y_ = trunc_normal_rng(sigma_y_loc, sigma_y_scale);
  rho1_raw_ = inv_gamma_rng(rho1_shape, rho1_scale);
  alpha1_ = trunc_normal_rng(alpha1_loc, alpha1_scale);
  rho1_ = rho1_shift + rho1_factor*rho1_raw_;
  K1_ =  cov_exp_quad(t, alpha1_, rho1_)+diag_matrix(rep_vector(1e-9,N))+diag_matrix(rep_vector(sigma_y_, N));
  L1_ = cholesky_decompose(K1_);
  y = multi_normal_cholesky_rng(rep_vector(0, N), L1_);
}  
parameters {
  real<lower=0> alpha1;
  real<lower=0> rho1_raw;
  real<lower=0> sigma_y;
}
transformed parameters {
  real<lower=0> rho1;
  rho1 = rho1_shift + rho1_factor*rho1_raw;
}
model {
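  // GP kernel plus i.i.d. measurement noise (the noise enters the diagonal as a variance here)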
  matrix[N, N] cov =   cov_exp_quad(t, alpha1, rho1)
                     + diag_matrix(rep_vector(square(sigma_y), N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
  sigma_y ~ normal(sigma_y_loc, sigma_y_scale);
  rho1_raw ~ inv_gamma(rho1_shape, rho1_scale);
  alpha1 ~ normal(alpha1_loc, alpha1_scale);
  y ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
}
generated quantities {
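  // SBC indicators: is each posterior draw below its simulated "true" value?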
  int<lower=0,upper=1> I_lt_sim[N_SBC_pars];
  I_lt_sim[1] = alpha1 < alpha1_;
  I_lt_sim[2] = rho1 < rho1_;
  I_lt_sim[3] = sigma_y < sigma_y_;
}

and here is the R code to run everything (the sbc function is exactly as defined in Bob's case study):

library(tidyverse)
library(rstan)
options(mc.cores = 3)
rstan_options(auto_write = TRUE)
model <- stan_model("./gp_sbc_bob.stan")
N <- 11
times <- seq(-1,1, length.out = N)
stan_data <- list(
  N = N,
  t = times,
  sigma_y_scale = 0.2,
  sigma_y_loc = 0,
  rho1_lower = 0.5,
  rho1_upper = 0.7,
  alpha1_loc = 0,
  alpha1_scale = 1
)
result <- sbc(model, data = stan_data, sbc_sims = 1000, stan_sims = 999, max_thin = 64)

By the way, I refrained from using the new rstanarm function sbc(), as it does not provide automatic thinning (while guaranteeing a fixed number of output draws). On the other hand, the rstanarm implementation is parallelised, which I think is a huge advantage given the embarrassingly parallel nature of SBC. I guess sooner or later the two will be combined…
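
Just to illustrate the embarrassingly parallel structure, here is a rough sketch (this is neither Bob's sbc() nor the rstanarm version; one_sbc_rep is a hypothetical helper, and I drop the automatic thinning for brevity):

library(parallel)
# one SBC replication: the model simulates its own data in transformed data,
# so a single run yields one rank statistic per parameter
one_sbc_rep <- function(i, model, stan_data, stan_sims = 999) {
  fit <- rstan::sampling(model, data = stan_data, chains = 1,
                         iter = 2 * stan_sims, refresh = 0, seed = i)
  colSums(rstan::extract(fit)$I_lt_sim)  # rank = number of draws below the simulated truth
}
ranks <- mclapply(seq_len(1000), one_sbc_rep, model = model,
                  stan_data = stan_data, mc.cores = 3)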


Those figures definitely suggest that something is going wrong with \sigma – it could be a posterior reconstruction problem, or you might not be using the same observational model in the simulation and in the fit. To double-check the latter you can generate half-normal RNGs more simply, just by applying the absolute value:

real half_normal_sample = fabs(normal_rng(0, 1));

If sigma_y_loc isn’t zero then I don’t think your implementation will work anyway, which may also be a source of problems.

To double check you can just histogram your sampled parameters to verify that they are consistent with your prior distributions.
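
For example, in the half-normal case (sigma_y_loc = 0) the comparison might look like this quick base-R sketch, using fresh R draws rather than your Stan output, just to illustrate the idea:

# absolute-value draws against the half-normal density with scale 0.2
draws <- abs(rnorm(1e5, mean = 0, sd = 0.2))
hist(draws, breaks = 100, freq = FALSE, main = "sigma_y prior check", xlab = "sigma_y")
curve(2 * dnorm(x, mean = 0, sd = 0.2), from = 0, add = TRUE, col = "red")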

Thanks Michael. If I am not mistaken, Stan implements the truncated normal in a statement sigma_y ~ normal(a,b) for real<lower=0> sigma_y (the former is therefore equivalent to sigma_y ~ normal(a,b) T[0,]), and not the folded normal. Thus I should use the above trunc_normal_rng instead of a fabs(normal_rng(.,.)) statement. Here is a small experiment:

functions {
  real trunc_normal_rng(real mu, real sigma) {
    real u = uniform_rng(normal_cdf(0, mu, sigma), 1);
    real y = mu + sigma * inv_Phi(u);
    return y;
  }
}
data {
  real<lower=0> sigma_y_scale;
  real<lower=0> sigma_y_loc;
}
transformed data {
}  
parameters {
  real<lower=0> sigma_y;
}
transformed parameters {
}
model {
  sigma_y ~ normal(sigma_y_loc, sigma_y_scale);
}
generated quantities {
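  // compare the inverse-CDF truncated-normal RNG with the folded (absolute-value) construction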
  real <lower=0> sigma_y_1 = trunc_normal_rng(sigma_y_loc, sigma_y_scale);
  real <lower=0> sigma_y_2 = fabs(normal_rng(sigma_y_loc, sigma_y_scale));
}

and

library(rstan)
library(bayesplot)
fit <- stan("half_normal.stan", 
                 data=list(sigma_y_scale=1, sigma_y_loc=1),
                 iter=50000,
                 refresh=0, 
                 chains=20, 
                 cores=3, 
                 seed=42, 
                 control=list(adapt_delta=.95)
                )
print(fit, digits=4)

which outputs (with rstan 2.19.2)

Inference for Stan model: half_normal.
20 chains, each with iter=50000; warmup=25000; thin=1; 
post-warmup draws per chain=25000, total post-warmup draws=5e+05.

             mean se_mean     sd    2.5%     25%     50%    75%  97.5%  n_eff
sigma_y    1.2861  0.0027 0.7968  0.0794  0.6597  1.1989 1.8071 3.0386  89355
sigma_y_1  1.2876  0.0011 0.7942  0.0844  0.6646  1.1998 1.8045 3.0374 498752
sigma_y_2  1.1671  0.0011 0.7999  0.0521  0.5169  1.0513 1.6862 2.9647 493700
lp__      -0.4006  0.0039 0.9306 -3.0388 -0.6528 -0.0366 0.2171 0.2895  57495
            Rhat
sigma_y   1.0002
sigma_y_1 1.0000
sigma_y_2 1.0000
lp__      1.0002

Samples were drawn using NUTS(diag_e) at Sun Jul 14 07:59:57 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
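
The gap for sigma_y_2 is exactly the truncated-vs-folded distinction: for x \ge 0 the truncated normal has density \phi\left(\frac{x-\mu}{\sigma}\right) / \left(\sigma \left(1 - \Phi\left(-\frac{\mu}{\sigma}\right)\right)\right), while the folded normal has density \left(\phi\left(\frac{x-\mu}{\sigma}\right) + \phi\left(\frac{x+\mu}{\sigma}\right)\right) / \sigma. The two agree only when \mu = 0; here \mu = 1, which is why sigma_y and sigma_y_1 match each other but not sigma_y_2.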

Will do so!

I found a potential discrepancy! In the transformed data block the measurement noise is added to the covariance diagonal as sigma_y_ (a standard deviation), whereas in the model block it is added as square(sigma_y) (a variance).

Will re-run and report here whether this solves the above.


That was exactly the reason! Below are the p-values and rank histograms.

In my eyes this is a perfect example of why SBC is so useful (the rank histograms are compatible with the discrepancy in the code)!

[1] 0.6034153 0.1074794 0.3685510


Here we go.


Nice.

Right, which is why I specifically referred to the half-normal with sigma_y_loc = 0. I didn’t think the scaling and translation would work after a sample had been generated, but your nice experiment proved me wrong!

Knowing that the dynamic HMC sampler in Stan handles these particular GPs fine, I strongly suspected that there was some difference between the simulation and the fitting model. And there you go!
