Hi Stanimals,
I have problems with a rather simple Gaussian Process model on synthetic (fake) data, so I decided to use SBC to check whether I have a bug/flaw in my Stan program and if Stan is calibrated for this model (given my selected NUTS parameters and MCMC specs).
I focus here on the 3 parameters \rho_1,\alpha_1 being the length-scale and marginal standard deviation of the squared exponential kernel and \sigma_y being the standard deviation of i.i.d normal measurement noise (see Stan model below).
The model/implementation is inspired by @betanalpha case studies, in particular I put an inverse gamma prior on \rho_1. I use the prior parameters given by Michael and rescale and shift them to interval of choice (in Michael’s case study the inverse-gamma parameters were chosen with the interval [2,10] in mind, here I work with [0.5,0.7], where I want to ensure that my prior has the property that \mathbb{P}[\rho_1 < 0.5] = \mathbb{P}[\rho_1 > 0.7] = 0.01).
I use the SBC code as part of @Bob_Carpenter’s SBC case study.
The results do not look calibrated, and I cannot figure out what the reason is. In fact this might suggest a bias in the data-averaged posterior distribution with respect to the prior?
The corresponding p-values, as defined in Bob’s case study, are:
[1] 0.02875681 0.37790959 0.00000000
and the stats for the thinning:
# table(result$thin)
4 8 16
768 221 11
Since thinning is used, I would not explain the above deviation from uniformity to MCMC induced (temporal) correlations in the posterior sample sequence, right?
Below the correspond Stan and R code:
Stan model (below referred to as gp_sbc_bob.stan
):
functions {
real trunc_normal_rng(real mu, real sigma) {
real u = uniform_rng(normal_cdf(0, mu, sigma), 1);
real y = mu + sigma * inv_Phi(u);
return y;
}
}
data {
int<lower=1> N;
real t[N];
real<lower=0> sigma_y_scale;
real<lower=0> sigma_y_loc;
real<lower=0> rho1_lower;
real<lower=0> rho1_upper;
real<lower=0> alpha1_loc;
real<lower=0> alpha1_scale;
}
transformed data {
int N_SBC_pars = 3;
real<lower=0> rho1_shape;
real<lower=0> rho1_scale;
real rho1_shift;
real rho1_factor;
matrix[N,N] K1_;
matrix[N,N] L1_;
vector[N] y;
real<lower=0> alpha1_;
real<lower=0> rho1_raw_;
real<lower=0> rho1_;
real<lower=0> sigma_y_;
rho1_shape = 8.91924;
rho1_scale = 34.5805;
rho1_factor = (rho1_upper - rho1_lower)*0.125;
rho1_shift = (5*rho1_lower - rho1_upper )*0.25;
sigma_y_ = trunc_normal_rng(sigma_y_loc, sigma_y_scale);
rho1_raw_ =inv_gamma_rng(rho1_shape, rho1_scale);
alpha1_ = trunc_normal_rng(alpha1_loc, alpha1_scale);
rho1_ = rho1_shift + rho1_factor*rho1_raw_;
K1_ = cov_exp_quad(t, alpha1_, rho1_)+diag_matrix(rep_vector(1e-9,N))+diag_matrix(rep_vector(sigma_y_, N));
L1_ = cholesky_decompose(K1_);
y = multi_normal_cholesky_rng(rep_vector(0, N), L1_);
}
parameters {
real<lower=0> alpha1;
real<lower=0> rho1_raw;
real<lower=0> sigma_y;
}
transformed parameters {
real<lower=0> rho1;
rho1 = rho1_shift + rho1_factor*rho1_raw;
}
model {
matrix[N, N] cov = cov_exp_quad(t, alpha1, rho1)
+ diag_matrix(rep_vector(square(sigma_y), N));
matrix[N, N] L_cov = cholesky_decompose(cov);
sigma_y ~ normal(sigma_y_loc, sigma_y_scale);
rho1_raw ~ inv_gamma(rho1_shape, rho1_scale);
alpha1 ~ normal(alpha1_loc, alpha1_scale);
y ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
}
generated quantities {
int<lower=0,upper=1> I_lt_sim[N_SBC_pars];
I_lt_sim[1] = alpha1 < alpha1_;
I_lt_sim[2] = rho1 < rho1_;
I_lt_sim[3] = sigma_y < sigma_y_;
}
and here the R code to run everything (the sbc
is exactly as defined by Bob the case study)
library(tidyverse)
library(rstan)
options(mc.cores = 3)
rstan_options(auto_write = TRUE)
model <- stan_model("./gp_sbc_bob.stan")
N <- 11
times <- seq(-1,1, length.out = N)
stan_data <- list(
N = N,
t= times,
sigma_y_scale=0.2,
sigma_y_loc=0,
rho1_lower=0.5,
rho1_upper=0.7,
alpha1_loc=0,
alpha1_scale=1
)
result <- sbc(model, data = stan_data, sbc_sims = 1000, stan_sims = 999, max_thin = 64)
By the way I refrained from using the new rstanarm
function sbc()
, as it does not provide the automatic thinning (while guaranteeing a fixed number of sample outputs). On the other hand the rstanarm
implementation is parallelised, which I think is a huge advantage given the embarrassingly parallel nature of SBC. I guess sooner or later we combine this…