Yes, he used "something" to find a positive solution: the EP algorithm. See the paper you referenced, p. 647, right column:
"Since the likelihood for the derivative observations in (8) is not Gaussian, the posterior is analytically intractable. We apply the EP algorithm, and compute the Gaussian approximation for the posterior distribution."
Whereas here you already have a solution and put something on top of it afterwards. For me those are two different things.
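If I remember the referenced paper correctly (this is from memory, so please check it against their eq. (8)), the derivative observations enter through a probit likelihood, which is exactly what makes the posterior non-Gaussian and motivates the EP approximation:

$$ p(m_i \mid f'_i) = \Phi\!\left(\tfrac{1}{\nu}\, f'_i\right) $$

where m_i is the monotonicity (sign) observation and ν is a small scale that controls how strict the constraint is.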
Update: I simplified your code and used the tMVN sampling approach from @bgoodri.
Sometimes it produces: Exception: Exception: Phi: x is -nan, but must not be nan!
The offending line is: real v; real u_star = Phi(z_star);
(presumably z_star turns into NaN once an earlier inv_Phi call has returned +/-Inf and the infinities cancel in the conditional mean).
functions{
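/* make_stuff(): sequential construction of a draw from a truncated
   multivariate normal (following @bgoodri's tMVN example, as far as I
   understand it). For each dimension k it computes the conditional mean
   through the Cholesky factor L, maps the uniform u[k] into the admissible
   tail of the conditional standard normal via Phi/inv_Phi, and stores in
   d[k] the probability mass of that tail (1 if unconstrained), which the
   model block later adds to the target. Returns {z, d}. */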
vector[] make_stuff(vector mu, matrix L, vector b, vector s, vector u) {
int K = rows(mu); vector[K] d; vector[K] z; vector[K] out[2];
for (k in 1:K) {
int km1 = k - 1;
if (s[k] != 0) {
real z_star = (b[k] -
(mu[k] + ((k > 1) ? L[k,1:km1] * head(z, km1) : 0))) /
L[k,k];
real v; real u_star = Phi(z_star);
if (s[k] == -1) {
v = u_star * u[k];
d[k] = u_star;
}
else {
d[k] = 1 - u_star;
v = u_star + d[k] * u[k];
}
z[k] = inv_Phi(v);
}
else {
z[k] = inv_Phi(u[k]);
d[k] = 1;
}
}
out[1] = z;
out[2] = d;
return out;
}
/* Functions to create the three remaining covariance (sub)matrices (the fourth is just cov_exp_quad) */
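/* For the squared-exponential kernel k(t,s) = alpha^2 * exp(-delta^2/(2*rho^2)) with delta = t - s:
   K20 = Cov(f(t), f''(s))   = alpha^2 * exp(-delta^2/(2*rho^2)) * (delta^2/rho^4 - 1/rho^2)
   K22 = Cov(f''(s), f''(s')) = alpha^2 * exp(-delta^2/(2*rho^2)) * (delta^4/rho^8 - 6*delta^2/rho^6 + 3/rho^4)
   i.e. k differentiated twice with respect to each constrained argument. */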
real K22(real s, real sp, real alpha, real rho){
real sq_delta = square(s-sp);
real sq_rho = square(rho);
real cov = square(alpha) * exp(-sq_delta/(2*sq_rho)) * 1/square(sq_rho) * ( square(sq_delta)/square(sq_rho) - (6*sq_delta/sq_rho) + 3 );
return(cov);
}
matrix K22_cov_mat(real[] s, real alpha, real rho){
int K = size(s);
matrix[K, K] X;
real diag_entry = 3*square(alpha/square(rho));
for (i in 1:K) {
X[i, i] = diag_entry;
// only the strict upper triangle needs computing; mirror it below the diagonal
for (j in (i+1):K) {
X[i, j] = K22(s[i], s[j], alpha, rho);
X[j, i] = X[i, j];
}
}
return(X);
}
real K20(real t, real s, real alpha, real rho){
real cov;
real sq_delta = square(t-s);
real sq_rho = square(rho);
cov = square(alpha) * exp(-sq_delta/(2*sq_rho)) * ( sq_delta / square(sq_rho) - 1/sq_rho);
return(cov);
}
matrix K20_cov_mat(real[] x, real[] s, real alpha, real rho){
int N = size(x);
int K = size(s);
matrix[N, K] Y;
for (i in 1:N) {
for (j in 1:K) {
Y[i, j] = K20(x[i], s[j], alpha, rho);
}
}
return(Y);
}
matrix ffprime_cov_mat(real [] x, real[] s, real alpha, real rho){
int N = size(x);
int K = size(s);
int J = N + K;
matrix[N, N] A;
matrix[N, K] B;
matrix[K, K] C;
matrix[J, J] M;
// | A, B|
// | B',C|
A = cov_exp_quad(x, alpha, rho);
B = K20_cov_mat(x, s, alpha, rho);
C = K22_cov_mat(s, alpha, rho);
/*Filling in the matrix*/
M[1:N, 1:N] = A;
M[1:N, (N+1):J] = B;
M[(N + 1):J, (N+1):J] = C;
M[(N + 1):J, 1:N] = B';
return(M);
}
matrix full_cov_mat(real [] x, real[] s, real[] x_pred, real alpha, real rho){
int N = size(x);
int K = size(s);
int N_pred = size(x_pred);
int W = N + K + N_pred;
matrix[W, W] M;
matrix[N_pred, K] Q = K20_cov_mat(x_pred, s, alpha, rho);
/* building sub-matrices */
// | (f, f) (f,f'') (f, f_p) |
// | (f'',f) (f'', f'') (f'', f_p) |
// | (f_p, f) (f_p, f'') (f_p, f_p) |
M[1:(N + K), 1:(N + K)] = ffprime_cov_mat(x, s, alpha, rho);
M[(N + K + 1):W, 1:N] = cov_exp_quad(x_pred, x, alpha, rho);
M[(N + K + 1):W, (N + 1):(N + K)] = Q;
M[1:N, (N + K + 1):W] = cov_exp_quad(x, x_pred, alpha, rho);
M[(N + 1):(N + K), (N + K + 1):W] = Q';
M[(N + 1 + K):W, (N + 1 + K):W] = cov_exp_quad(x_pred, x_pred, alpha, rho);
return(M);
}
}
data {
int<lower=1> K;
real x[K];
}
transformed data {
vector[K*2] b = rep_vector(0, 2*K); // lower or upper bound
vector[K*2] mu = rep_vector(0, 2*K);
real alpha = 1.0;
real rho = 1.0;
// s[k] == 0 implies no constraint; otherwise
// s[k] == -1 -> b[k] is an upper bound
// s[k] == +1 -> b[k] is a lower bound
vector<lower=-1,upper=1>[K*2] s = append_row(rep_vector(0, K), rep_vector(1, K));
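// Sigma: joint prior covariance of (f(x), f''(x)) at the data points,
// with a small diagonal jitter so the Cholesky factorization stays stable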
matrix[K*2,K*2] Sigma =
append_row(
append_col(cov_exp_quad(x, alpha, rho), K20_cov_mat(x, x, alpha, rho))
, append_col( K20_cov_mat(x, x, alpha, rho), K22_cov_mat(x, alpha, rho))
) + diag_matrix(rep_vector(1e-5,2*K));
matrix[K*2,K*2] L = cholesky_decompose(Sigma);
}
parameters {
vector<lower=0,upper=1>[K*2] u;
}
model {
target += log(make_stuff(mu, L, b, s, u)[2]); // Jacobian adjustments
// implicit: u ~ uniform(0,1)
}
generated quantities {
matrix[K*2,K*2] Sigma_gen = Sigma;
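// y stacks the constrained joint draw: entries 1..K are f(x), entries (K+1)..2K
// are f''(x), kept >= 0 (convexity) by the construction in make_stuff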
vector[K*2] y = mu + L * make_stuff(mu, L, b, s, u)[1];
}
GPconv.stan (4.2 KB) GPconv.R (692 Bytes)
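For reference, running it only takes something along these lines (just a sketch assuming rstan; the attached GPconv.R is the real thing):

library(rstan)

K <- 11
x <- seq(-1, 1, length.out = K)   # hypothetical inputs

fit <- stan(file = "GPconv.stan", data = list(K = K, x = x), chains = 4, iter = 2000)

# columns 1:K of y are f(x); columns (K+1):(2*K) are the non-negative f''(x)
y_draws <- extract(fit, pars = "y")$y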
Hint: If you flip Z'' and Z in the covariance matrix and adjust the constraints accordingly, you'll get better performance.
GPconv_fit.R (1.4 KB) GPconv_fit.stan (4.3 KB)
Update: Constraining a GP to satisfy f(0) = 0
https://stackoverflow.com/questions/37539498/conditioning-gaussian-process-regression
Quoting Julien Bect:
- the sample paths of your GP will satisfy the property f(0) = 0 almost surely if, and only if, the mean function and the variance function vanish at x = 0.
- Note that this is only possible for a non-stationary GP (or for the trivial GP with null mean and variance functions, of course).
- A typical example of this is the standard Brownian motion over [0, +∞), which has a zero mean function and covariance function k(x,y) = min(x,y).
- Starting from a generic covariance function k0, you can obtain this property using the conditioning formula: k(x,y) = k0(x,y) - k0(x,0) k0(0,y) / k0(0,0).
Let's try:
x <- seq(-1,1,length.out=11)
k0<-function(xi, xj) exp(-(xi - xj)^2 / 2)
cov_exp_quad = function(x) {
outer(x, x, function(xi, xj) { k0(xi, xj) - k0(xi, 0) * k0(0, xj) / k0(0, 0) })
}
cov_exp_quad(x)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.6321206 0.5397670 0.41649935 0.27537184 0.13162849 0 -0.10776829 -0.18458727 -0.22857969 -0.2425330 -0.2325442
[2,] 0.5397670 0.4727076 0.37366801 0.25279630 0.12349989 0 -0.10523966 -0.18356779 -0.23121956 -0.2492551 -0.2425330
[3,] 0.4164994 0.3736680 0.30232367 0.20914709 0.10438559 0 -0.09258172 -0.16452093 -0.21092407 -0.2312196 -0.2285797
[4,] 0.2753718 0.2527963 0.20914709 0.14785621 0.07536126 0 -0.06956721 -0.12599475 -0.16452093 -0.1835678 -0.1845873
[5,] 0.1316285 0.1234999 0.10438559 0.07536126 0.03921056 0 -0.03767309 -0.06956721 -0.09258172 -0.1052397 -0.1077683
[6,] 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0 0.00000000 0.00000000 0.00000000 0.0000000 0.0000000
[7,] -0.1077683 -0.1052397 -0.09258172 -0.06956721 -0.03767309 0 0.03921056 0.07536126 0.10438559 0.1234999 0.1316285
[8,] -0.1845873 -0.1835678 -0.16452093 -0.12599475 -0.06956721 0 0.07536126 0.14785621 0.20914709 0.2527963 0.2753718
[9,] -0.2285797 -0.2312196 -0.21092407 -0.16452093 -0.09258172 0 0.10438559 0.20914709 0.30232367 0.3736680 0.4164994
[10,] -0.2425330 -0.2492551 -0.23121956 -0.18356779 -0.10523966 0 0.12349989 0.25279630 0.37366801 0.4727076 0.5397670
[11,] -0.2325442 -0.2425330 -0.22857969 -0.18458727 -0.10776829 0 0.13162849 0.27537184 0.41649935 0.5397670 0.6321206
Looks good: the row and column corresponding to x = 0 are exactly zero, as they should be. What's left to do?
- monotonicity: the covariances can be found in the papers; same approach as convexity.
- f(0) = 0 combined with monotonicity / convexity: adjust the derivative covariances accordingly.
- adopt @hhau's splines approach; the central question is how to sample under a constraint Ax > 0, where A is the constraint matrix (a rough sketch of what that could look like follows below).
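To make that last question concrete, here is a toy R illustration (hypothetical, not @hhau's actual code): for K basis coefficients beta, non-decreasing coefficients of a B-spline give a non-decreasing fit, so monotonicity corresponds to A beta > 0 with A the first-difference matrix. Naive rejection sampling shows the constraint in action, though it obviously won't scale to realistic K.

# Toy illustration: A as a first-difference matrix,
# so that A %*% beta > 0 means strictly increasing coefficients.
K <- 6
A <- diff(diag(K))   # (K-1) x K: (A %*% beta)[i] = beta[i+1] - beta[i]

satisfies <- function(beta) all(A %*% beta > 0)

# Naive rejection from an unconstrained N(0, I) stand-in for the posterior,
# purely to demonstrate the constraint
set.seed(1)
repeat {
  beta <- rnorm(K)
  if (satisfies(beta)) break
}
beta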