Gaussian Process: Rejecting initial value

Hello,

I’m having trouble with the GP model below. The model works, but when I make what seems like an inconsequential change, wrapping the kernel argument in sqrt(square(…)), I get a “Rejecting initial value” message. If anyone could shed some light on why this is happening, it would be much appreciated. I’ve uploaded a small data set and some Python code to illustrate the problem.

Thanks,
Ben

P.S. I’m interested in being able to take the sqrt of dot_self((x[i] - x[j]) ./ rho) because I would like to implement a Matérn 5/2 kernel.

gp_error.py (819 Bytes)
gp_data.csv (18.5 KB)
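For reference, here is a quick Python sketch of the Matérn 5/2 kernel I’m after (my own untested illustration; eta_sq and the per-dimension length-scales rho mirror the Stan model below):

```python
import math

def matern52(xi, xj, rho, eta_sq):
    """Matern 5/2 kernel with per-dimension length-scales rho."""
    # scaled Euclidean distance r = sqrt(dot_self((xi - xj) ./ rho))
    r = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(xi, xj, rho)))
    s = math.sqrt(5.0) * r
    # k(r) = eta_sq * (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r)
    return eta_sq * (1.0 + s + s * s / 3.0) * math.exp(-s)
```

Note the sqrt of the squared distance is not differentiable at r == 0; in the model below only the diagonal hits r == 0, and it is set directly, so the off-diagonal terms for distinct inputs should be safe.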

// Fit a Gaussian process's hyperparameters
data {
    int<lower=1> N;                 // number of data items
    int<lower=1> D;                 // number of predictors
    vector[D] x[N];                 // predictor matrix
    vector[N] y;                    // outcome variable
    // real alpha;                  // alpha prior for variance
    // real beta;                   // beta prior
}
transformed data {
    vector[N] mu;
    for (i in 1:N) mu[i] = 0;       // zero mean function
}
parameters {
    real<lower=0> eta;
    // real<lower=0> sigma_sq;
    real<lower=0> sigma;
    vector<lower=0>[D] rho;
}
transformed parameters {
    real<lower=0> eta_sq;
    real<lower=0> sigma_sq;
    eta_sq = square(eta);
    sigma_sq = square(sigma);
}
model {
    matrix[N, N] Sigma;

    // off-diagonal elements
    for (i in 1:(N - 1)) {
        for (j in (i + 1):N) {
            Sigma[i, j] = eta_sq * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
            Sigma[j, i] = Sigma[i, j];
        }
    }

    // diagonal elements
    for (k in 1:N)
        Sigma[k,k] = eta_sq + sigma_sq;           // + jitter

    eta ~ normal(0, 1);
    rho ~ gamma(4, 4);
    // sigma_sq ~ inv_gamma(alpha, beta);
    sigma ~ normal(0, 1);

    y ~ multi_normal(mu, Sigma);
}
generated quantities{
    matrix[N, N] Sigma;
    matrix[N, N] inv_cov_matrix;

    for (i in 1:(N - 1)) {
        for (j in (i + 1):N) {
            Sigma[i, j] = eta_sq * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
            Sigma[j, i] = Sigma[i, j];
        }
    }

    // diagonal elements
    for (k in 1:N)
        Sigma[k,k] = eta_sq + sigma_sq;           // + jitter

    inv_cov_matrix = inverse_spd(Sigma);
}

Here is the change that causes the “Rejecting initial value” message.

model {
    matrix[N, N] Sigma;

    // off-diagonal elements
    for (i in 1:(N - 1)) {
        for (j in (i + 1):N) {
            Sigma[i, j] = eta_sq * exp(-0.5 * sqrt(square(dot_self((x[i] - x[j]) ./ rho))));
            Sigma[j, i] = Sigma[i, j];
        }
    }

    // diagonal elements
    for (k in 1:N)
        Sigma[k,k] = eta_sq + sigma_sq;           // + jitter

    eta ~ normal(0, 1);
    rho ~ gamma(4, 4);
    // sigma_sq ~ inv_gamma(alpha, beta);
    sigma ~ normal(0, 1);

    y ~ multi_normal(mu, Sigma);
}

sqrt(square(dot_self(y))) == dot_self(y), where y = (x[i] - x[j]) ./ rho, so that change has no effect on the value. For the Matérn kernel you mention, I think it should be

Sigma[i, j] = eta_sq * exp(-0.5 * sqrt(dot_self( (x[i] - x[j]) ./ rho )));

Hi Ben,

Thanks for the fast response. Let me clarify my question: why does the Gaussian process work when the radial basis kernel function is

    Sigma[i, j] = eta_sq * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));

but fail when it is

    Sigma[i, j] = eta_sq * exp(-0.5 * sqrt(square(dot_self((x[i] - x[j]) ./ rho))));

given that the two statements are mathematically equivalent?
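For example, this quick Python sanity check (my own sketch, separate from the attached gp_error.py) confirms the two forms agree in value:

```python
import math

def k_plain(d2, eta_sq):
    # eta_sq * exp(-0.5 * dot_self(...))
    return eta_sq * math.exp(-0.5 * d2)

def k_wrapped(d2, eta_sq):
    # eta_sq * exp(-0.5 * sqrt(square(dot_self(...))))
    return eta_sq * math.exp(-0.5 * math.sqrt(d2 ** 2))

# dot_self(...) is always >= 0, so sqrt(square(z)) == abs(z) == z here
for d2 in [0.0, 0.3, 1.7, 42.0]:
    assert math.isclose(k_plain(d2, 1.2), k_wrapped(d2, 1.2), rel_tol=1e-12)
print("values agree")
```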

Thanks,
Ben

I think the sqrt(square(...)) messes up the autodiff: it is equivalent to abs, whose derivative is undefined at zero.
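To make this concrete, here is a minimal forward-mode dual-number sketch (my own toy Python, not Stan’s autodiff): the value of sqrt(square(x)) matches abs(x), but the chain rule divides by sqrt(square(x)), which is zero at x == 0, so the derivative comes out NaN there.

```python
import math

class Dual:
    """A (value, derivative) pair for forward-mode autodiff."""
    def __init__(self, val, der):
        self.val = val
        self.der = der

def square(d):
    # chain rule: (u^2)' = 2 * u * u'
    return Dual(d.val * d.val, 2.0 * d.val * d.der)

def sqrt(d):
    # chain rule: (sqrt(u))' = u' / (2 * sqrt(u)); blows up at u == 0
    root = math.sqrt(d.val)
    der = d.der / (2.0 * root) if root != 0.0 else float("nan")
    return Dual(root, der)

x0 = Dual(0.0, 1.0)          # evaluate at x = 0 with seed dx/dx = 1
y0 = sqrt(square(x0))
print(y0.val, y0.der)        # prints: 0.0 nan

x2 = Dual(2.0, 1.0)          # away from zero everything is fine
y2 = sqrt(square(x2))
print(y2.val, y2.der)        # prints: 2.0 1.0
```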

Good catch that sqrt(square(x)) == abs(x).

I’m a bit embarrassed that I did not thoroughly read the user manual: on page 71 there is a discussion of “Errors Due to Chain Rule” in differentiation.

Thank you,
Ben