# How to convert gaussian GP model to fit binomial outcome variable

I have a gaussian process regression fit to a gaussian output variable that works fine. However I’m trying to adapt it for a binomial output and I’m suddenly getting initial values `Random variable[1] is nan, but must be not nan! ` errors. I don’t understand why so hoping someone can help.

First some toy data:

``````data1 <- list(n_obs = 20L, n_exp = 3L, n_cov = 1,
y = c(1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
Z = structure(c(0.235, -0.331, -0.312, -2.302, -0.171, 0.14, -1.497, -1.01, -0.948,
-0.494, -0.174, -0.407, 1.846, 0.394, 0.798, -1.567, -0.086,
-0.359, -1.194, 0.364, 0.362, 0.347, 0.19, -0.16, 0.327, 0.598,
-1.842, 2.718, 0.191, -1.301, -3.113, -0.941, 1.4, -1.62, -2.266,
1.163, -0.116, 0.334, -0.621, -1.31, -1.176, -1.121, -1.362,
0.481, 0.742, 0.028, 0.331, 0.644, 2.486, 1.96, 0.192, 1.553,
0.914, 0.359, 0.175, -0.847, 0.978, 1.806, 0.123, -0.13), dim = c(20L, 3L),
dimnames = list(NULL, c("z1", "z2", "z3"))),
X = structure(c(2.485, 5.73, 3.675, -0.182, 5.817, 2.208, 0.625, -0.017, 2.338, 5.451,
5.002, 3.708, -2.155, 3.089, 1.33, 1.884, 1.726, 2.612, 3.169,
3.579), dim = c(20L, 1L)))
``````

The gaussian fit model that works:
gpgaussian.stan (1.1 KB)

``````res <- stan(file='Stan/gpgaussian.stan', data = data1,
chains=4, seed=49485, iter = 2000, cores=4)
``````

My attempt at a binomial outcome version:

``````functions {
matrix cov_Kzr(matrix Z, vector r) {
int n_obs = dims(Z)[1];
int n_exp = dims(Z)[2];
matrix[n_obs, n_exp] Zr = Z .* rep_matrix(sqrt(r'), n_obs);
matrix[n_obs, n_obs] K = rep_matrix(0.0, n_obs, n_obs);

for (i in 1:n_obs){
for (j in 1:n_obs){
if (i < j){
for (k in 1:n_exp){
K[i, j] += square( Zr[i, k] - Zr[j, k] );
}
K[j, i] = K[i, j];
}
}
}
return K;
}
}

data {
int n_obs;
int n_exp;
int n_cov;

int y[n_obs];
matrix[n_obs, n_exp] Z;
matrix[n_obs, n_cov] X;
}

parameters {
real<lower=0.0> sigma;
real<lower=0.0> lambda;
vector[n_cov] beta;
real a;

vector<lower=0.0>[n_exp] r;
}

model {
// local params
matrix[n_obs, n_obs] V;
vector[n_obs] alpha;

V = add_diag( lambda * exp(-cov_Kzr(Z, r)) , 1.0 ) ;

// Priors
a ~ std_normal();
sigma ~ inv_gamma(1, 0.25);
lambda ~ inv_gamma(10, 10);
r ~ cauchy(0, 0.1);
beta ~ std_normal();

// Likelihood
alpha ~ multi_normal_cholesky(X*beta, sigma * cholesky_decompose(V));
y ~ bernoulli_logit( a + alpha );
}

``````

So when I run this model I get this error over and over for all chains:

``````Chain 3: Rejecting initial value:
Chain 3:   Error evaluating the log probability at the initial value.
Chain 3: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must be not nan! (in 'anon_model', line 48, column 4 to column 73)
``````

I don’t understand why? Can anyone explain / propose solution?
Many thanks!

The hint is here, and this is the line in question:

``````    alpha ~ multi_normal_cholesky(X*beta, sigma * cholesky_decompose(V));
``````

This isn’t entirely clear from the message, but the “variate” is the outcome `alpha` here and it’s saying the first component of it has a NaN value. This is reporting the bug that `alpha` is declared but never defined:

``````vector[n_obs] alpha;
``````

Stan initializes variables to NaN precisely to catch this kind of bug. I believe for this to work, you will need to declare `alpha` as a parameter.

The explanation here is that `a ~ foo(theta)` is just shorthand for `target += foo_lpdf(a | theta);`

1 Like

Ahh ok thanks @Bob_Carpenter that did the trick. Apparently I should read up on the differences between declaring things in different blocks! … now… I have divergences to investigate :D