R2D2 prior and Gaussian Processes

I’ve run up to 10,000 iterations with only small increases in the E-BFMI values. I’ve also tried on larger datasets and the problem still happens.

Ok, sounds good. I’m really willing to try this, but I don’t quite follow what you mean, I’m afraid :/ (Edit: to be clear, I understand the motivation to avoid calculating F, but I don’t understand how it should be coded.)

Yes, I’ve been looking forward to trying this out!

So I tested this in one model and I’m getting an error:


This seems incorrect to me - surely it should accept a vector of reals like the non-Laplace version?

I’m still lost on this, having thought about it all weekend. Any further details on what you intended would be much appreciated :)

Sorry for recommending this before testing it myself. The documentation has some errors and I’ve also found a couple of bugs, so I can’t recommend it before I get it working myself. It is also possible that the function signatures are not yet as flexible as those for other functions.


No worries at all - I know it’s only at the release candidate stage, so I half expected a problem! What’s given me more stress is the topic of moving the linear model into the GP covariance matrix, since it may hopefully unlock my final issue with the model, but I don’t know what to do!

I’m still not completely sure what you meant, but I tried a few options.

For these runs the R2 priors are:

R2D2_mean_R2 = 0.6, R2D2_prec_R2 = 15, R2D2_cons_D2 = c(1.5, 1.5, 0.5)

Here are the error messages and some pairs plots from the model above, i.e. with this likelihood:

    V = add_diag( lambda * exp(-cov_Kzr(Zc, r)) , delta );

    // GP
    target += multi_normal_cholesky_lpdf(F | rep_vector(0, n_obs), cholesky_decompose(V));
    // Likelihood
    if (!prior_only) {
        target += bernoulli_logit_glm_lupmf(y | Xc, Intcpt + F, beta);
    }

I first tried changing it as follows:

   // GP
   target += multi_normal_cholesky_lpdf(F | Intcpt + Xc * beta, cholesky_decompose(V));
   // Likelihood
   if (!prior_only) {
       target += bernoulli_logit_lpmf(y | F);  // bernoulli_logit already applies the logit link
   }

This was worse:

Then I tried changing it as follows (which I think is what you intended in your previous comment, @avehtari?):

    V = add_diag( lambda * exp(-cov_Kzr(Zc, r)) , delta ) +
                (Intcpt + Xc*beta) * to_row_vector(Intcpt + Xc*beta) .* identity_matrix(n_obs);

    target += multi_normal_cholesky_lpdf(F | rep_vector(0, n_obs), cholesky_decompose(V));
    target += bernoulli_logit_lupmf(y | F);

This was worse again (and again I changed nothing else between these runs):

I am somewhat at a loss as to what to try next. The first model is clearly less pathological than the other two, but having tried many different priors, more iterations, and larger datasets, I cannot quite quash the E-BFMI warnings. Any thoughts, anyone?

Sorry for not providing more details the first time. See GPML book Section 4.2.2 and the Stan documentation on the dot product covariance function. If the linear model has a normal prior and we integrate out the linear model parameters, we get a prior on function space which has the dot product covariance form, and this can be added to the total covariance matrix. If we only have a linear model with p << n we don’t do that, as the GP covariance matrix based computation scales as n^3; but if you are already using a GP with covariance based computation, then it’s better to just add everything to the GP.
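To spell out the marginalization step (a sketch in generic notation, not the exact names in your model): with

    f = X*b + g,    b ~ normal(0, sd_b^2 * I),    g ~ GP(0, K),

integrating out b gives

    f ~ multi_normal(0, K + sd_b^2 * X*X'),

because Cov[X*b] = X * Cov[b] * X' = sd_b^2 * X*X'. The added entries sd_b^2 * x_i'*x_j are exactly the dot product covariance (plus a constant term if an intercept with a normal prior is folded in as well), and the coefficients b no longer appear as parameters.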


Hi Aki. It’s been a while since I had a chance to work on this, but I have now tried the dot product version, changing the code as follows:

V = add_diag( lambda * exp(-cov_Kzr(Zc, r)) , delta ) + gp_dot_prod_cov(to_array_1d(Xc*beta), 0.0);


// GP
target += multi_normal_cholesky_lpdf(F | rep_vector(0, n_obs), cholesky_decompose(V));
// Likelihood

target += bernoulli_logit_lpmf(y | Intcpt + F);

This led to a new issue:


In addition to the E-BFMI, Rhat, and ESS warnings, the beta parameter now shows zero-avoiding behaviour. I’ve no idea why; I didn’t change any other aspect of the model, and this wasn’t happening before using gp_dot_prod_cov(). Any ideas why that’s happening?

Comparing to the code far above, it seems like beta is a vector of coefficients, which means you have misunderstood the dot product covariance. The dot product covariance corresponds to what happens if we analytically integrate over the coefficients. I recommend reading Sections 2.1 (Weight-space View) and 2.2 (Function-space View) in the GPML book.

The dot product covariance part should look something like this (I can’t remember whether Xc can be a matrix or whether it needs to be an array of vectors):

sd_b * gp_dot_prod_cov(Xc, 1.0)

where sd_b is the normal prior sd for beta, and 1.0 is the normal prior sd for the intercept.
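For concreteness, here is an untested sketch of how this could slot into the model block further above, reusing the names from that code. It folds the coefficient prior sd into the covariates (so the implied prior is beta ~ normal(0, sd_b)), keeps the explicit intercept in the likelihood, and converts Xc to an array of vectors in case the matrix signature is not available:

    // Sketch only: the covariates (not Xc * beta) go into the dot product covariance
    array[n_obs] vector[cols(Xc)] xs;
    for (i in 1:n_obs) {
      xs[i] = sd_b * Xc[i]';   // row i of the covariate matrix, scaled by the coefficient prior sd
    }
    V = add_diag( lambda * exp(-cov_Kzr(Zc, r)) , delta )
        + gp_dot_prod_cov(xs, 0.0);   // adds sd_b^2 * Xc * Xc' to the kernel
    // GP (beta is integrated out, so it is no longer a parameter)
    target += multi_normal_cholesky_lpdf(F | rep_vector(0, n_obs), cholesky_decompose(V));
    // Likelihood
    target += bernoulli_logit_lpmf(y | Intcpt + F);

With gp_dot_prod_cov(xs, 1.0) instead, an intercept with prior sd 1 would also be absorbed into the covariance, in which case Intcpt would be dropped from the likelihood.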


Ah yes, I did misunderstand. I will read up more and experiment with the code. Thank you.

I’ll add that when reading GPML book Chapter 2, you don’t need to understand all the equations, but hopefully it clarifies why beta disappears and why it’s called the dot product covariance.


Hi again @avehtari. I took a look at those chapters and they make sense.

But when it comes to my implementation, I’m confused:

So in my model Xc is, for example, an m-sized array of vectors, each of length n, representing independent covariates, and sd_b is an m-sized vector of the corresponding hyper-parameters for the coefficients. With this code I end up with the m-sized vector sd_b multiplied by the n x n matrix that gp_dot_prod_cov() returns. So the vector and matrix have non-matching dimensions and can’t be multiplied. Have I again misunderstood something? Or does this approach only work for a single variable at a time?

  • I assumed you would have the same prior scale for all linear model coefficients (which is sensible if the number of coefficients is not big), in which case the code works
  • In case of separate prior scales for each coefficient (and some hierarchical prior for the scales) it would be nice if gp_dot_prod_cov() had a third argument for the scales, but unfortunately it doesn’t. Thus, you need to scale the covariates yourself, multiplying each covariate by its corresponding prior scale before calling gp_dot_prod_cov() (see the sketch below).
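For example (again an untested sketch, reusing the names from the code above, with sd_b now a vector holding one prior sd per coefficient):

    // Sketch only: fold per-coefficient prior scales into the covariates;
    // column j of Xc is multiplied by sd_b[j], i.e. beta[j] ~ normal(0, sd_b[j])
    matrix[n_obs, cols(Xc)] Xs = diag_post_multiply(Xc, sd_b);
    array[n_obs] vector[cols(Xc)] xs;
    for (i in 1:n_obs) {
      xs[i] = Xs[i]';
    }
    V = add_diag( lambda * exp(-cov_Kzr(Zc, r)) , delta )
        + gp_dot_prod_cov(xs, 0.0);   // adds Xc * diag(sd_b)^2 * Xc' to the kernel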

The gp_..._cov() functions have several limitations. After we made some of the internal gp_..._cov() functions, we realized that they do not solve the underlying problem that Stan’s autodiff makes covariance matrix based GPs slow and memory-hungry. Improving the arguments would make the functions easier to use, but their use would still be limited to small data and a relatively small number of combined covariance matrices. Developers have been thinking about a more flexible approach using immutable matrices, with covariance matrices presented as a transformation of data and a small number of parameters, but it’s not coming soon. While GPs are sometimes useful in Stan models, Stan can’t compete in performance with specialized GP (or GMRF) software, so it’s not clear how much effort we should put into improving GP (and GMRF) performance.


Thanks Aki - that’s really helpful for understanding where I was going wrong. Thanks also for the info on the development plans! For my purposes I’m not so interested in big datasets and speed, more in stamping out the E-BFMI warnings I was getting, but I’m making progress!