Hey all 
Here are some wikis from different places related to Stan’s GP library, as well as things I’ve done or come across. I’m hoping that we can unify these documents and come up with a concrete plan:
1.1 @rtrangucci’s Roadmap for structured linear algebra and GP covariance functions
1.2 @betanalpha’s Adding a GP Covariance Function

We have the following kernels that have PRs or are already merged into dev: `gp_cov_exp_quad` (prim/rev), `gp_dot_prod_cov` (prim), `gp_periodic_cov` (prim/rev), `gp_matern52_cov` (prim), and `gp_matern32_cov` (prim). Then there’s `gp_exponential_cov`, which I have on my fork with prim and rev, but it needs edits prior to opening a PR, and I’ve been waiting for other PRs to go through, as this has taken much longer than expected.
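For anyone reviewing, here’s a quick pure-Python sketch of the covariance formulas these functions compute for a pair of scalar inputs. The real stan-math functions take arrays of inputs and return a full covariance matrix; the names and signatures below are just for illustration, not the stan-math API:

```python
import math

# Illustrative scalar versions of the kernels (hypothetical signatures).
def exp_quad(x1, x2, sigma, length):
    d = abs(x1 - x2)
    return sigma**2 * math.exp(-d**2 / (2 * length**2))

def exponential(x1, x2, sigma, length):
    return sigma**2 * math.exp(-abs(x1 - x2) / length)

def matern32(x1, x2, sigma, length):
    r = math.sqrt(3) * abs(x1 - x2) / length
    return sigma**2 * (1 + r) * math.exp(-r)

def matern52(x1, x2, sigma, length):
    d = abs(x1 - x2)
    r = math.sqrt(5) * d / length
    return sigma**2 * (1 + r + 5 * d**2 / (3 * length**2)) * math.exp(-r)

def periodic(x1, x2, sigma, length, period):
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return sigma**2 * math.exp(-2 * s**2 / length**2)

def dot_prod(x1, x2, sigma0):
    return sigma0**2 + x1 * x2

# Every stationary kernel above returns the marginal variance sigma^2 at d = 0:
assert abs(exp_quad(1.3, 1.3, 2.0, 0.7) - 4.0) < 1e-12
assert abs(matern32(1.3, 1.3, 2.0, 0.7) - 4.0) < 1e-12
```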
We need to edit the documentation in the Stan manual because some of the example models are incorrect; see this issue: Update GP models in Stan Reference Manual. This isn’t very labor intensive, and I’m happy to edit some of the models if I can find the RMarkdown document (this won’t go through without peer review, of course).

Having a GP function: for large kernels and operations on them (summing or elementwise multiplication), the memory consumption of the autodiff stack is too large. I’ve had the idea of using `shared_memory` or some memory mapping that would share the autodiff stack among different cores.
However, this might not be necessary (or at least not something to pursue immediately) for a couple of reasons:
a. It looks like @bbbales2’s adj_jac_apply could partly solve the memory consumption issue. Ben, in bullet (1) you mention that “we do not need to compute the jacobian of your operator, just a vector^T jacobian product”. I’ve gone through the code and your comments in adj_jac_apply.hpp, but would you mind elaborating a bit on how this could reduce memory consumption in the context of GPs in Stan? Sorry, sometimes I don’t understand things on the first pass.
b. Having sparse matrices could also reduce memory consumption. 
Algebra functions for sparse matrices: see @Daniel_Simpson’s document above, in the section on “Sparse Matrix” support. There is a fair amount of overlap with Rob’s 1.1 wiki.
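Re: (a), my possibly-wrong reading of the vector^T Jacobian point, as a toy pure-Python illustration (not stan-math code): for an operator with n inputs and m outputs, reverse mode only ever needs adj^T J, which can usually be computed directly without storing the m-by-n Jacobian, and that’s where the memory savings would come from.

```python
import math

# Toy operator y = exp(x) elementwise on n inputs.
# Its full Jacobian is an n x n (diagonal) matrix: n^2 stored entries.
# The vector^T-Jacobian product needs only n numbers.
def forward(x):
    return [math.exp(xi) for xi in x]

def vjp(x, adj):
    # (adj^T J)_i = adj_i * exp(x_i): computed directly, J never materialized.
    return [a * math.exp(xi) for a, xi in zip(adj, x)]

x = [0.0, 1.0, -2.0]
adj = [1.0, 0.5, 2.0]

# Same answer via the explicit (wasteful) Jacobian:
J = [[math.exp(x[i]) if i == j else 0.0 for j in range(3)] for i in range(3)]
explicit = [sum(adj[i] * J[i][j] for i in range(3)) for j in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(vjp(x, adj), explicit))
```

For a GP kernel matrix depending on a few hyperparameters, the same idea would mean never allocating one autodiff node per matrix entry, if I understand the design correctly.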
It’s important to think about workflow and efficiency. In case more than one person wants to work on this, we can divide the above tasks into disjoint sets so that we can work in parallel. All of the below bullet points can be developed independently:
- Finish the above set of kernels in `prim` and `rev`.
- Implement some of the structured linear algebra types as in Rob’s 1.1, leaving out the specialized Toeplitz output for kernels until the above set of kernels is done in `rev`.
- Implement a GP function (maybe after @bbales updates adj_jac_apply for matrix datatypes; this probably has to be a one-person job).
- Sparse matrix types and operations, following the requirements in the “Sparse Matrix Wishlist” (there are notes from @Bob_Carpenter in the section entitled “A Sparse Matrix Type” that we would need to follow, i.e. something about a map from a sparse matrix to a vector).
- Implement some of the NCPs of the kernels as in 1.2 above. (But instead of using an NCP, can’t we just sum a dot product kernel with another kernel? Isn’t this the same as including a linear regression part in the mean function, or something like that? Or can someone explain the importance of developing an NCP independently?)
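On my dot product question: here’s a quick pure-Python Monte Carlo sanity check (illustrative only) that marginalizing a linear mean term beta * x over beta ~ normal(0, sigma_b) adds a dot-product kernel sigma_b^2 * x1 * x2 to the covariance. That’s why I’m wondering what the NCP buys beyond this; my understanding is it’s about the sampling geometry of the latent GP, but I’d like confirmation.

```python
import random

random.seed(42)

sigma_b = 1.0          # prior sd of the regression weight beta
x1, x2 = 1.0, 2.0      # two input locations
n = 200000

# f(x) = beta * x + g(x), with g(x1), g(x2) independent standard normals
# standing in for any zero-mean process independent of beta.
s = 0.0
for _ in range(n):
    beta = random.gauss(0.0, sigma_b)
    f1 = beta * x1 + random.gauss(0.0, 1.0)
    f2 = beta * x2 + random.gauss(0.0, 1.0)
    s += f1 * f2
emp_cov = s / n  # empirical Cov(f(x1), f(x2)); both means are zero

# Theory: Cov(f(x1), f(x2)) = K(x1, x2) + sigma_b^2 * x1 * x2
#                           = 0 (independent g's here) + 1 * 1 * 2 = 2
assert abs(emp_cov - sigma_b**2 * x1 * x2) < 0.05
```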
The `rng`s in 1.2 I’m not so sure about. A great feature of GPs is that we can sum/multiply kernels, and for every combination of sums and multiplications we’d need a specialization of `gp_pred_rng`. This would create a lot of work that might only be used in specific cases, and we could probably generate the predictive posteriors for a given GP kernel combination automatically if we develop a Stan `gp` function. We could also specialize the matrix computation (i.e. if it’s sparse or a Toeplitz type) within that function, once we develop more of the structured linear algebra functions. So I don’t think making a `gp_pred_rng` for each kernel would be a good use of time.
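To make the “automatically” claim concrete: the predictive distribution only touches the kernel through the assembled matrices, so any sum/product combination flows through one generic routine. A pure-Python sketch with hypothetical names, using the standard conditional-Gaussian mean E[f* | y] = k*^T (K + sigma^2 I)^{-1} y; the dense solve here is a stand-in for whatever specialized sparse/Toeplitz solver we’d dispatch to:

```python
import math

def solve(A, b):
    # Tiny Gaussian elimination with partial pivoting (illustration only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def pred_mean(kernel, xs, ys, x_star, sigma_noise):
    # E[f(x*) | y] = k*^T (K + sigma^2 I)^{-1} y, for ANY kernel function,
    # including sums/products of kernels -- no per-kernel specialization.
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (sigma_noise**2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, ys)
    return sum(kernel(x_star, xs[i]) * alpha[i] for i in range(n))

# A summed kernel needs no special-case rng: just pass the sum.
k_sum = lambda a, b: math.exp(-0.5 * (a - b)**2) + a * b
m = pred_mean(k_sum, [0.0, 1.0, 2.0], [0.1, 0.9, 2.1], 1.5, 0.1)
```

The `rng` part would then just be a multivariate-normal draw around this mean with the matching predictive covariance, which is the same story: it only sees the combined kernel.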
Anyway, I think the priority is finishing up the initial set of kernels, the sparse matrix type, and making sure we don’t have memory issues with large kernels.
thoughts??