@linas Hey! So don’t take the “Stan Developer” tag so seriously. Not an expert. Don’t know much about trees, haven’t taken any serious statistics classes either.
I think it really depends on asking “what would I like to learn?”
A common way to model a time series with a GP (might not be useful in your case):
With time series data, you could build a covariance matrix k(x, x') from the time index x = [1, 2, 3, ..., T]'. If you use a squared exponential covariance function and take, say, tree height as y, the GP time series model y \sim GP(0, k(x, x')) would just give you a smooth function passing through the tree height at each point in time. That makes a pretty plot, but it might not be particularly informative (you could plot the raw data and learn just as much).
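Just to make that concrete, here is a minimal sketch of that kind of GP-over-time model in Stan. The variable names and priors are placeholders I'm making up, with a squared exponential kernel over the time index and the latent function marginalized out by folding the noise into the diagonal:

```stan
data {
  int<lower=1> T;
  array[T] real x;   // time points, e.g. 1, 2, ..., T
  vector[T] y;       // e.g. tree height at each time point
}
parameters {
  real<lower=0> rho;     // length-scale of the squared exponential kernel
  real<lower=0> alpha;   // marginal standard deviation
  real<lower=0> sigma;   // observation noise
}
model {
  // covariance over time points, plus noise on the diagonal
  matrix[T, T] K = gp_exp_quad_cov(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), T));
  matrix[T, T] L_K = cholesky_decompose(K);

  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();

  y ~ multi_normal_cholesky(rep_vector(0, T), L_K);
}
```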
Potentially more useful:
One thing that you could do… Say we want to learn something about what influences tree height. One simple model we could consider is an AR model, which would give us an average effect, over time, of how some covariate, say water (again, I don’t know what canopy means without Wikipedia), influences tree growth.
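As a starting point, a bare-bones AR(1)-plus-covariate version might look something like this in Stan (again, `water` and `height` are just stand-ins for whatever is actually in your data, and I'm leaving out an intercept for brevity):

```stan
data {
  int<lower=2> T;
  vector[T] height;  // outcome, e.g. tree height over time
  vector[T] water;   // covariate measured at each time point
}
parameters {
  real phi;              // AR coefficient on the previous height
  real beta_water;       // linear effect of water at t-1 on growth at t
  real<lower=0> sigma;   // residual standard deviation
}
model {
  phi ~ normal(0, 1);
  beta_water ~ normal(0, 1);
  sigma ~ std_normal();

  // height at t depends on height and water at t-1
  height[2:T] ~ normal(phi * height[1:(T - 1)]
                       + beta_water * water[1:(T - 1)], sigma);
}
```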
But hey, there’s over-watering and under-watering, right? I’ve killed plants both ways. That’s possibly a non-linear relationship, which could be a good place for a Gaussian process prior on the AR regression coefficient. This way, we could capture the non-linear effect that watering at time point t-1 has on growth at time t.
So the “pseudocode” would look something like this (a rough Stan sketch follows below):
For t in 2, 3, …, T:
- Generate the water covariance matrix for all trees at time t-1.
- Sample from the GP (a multivariate normal), where \beta_{water} acts as a time-varying AR regression coefficient, i.e. y_t = \beta_{water, t-1} + \eta_t, with \beta_{water} \sim GP(0, k(x, x')).
I’m definitely omitting/fudging some computational and statistical details, but the principle is there. A real model would require more work and thought, and it could turn out not to be feasible. I’m abusing notation too.
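If it helps, here's a rough Stan sketch of that idea for a single tree, where the water effect is a smooth non-linear function with a GP prior over the observed water values (names are made up and I'm simplifying to one tree; the multi-tree version would need more thought):

```stan
data {
  int<lower=2> T;
  vector[T] height;
  array[T] real water;       // GP input: amount of water at each time point
}
parameters {
  real phi;                  // AR coefficient on the previous height
  real<lower=0> rho;         // GP length-scale over water values
  real<lower=0> alpha;       // GP marginal standard deviation
  real<lower=0> sigma;       // observation noise
  vector[T] z;               // standard-normal draws for the latent GP
}
transformed parameters {
  vector[T] f;               // f[t] = non-linear effect of water[t]
  {
    // small jitter on the diagonal for numerical stability
    matrix[T, T] K = gp_exp_quad_cov(water, alpha, rho)
                     + diag_matrix(rep_vector(1e-9, T));
    f = cholesky_decompose(K) * z;   // non-centered GP parameterization
  }
}
model {
  phi ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  z ~ std_normal();

  // height at t depends on height at t-1 and the water effect at t-1
  height[2:T] ~ normal(phi * height[1:(T - 1)] + f[1:(T - 1)], sigma);
}
```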
I hope this helps. It would help me if I could have more information. If the dataset is “open source,” I think it would be a good time to have a back-and-forth and write a model; other people looking around could see how to do it themselves. Do you have a comprehensive list of what’s in the dataset? What would you like to learn?
Disclaimer: my suggestions could be awful!
It’s really on me that there are no good examples of how to apply this tool. I’ve been slacking.
Definitely possible, but I remember it being a lot of work. I think we have GPU support released to users for the Cholesky decomposition, which will do a lot of the heavy lifting for GP models. The big bottleneck here is using the Cholesky decomposition to sample from the posterior; I remember it taking up something like 2/3 of the iteration time for a GP regression. Also, I think the GPU support was made easy for users because it abstracts all the computational details away. It’s really not a big deal unless you have a giant covariance matrix, or a bunch of covariance matrices. The GPU devs are mainly Rok, Steve, and t4c1; they would know.
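For reference, turning on the OpenCL/GPU support in CmdStan is just a build setting, something like the following in `make/local` (the device and platform IDs depend on your machine); the model code itself doesn't change:

```make
# Enable OpenCL (GPU) support when building models with CmdStan.
# Platform/device IDs are machine-specific -- check with clinfo or similar.
STAN_OPENCL=true
OPENCL_PLATFORM_ID=0
OPENCL_DEVICE_ID=0
```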
But yeah, the biggest question is what do you want to know? Also, what’s possible with the information you have.
Andre