I am having some issues with priors for the range parameter for a simple exponential covariance function. I have looked at Betancourt’s nice write up on informative priors for squared exponential covariance functions [https://betanalpha.github.io/assets/case_studies/gp_part1/part1.html], but still have a couple questions.
In terms of the exponential covariance function, I typically see this parameterization (C(d) = \sigma^2exp(-\phi d)) where \phi is the spatial decay parameter, and a back of the envelope calculation for the effective range is 3/\phi.
I have data with a spatial range from a minimum distance of 25km to a maximum distance of 350km. I am struggling with a reasonable prior for \phi. I have tried 1/\phi \sim \text{half-t}(0,1), as recommended by Flaxman (2015). However, as seen in the attached posterior plots of 1/\phi vs \sigma, there seems to be a ridge, and I am also getting a fair amount of divergent transitions. I am not clear if I should try a more informative gamma (e.g., restrict the majority of mass to fall between 25 and 150 km) on 1/\phi (the range parameter), or put an inverse gamma prior on \phi directly? I know the range and spatial variance parameters are notoriously hard to pin down, so any suggestions would be much appreciated.
It’s important to put a prior on \phi that penalises small ranges (ie, smaller than the locations you’ve observe can infer). The Stan Manual gives some suggestions, as do we here: https://arxiv.org/abs/1503.00256
For the most part, the posterior is far less senisitve to the right hand tail of 1/\phi than it is to the left tail! (Basically, at some point as 1/\phi gets big relative to the range of d you’ve observed, the GP fit starts looking a lot like a smoothing spline fit, and how far the paramater is into the tail stops mattering so much). The left hand tail is bad because if you’ve got observations spaced at around 1 unit apart, the data cannot tell you about closer behaviour than that. This leads to really weird structure in the posterior, which in turn leads to divergences.
TL;DR: Don’t use the Flaxman suggestion. Put an inverse gamma on the range (which is a gamma on \phi) and tune it to know about the set of the locations at which you’re observing the GP. (As in Fuglstad et al.)
Did you read on to part 3 of that case study? I tried @betanalpha’s prescription for estimating the parameters of an informative inverse gamma prior for a similar situation of having a reasonable idea of the minimum and maximum length scales. It works at least in the sense of not producing obvious pathologies. The only implementation issue I ran into is that algebra_solver seems to need a good starting guess or it fails with a cryptic error message.
Here’s a sample pairs plot of the parameters related to the GP. I don’t know what the “true” values should be (this is from real data, not a simulation), but ρ should be somewhere in the range (0.2, 1) in the adopted distance units. Curvature in the posterior is supposed to be a problem, but all convergence diagnostics were fine in this case.
Thanks for the reply. Yes, I tried that algebra_solver but was also getting cryptic errors. Did you have to modify the code to get it to work or just choose starting values that seemed really close to the appropriate length endpoints.
Thanks, Daniel, for the references and suggestions. I will try putting an inverse gamma prior on \phi, where the covariance function is defined as: C(d) = \sigma^2 \text{exp}(-d/\phi)). From looking at the Stan manual (Section 18.3), it seems like that is how Stan parameterizes the cov_exp_quad function. Is that correct? I have my own exponential covariance function written now, but may switch to Stan’s built in one for stability if I am still seeing problems. As you say, hopefully putting little mass on values less than that minimum distance of 25 km it will help with the divergences.
@anon75146577 just a quick follow up to our discussion. I tried the inverse gamma (IG) prior and did get fewer divergences. Thank you for the suggestion. However, for a parameter that has very little/no spatial structure, by putting an informative prior on the range, the posterior looks similar to the prior since there is no updating in the likelihood. From an empirical standpoint, when the prior and posterior look identical, I am inclined to drop the spatial term as it seems the data does not suggest spatial dependence. If the true data generating process didn’t not exhibit any spatial correlation, would you expect the posterior of the range parameter to match the IG prior or shrink towards zero? I know the IG distribution does bound values away from zero, which I am worried could suggest a spurious spatial range.