 # Modelling spatiotemporal random effects results in

Hey!

I’m trying to reproduce the model described in https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2664.12710 (appendix contains more complete formulations, see https://besjournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F1365-2664.12710&file=jpe12710-sup-0001-AppendixS1.pdf).

In doing so, I’m constructing a couple of covariance matrices (one for a spatial random effect, and another for spatiotemporal random effect).
When these are introduced, I consistently get:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

I’ve tried adding some small noise to the diagonals and I’ve tried using small non-zero initial values.
Could some underflow occur that then creates zeros where they shouldn’t be, leading to exploding gradients?

In the below model I’ve removed the spatiotemporal effect as it demonstrates the issue sufficiently as it is.

data {
int<lower=0>       N;             // total samples
matrix[N, 2]       S;             // site coordinates
vector[N]          m;             // number of tows needed for a specific reef at a certain year
vector[N]          x_inner;       // indicator whether the reef is inner-shelf
vector[N]          x_outer;       // indicator whether the reef is outer-shelf
vector[N]          tau_pre04;     // time (years) since pre-2004 closure
vector[N]          tau_04;        // time (years) since post-2004 closure
int<lower=0>       y[N];          // COTS count
}
transformed data {
real beta_sig = 10;
}
parameters {
real<lower=0>      r;            // overdispersion
vector<lower=0> l;            // length-scale parameters for spatial correlation
real<lower=0>      sigma_phi;    // magnitude parameter for spatial random effects
real<lower=0>      sigma_g;      // magnitude parameter for temporal random effects
real<lower=0>      a_pre04;      // half-saturation constant for pre-2004 closure
real<lower=0>      a_04;         // half-saturation constant for post-2004 closure
vector[N]          f;
}
model {
// priors
a_pre04 ~ student_t(4, 0, 100);
a_04 ~ student_t(4, 0, 100);
l ~ student_t(4, 0, 10^3);
r ~ gamma(4, 0.1);
// covariance matrix
matrix[N, N] Sigma_phi;
for (i in 1:N) {
for (j in 1:N) {
Sigma_phi[i, j] = sigma_phi*exp(-sqrt((((S[i, 1]-S[j, 1])/l)^2) + ((S[i, 2]-S[j, 2])/l)^2));
}
Sigma_phi[i, i] += 0.0001;
}

vector[N] H_pre04 = tau_pre04 ./ (a_pre04 + tau_pre04);
vector[N] H_04 = tau_04 ./ (a_04 + tau_04);

f ~ multi_normal(rep_vector(0, N), beta_sig*identity_matrix(N) +
beta_sig*x_inner*x_inner' +
beta_sig*x_outer*x_outer' +
beta_sig*H_pre04*H_pre04' +
beta_sig*H_04*H_04' +
Sigma_phi);
y ~ neg_binomial_2(m .* exp(f), r);
}


Have you tried telling the sampler to init to a narrower-than-default range? (The default is uniform from -2 to 2)

1 Like

Yes I have, that’s what I meant by “small non-zero initial values”. It had no effect :(

Oh, shoot, sorry, I missed that you said you’d tried that. And presumably 0 doesn’t work too?

Maybe also try to generate data from the priors? I don’t have a strong sense that this is to blame, but the scale of your priors look pretty extreme.

Unfortunately, 0 does not work either. The priors are indeed weak but the data points vary quite a lot.

I was hoping perhaps there’s a way to cut the loop generating Sigma_phi in a more elaborate and stable way that I’m unaware of. I wanted to use cov_exp_quad but it’s a bit too tailored for my use.

Just a note: it is possible that the problem is not Sigma_phi, but something else. The error is not about some values being immediately problematic, but rather that the gradient is infinite - this is unfortunately a bit harder to debug. The possibilities include:

• Some of the denominators in computing H_pre04 / H_04 being very close to 0.
• f large leading to exp(y) very large
• m < 0 (but I think that would give a different error, but I am not sure)

In all cases, you can use print statements to get output from the model and see a bit better what’s happening. I would also suggest you try to simplify the model further - most notably, which terms of the covariance matrix can you remove and still see the issue?

EDIT: Additionally, in older versions of Stan (I think before 2.23, which includes the CRAN version of rstan), there were some numerical instabilities in neg_binomial_2 which might also be partly responsible.

1 Like

Thanks Martin! I’ll try the neg_binomial_2_log for better numerical stability.
The reason I suspect Sigma_phi is because the code runs smoothly if I exclude it entirely.

I’ll have a go at your suggestions and report back, much appreciated!

Oh, I also just noticed that l close to 0 could also be an issue. Generally, one would use priors on the length-scale to a-priori avoid length-scales smaller than the minimal distance between two observed points or larger than maximal distance between two observed points as discussed e.g. at Robust Gaussian Processes in Stan

Thanks Martin!
I’ve tried your suggestions and here are my results. Bear in mind, to test these out I’ve truncated my dataset to 10 samples as a sanity test.

• Using neg_binomial_2_log did not have any effect.
• For m it is always the case that m >= 0
• In all rejected cases, l >> 0
• The denominators for H_pre04 and H_04 are also strictly larger than 0. More importantly, if I remove Sigma_phi from the covariance, the code executes flawlessly.
• I’ve taken additional steps to try and optimize this step by applying the Cholesky decomposition and using the multi_normal_cholesky for f - same results.
• Printing out Sigma_phi reveals minscule numeric values (e.g. 3.74265e-119)
• Removing any (and all terms) except Sigma_phi still leads to this error. Removing Sigma_phi results in a smooth run.
1 Like

Eureka! While playing around with Sigma_phi's loop and trying to simplify things, I randomly thought "hah, why am I event calculating the case when i==j, that would simply be sigma_phi".

I’ve then updated the loop to look like so:

for (i in 1:N) {
for (j in (i+1):N) {
Sigma_phi[i, j] = sigma_phi*exp(-sqrt((((S[i, 1]-S[j, 1])/l)^2) + ((S[i, 2]-S[j, 2])/l)^2));
Sigma_phi[j, i] = Sigma_phi[i, j]; // copy symmetry
}
Sigma_phi[i, i] = sigma_phi + 1e-6; // jitter
}


This now works as expected!

Curiously, I wanted to see if perhaps the values are different themselves, so I added the following printout in the outer loop:

print(sigma_phi + 1e-6, " ", sigma_phi*exp(-sqrt((((S[i, 1]-S[i, 1])/l)^2) + ((S[i, 2]-S[i, 2])/l)^2)));


and this caused the same error, even though it’s not directly saved anywhere!

2 Likes

So just for the record, here’s a simple reprex. The following stan program errors when run via cmdstanr (cmdstan 2.26.1).

parameters{
real x;
}
model{
x ~ std_normal();
print(sqrt(x - x));
}


If I replace x-x with 0 inside the sqrt(), it runs. If I replace with -1, it also runs (and returns nan). If I replace with exp(-300), it still runs.

Seems like kinda funky behavior and also a weird error message…

Edited to add: the error message is

Chain 1 Rejecting initial value:
Chain 1   Gradient evaluated at the initial value is not finite.
Chain 1   Stan can't start sampling from this initial value.
Chain 1 Initialization between (-2, 2) failed after 100 attempts.
Chain 1  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 1 Initialization failed.
Warning: Chain 1 finished unexpectedly!

1 Like

Cool! Seems like this is perhaps some bug in Stan then? Should I raise this as an issue on GH?

1 Like

Well technically the gradient of the square root is not finite at zero.

Edit: Ah, but it only gets printed D:

1 Like

If we put the print statement in generated quantities everything is fine.

So the issue, I guess, is that Stan is taking the gradients of everything in the model block, including the variable that is implicitly defined for printing. Probably this is intentional and not a bug, and probably the error message is exactly correct.

Edit:
And Stan recognizes x-x as a function of parameters, rather than as a constant 0, which probably shouldn’t be surprising…

I see, good stuff, thanks for clarifying!

1 Like

I think (though my understanding of the autodiff is a bit foggy, so would be great if somebody more knowledgable could confirm - maybe @stevebronder ?) that all autodiff variables are put on a stack as they are created (including all intermediate expressions). Then, for the reverse pass, the whole stack is traversed to propagate the gradients. Stan actually has no idea which expressions involving autodiff variables ended up contributing to the target and which did not. In that sense this is IMHO expected behaviour.

EDIT: Yeah, this is apparently the case, looking at the code at math/grad.hpp at b9944754f973de6638ce21105a98a6d490e3893f · stan-dev/math · GitHub

I also think that since the parser moved to OCaml there are plans to actually do some optimizations and other tricks over the parsed program which could lead to stuff like recognizing that x - x is always zero, but I don’t think there’s anything like this in the current release.

2 Likes

Martin that’s correct

The code for sqrt I’d here

inline var sqrt(const var& a) {
return make_callback_var(std::sqrt(a.val()), [a](auto& vi) mutable {
});
}


vi.val() is the std::sqrt(a.val()) and since that’s zero you run into division by zero.

I think we could put a more informative error message here by checking whether the value is exactly zero. I think it would be nice to have some sort of debug mode where we throw errors even for numerical stability checks, if we see NaN, etc.

3 Likes

A debug mode for numerical stability checks would be great tbh.

I think the community (or at least newcomers such as myself) might also benefit from a “typical errors and their cause” with some examples (or maybe such a page exists and I just missed it…?)

1 Like

Yeah I need to think about that. I wonder if some of these can be caught with the new compilers pedantic mode. And it could turn on a macro we can then use in the math library to tell it to be more exhaustive in error checks

1 Like