Modelling spatiotemporal random effects results in

idantene · April 30, 2021, 8:30am

Hey!

I’m trying to reproduce the model described in https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2664.12710 (appendix contains more complete formulations, see https://besjournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F1365-2664.12710&file=jpe12710-sup-0001-AppendixS1.pdf).

In doing so, I’m constructing a couple of covariance matrices (one for a spatial random effect, and another for spatiotemporal random effect).
When these are introduced, I consistently get:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

I’ve tried adding some small noise to the diagonals and I’ve tried using small non-zero initial values.
Could some underflow occur that then creates zeros where they shouldn’t be, leading to exploding gradients?

In the below model I’ve removed the spatiotemporal effect as it demonstrates the issue sufficiently as it is.

data {
    int<lower=0>       N;             // total samples
    matrix[N, 2]       S;             // site coordinates
    vector[N]          m;             // number of tows needed for a specific reef at a certain year
    vector[N]          x_inner;       // indicator whether the reef is inner-shelf
    vector[N]          x_outer;       // indicator whether the reef is outer-shelf
    vector[N]          tau_pre04;     // time (years) since pre-2004 closure
    vector[N]          tau_04;        // time (years) since post-2004 closure
    int<lower=0>       y[N];          // COTS count
}
transformed data {
    real beta_sig = 10;
}
parameters {
    real<lower=0>      r;            // overdispersion
    vector<lower=0>[2] l;            // length-scale parameters for spatial correlation
    real<lower=0>      sigma_phi;    // magnitude parameter for spatial random effects
    real<lower=0>      sigma_g;      // magnitude parameter for temporal random effects
    real<lower=0>      a_pre04;      // half-saturation constant for pre-2004 closure
    real<lower=0>      a_04;         // half-saturation constant for post-2004 closure
    vector[N]          f;
}
model {
    // priors
    a_pre04 ~ student_t(4, 0, 100);
    a_04 ~ student_t(4, 0, 100);
    l ~ student_t(4, 0, 10^3);
    r ~ gamma(4, 0.1);
    // covariance matrix
    matrix[N, N] Sigma_phi;
    for (i in 1:N) {
        for (j in 1:N) {
            Sigma_phi[i, j] = sigma_phi*exp(-sqrt((((S[i, 1]-S[j, 1])/l[1])^2) + ((S[i, 2]-S[j, 2])/l[2])^2));
        }
       Sigma_phi[i, i] += 0.0001;
    }

    vector[N] H_pre04 = tau_pre04 ./ (a_pre04 + tau_pre04);
    vector[N] H_04 = tau_04 ./ (a_04 + tau_04);

    f ~ multi_normal(rep_vector(0, N), beta_sig*identity_matrix(N) +
                                       beta_sig*x_inner*x_inner' +
                                       beta_sig*x_outer*x_outer' +
                                       beta_sig*H_pre04*H_pre04' +
                                       beta_sig*H_04*H_04' +
                                       Sigma_phi);
    y ~ neg_binomial_2(m .* exp(f), r);
}

mike-lawrence · April 30, 2021, 7:13pm

Have you tried telling the sampler to init to a narrower-than-default range? (The default is uniform from -2 to 2)

idantene · April 30, 2021, 7:47pm

Yes I have, that’s what I meant by “small non-zero initial values”. It had no effect :(

mike-lawrence · April 30, 2021, 7:58pm

Oh, shoot, sorry, I missed that you said you’d tried that. And presumably 0 doesn’t work too?

mike-lawrence · April 30, 2021, 8:01pm

Maybe also try to generate data from the priors? I don’t have a strong sense that this is to blame, but the scale of your priors look pretty extreme.

idantene · May 3, 2021, 7:06am

Unfortunately, 0 does not work either. The priors are indeed weak but the data points vary quite a lot.

I was hoping perhaps there’s a way to cut the loop generating Sigma_phi in a more elaborate and stable way that I’m unaware of. I wanted to use cov_exp_quad but it’s a bit too tailored for my use.

martinmodrak · May 3, 2021, 7:50am

Just a note: it is possible that the problem is not Sigma_phi, but something else. The error is not about some values being immediately problematic, but rather that the gradient is infinite - this is unfortunately a bit harder to debug. The possibilities include:

Some of the denominators in computing H_pre04 / H_04 being very close to 0.
f large leading to exp(y) very large
- Using neg_binomial_2_log(log(m) + f, r) might be more numerically stable. Also note that using offest in neg. binom models has some complexities not seen in Poisson models as discussed e.g. at: Scaling of the overdispersion in negative binomial models
m < 0 (but I think that would give a different error, but I am not sure)

In all cases, you can use print statements to get output from the model and see a bit better what’s happening. I would also suggest you try to simplify the model further - most notably, which terms of the covariance matrix can you remove and still see the issue?

EDIT: Additionally, in older versions of Stan (I think before 2.23, which includes the CRAN version of rstan), there were some numerical instabilities in neg_binomial_2 which might also be partly responsible.

idantene · May 3, 2021, 8:18am

Thanks Martin! I’ll try the neg_binomial_2_log for better numerical stability.
The reason I suspect Sigma_phi is because the code runs smoothly if I exclude it entirely.

I’ll have a go at your suggestions and report back, much appreciated!

martinmodrak · May 3, 2021, 9:17am

Oh, I also just noticed that l close to 0 could also be an issue. Generally, one would use priors on the length-scale to a-priori avoid length-scales smaller than the minimal distance between two observed points or larger than maximal distance between two observed points as discussed e.g. at Robust Gaussian Processes in Stan

idantene · May 4, 2021, 9:16pm

Thanks Martin!
I’ve tried your suggestions and here are my results. Bear in mind, to test these out I’ve truncated my dataset to 10 samples as a sanity test.

Using neg_binomial_2_log did not have any effect.
For m it is always the case that m >= 0
In all rejected cases, l >> 0
The denominators for H_pre04 and H_04 are also strictly larger than 0. More importantly, if I remove Sigma_phi from the covariance, the code executes flawlessly.
I’ve taken additional steps to try and optimize this step by applying the Cholesky decomposition and using the multi_normal_cholesky for f - same results.
Printing out Sigma_phi reveals minscule numeric values (e.g. 3.74265e-119)
Removing any (and all terms) except Sigma_phi still leads to this error. Removing Sigma_phi results in a smooth run.

idantene · May 4, 2021, 9:53pm

Eureka! While playing around with Sigma_phi's loop and trying to simplify things, I randomly thought "hah, why am I event calculating the case when i==j, that would simply be sigma_phi".

I’ve then updated the loop to look like so:

for (i in 1:N) {
    for (j in (i+1):N) {
        Sigma_phi[i, j] = sigma_phi*exp(-sqrt((((S[i, 1]-S[j, 1])/l[1])^2) + ((S[i, 2]-S[j, 2])/l[2])^2));
        Sigma_phi[j, i] = Sigma_phi[i, j]; // copy symmetry
    }
    Sigma_phi[i, i] = sigma_phi + 1e-6; // jitter
}

This now works as expected!

Curiously, I wanted to see if perhaps the values are different themselves, so I added the following printout in the outer loop:

print(sigma_phi + 1e-6, " ", sigma_phi*exp(-sqrt((((S[i, 1]-S[i, 1])/l[1])^2) + ((S[i, 2]-S[i, 2])/l[2])^2)));

and this caused the same error, even though it’s not directly saved anywhere!

jsocolar · May 5, 2021, 3:41am

So just for the record, here’s a simple reprex. The following stan program errors when run via cmdstanr (cmdstan 2.26.1).

parameters{
  real x;
}
model{
  x ~ std_normal();
  print(sqrt(x - x));
}

If I replace x-x with 0 inside the sqrt(), it runs. If I replace with -1, it also runs (and returns nan). If I replace with exp(-300), it still runs.

Seems like kinda funky behavior and also a weird error message…

Edited to add: the error message is

Chain 1 Rejecting initial value:
Chain 1   Gradient evaluated at the initial value is not finite.
Chain 1   Stan can't start sampling from this initial value.
Chain 1 Initialization between (-2, 2) failed after 100 attempts. 
Chain 1  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 1 Initialization failed.
Warning: Chain 1 finished unexpectedly!

idantene · May 5, 2021, 12:56pm

Cool! Seems like this is perhaps some bug in Stan then? Should I raise this as an issue on GH?

Funko_Unko · May 5, 2021, 1:46pm

Well technically the gradient of the square root is not finite at zero.

Edit: Ah, but it only gets printed D:

jsocolar · May 5, 2021, 1:51pm

If we put the print statement in generated quantities everything is fine.

So the issue, I guess, is that Stan is taking the gradients of everything in the model block, including the variable that is implicitly defined for printing. Probably this is intentional and not a bug, and probably the error message is exactly correct.

Edit:
And Stan recognizes x-x as a function of parameters, rather than as a constant 0, which probably shouldn’t be surprising…

idantene · May 5, 2021, 2:16pm

I see, good stuff, thanks for clarifying!

martinmodrak · May 5, 2021, 6:07pm

I think (though my understanding of the autodiff is a bit foggy, so would be great if somebody more knowledgable could confirm - maybe @stevebronder ?) that all autodiff variables are put on a stack as they are created (including all intermediate expressions). Then, for the reverse pass, the whole stack is traversed to propagate the gradients. Stan actually has no idea which expressions involving autodiff variables ended up contributing to the target and which did not. In that sense this is IMHO expected behaviour.

EDIT: Yeah, this is apparently the case, looking at the code at math/grad.hpp at b9944754f973de6638ce21105a98a6d490e3893f · stan-dev/math · GitHub

I also think that since the parser moved to OCaml there are plans to actually do some optimizations and other tricks over the parsed program which could lead to stuff like recognizing that x - x is always zero, but I don’t think there’s anything like this in the current release.

stevebronder · May 5, 2021, 7:44pm

Martin that’s correct

The code for sqrt I’d here

github.com

stan-dev/math/blob/b9944754f973de6638ce21105a98a6d490e3893f/stan/math/rev/fun/sqrt.hpp#L44


  \begin{cases}
    \textrm{NaN} & x < 0 \\
    \frac{1}{2\sqrt{x}} & x\geq 0\\[6pt]
    \textrm{NaN} & \mbox{if } x = \textrm{NaN}
  \end{cases}
  \f]
*
* @param a Variable whose square root is taken.
* @return Square root of variable.
*/
inline var sqrt(const var& a) {
 return make_callback_var(std::sqrt(a.val()), [a](auto& vi) mutable {
   a.adj() += vi.adj() / (2.0 * vi.val());
 });
}


/**
* Return elementwise square root of vector
*
* @tparam T a `var_value` with inner Eigen type
* @param a input

inline var sqrt(const var& a) {
 return make_callback_var(std::sqrt(a.val()), [a](auto& vi) mutable { 
  a.adj() += vi.adj() / (2.0 * vi.val()); 
  });
}

vi.val() is the std::sqrt(a.val()) and since that’s zero you run into division by zero.

I think we could put a more informative error message here by checking whether the value is exactly zero. I think it would be nice to have some sort of debug mode where we throw errors even for numerical stability checks, if we see NaN, etc.

idantene · May 5, 2021, 7:51pm

A debug mode for numerical stability checks would be great tbh.

I think the community (or at least newcomers such as myself) might also benefit from a “typical errors and their cause” with some examples (or maybe such a page exists and I just missed it…?)

stevebronder · May 5, 2021, 7:58pm

Yeah I need to think about that. I wonder if some of these can be caught with the new compilers pedantic mode. And it could turn on a macro we can then use in the math library to tell it to be more exhaustive in error checks

Topic		Replies	Views
Rejecting initial value - Gradient evaluated at the initial value is not finite Modeling fitting-issues , specification	1	40	August 22, 2024
Issue with model: Gradient evaluated at the initial value is not finite Modeling rstan , specification , multinomial-response , hierarchical-model , dirichlet-multinomial	9	487	September 14, 2023
Help with Multivariate Autoregressive Model with random effects Modeling rstan , specification , performance , multivariate-normal , autoregressive-model	2	185	April 26, 2024
Paper and R package for spatial models Publicity ecology , public-health , spatial , rstantools	0	882	July 2, 2020
Rejecting initial value: / Gradient evaluated at the initial value is not finite. / Stan can't start sampling from this initial value Modeling rstan , fitting-issues	5	382	March 29, 2024

Modelling spatiotemporal random effects results in

Related topics