Gradient evaluation took 0 seconds?


I’m trying to run a model using the stan function in rstan on my university’s supercomputer, and I was wondering if it’s normal that it keeps saying that the gradient evaluation took 0 seconds for each of my chains:

Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.

I tried running the eight schools example model as a test, and for that model too, it says that the gradient evaluation took 0 seconds. This was the code that I used for that:

library(rstan)

my_code <- '
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}
'

schools_dat <- list(J = 8, 
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

fit <- stan(model_code = my_code, data = schools_dat)

When I run both my model and the eight schools model on my Mac, the reported gradient evaluation time is always non-zero; I have never seen it report 0 seconds there.

Basically, my question is: is it normal for it to say that the gradient evaluation took 0 seconds, or does that indicate a bigger problem?

I’m asking because I ran my model on the same small test dataset on both the supercomputer and my Mac. On my Mac, the maximum Rhat is equal to 1 and the minimum number of effective samples is greater than the cutoff that I specify, which is good. On the supercomputer, that is not the case. My script then reruns the model with more iterations, but that doesn’t seem to solve the problem, because it keeps rerunning the model with even more iterations. So I’m trying to understand why I’m getting different results on the supercomputer.
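The kind of check described above can be sketched roughly as follows. This is only an illustration, not the actual script: the helper name `convergence_ok` and both cutoff values are hypothetical placeholders. It assumes the matrix returned by `summary(fit)$summary` for an rstan fit, which has `Rhat` and `n_eff` columns.

```r
# Hypothetical helper (not from the original post): summarise convergence
# from the matrix returned by summary(fit)$summary for an rstan fit.
# The default cutoffs are placeholders, not recommendations from this thread.
convergence_ok <- function(fit_summary, rhat_cutoff = 1.01, neff_cutoff = 400) {
  max_rhat <- max(fit_summary[, "Rhat"], na.rm = TRUE)
  min_neff <- min(fit_summary[, "n_eff"], na.rm = TRUE)
  list(
    max_rhat = max_rhat,
    min_neff = min_neff,
    ok = max_rhat <= rhat_cutoff && min_neff >= neff_cutoff
  )
}

# Usage with a fitted model (requires rstan and a stanfit object named `fit`):
# convergence_ok(summary(fit)$summary)
```

Running the same check on both machines and comparing `max_rhat` and `min_neff` directly shows how far apart the two runs are, rather than only whether they pass the cutoff.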

Hi, and sorry that you’ve waited so long for a response. I can’t say what is causing the 0-second gradient evaluation, but if you consistently get substantially worse R-hats on the cluster than you do locally, that does indicate some kind of problem. Perhaps the gradient evaluation timing is related to the same problem. Maybe @andrjohns has an idea?
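One way to start narrowing this down might be to record the software versions on both machines and compare them. The sketch below is only a starting point under that assumption: `describe_env` is a hypothetical helper name, and nothing here is known to explain the 0-second timing.

```r
# Hypothetical helper: collect R and package versions so the output from the
# cluster and from the Mac can be compared side by side.
describe_env <- function(pkgs = c("rstan", "StanHeaders")) {
  # Keep only the packages that are actually installed on this machine.
  installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  c(
    R = R.version.string,
    vapply(installed, function(p) as.character(packageVersion(p)), character(1))
  )
}

print(describe_env())

# With rstan loaded and the objects from the question defined, a fixed seed
# makes the two runs easier to compare (identical draws across platforms are
# not guaranteed):
# fit <- stan(model_code = my_code, data = schools_dat, seed = 1234)
# check_hmc_diagnostics(fit)  # divergences, tree depth, energy
# get_elapsed_time(fit)       # warmup and sampling time per chain
```

If the versions differ between the two machines, that would be worth mentioning in a follow-up, along with the output of the diagnostics.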