Solved. Simple single-parameter model, yet: Gradient evaluated at the initial value is not finite

Update:
Silly me! It is the log(current_history) when current_history == 0, which is of course -Inf, that was causing the problem. It gets "corrected" by the exponentiation, since exp(-Inf) = 0, which is why the final numbers were correct (and why R never complained). However, that negative infinity, even though it is exponentiated away to 0 later on, apparently still breaks the gradient computation in some (subtle?) way.
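(A rough way to see why the value is fine but the gradient is not: autodiff applies the chain rule operation by operation, so for a composition like exp(log(x)) it evaluates the derivative as exp(log(x)) * (1 / x). At x = 0 that is exp(-Inf) * Inf = 0 * Inf = NaN, even though the value itself, exp(log(0)) = exp(-Inf) = 0, is perfectly finite.)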

Original post:
I am new to Stan and I am trying to program a simple history integrator. And I am REALLY STUCK! Any help, hint, or suggestion is highly appreciated. I suspect that it is because of the local variables that I use in the loop, but I don't have the slightest clue what I am doing so terribly wrong. :(

Here is the small data frame I use for the example.

library(tibble)

test_df <- tibble(
  Duration = c(3.181, 2.553, 2.552, 2.976, 3.094, 3.517, 4.764, 2.894, 2.718, 1.435, 2.105, 2.799, 5.398, 2.847, 3.481, 3.058, 4.023, 2.599, 2.552, 2.341),
  IsOn = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Tmean = 2.5213
)

The history starts in the 0 state. Whenever the state IsOn, it grows as a negative exponential towards 1; whenever it is not IsOn, it decays as a negative exponential towards 0. The growth/decay speed is controlled via a single parameter norm_tau, so that the actual time scale is tau = norm_tau * Tmean, to match the average duration. Here is a simple implementation of it in R for norm_tau = 1:

analytic_history <- function(df, norm_tau){
  df$History <- NA

  tau <- norm_tau * df$Tmean[1]
  current_history <- 0
  
  for(iT in 1:nrow(df)){
    df$History[iT] <- current_history

    if (df$IsOn[iT]){
      # grow towards 1 as a negative exponential
      current_history <- 1 - exp(-(df$Duration[iT] - tau * log(1 - current_history)) / tau)
    }
    else {
      # decay towards 0 as a negative exponential
      current_history <- exp(-(df$Duration[iT] - tau * log(current_history)) / tau)
    }
  }
  
  df
}

analytical_df <- analytic_history(test_df, 1)
analytical_df

As a first simple step, I wanted to implement the same algorithm in Stan and, as a test, let it recover the norm_tau parameter. Here is the same algorithm implemented in Stan:

data{
  int<lower=1> N;
  real<lower=0> Duration[N];
  real<lower=0,upper=1> IsOn[N];
  real<lower=0,upper=1> History[N];
  real Tmean;
}
parameters{
  real<lower=0.9, upper=1.1> norm_tau;
  real<lower=0> sigma;
}
transformed parameters{
  real<lower=0, upper=1> Predictions[N];
  {
    real tau = norm_tau * Tmean;
    real current_history = 0;
   
    for(iT in 1:N){
      Predictions[iT] = current_history;

      if (IsOn[iT] > 0){
        current_history =  1 - exp(-(Duration[iT] - tau * log(1 - current_history)) / tau);
      }
      else {
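        // NB: on the very first iteration current_history == 0, so
        // log(current_history) on the next line is -Inf (see the update above)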
        current_history = exp(-(Duration[iT] - tau * log(current_history)) / tau);
      }
    } 
  }
}
model{
  norm_tau ~ normal(1, 0.1);
  sigma ~ exponential(1);
  History ~ normal(Predictions, sigma);
}

And here is the sampling call:

test_data <- list(
  N = nrow(analytical_df),
  Duration = analytical_df$Duration,
  IsOn = analytical_df$IsOn,
  History = analytical_df$History,
  Tmean = analytical_df$Tmean[1]
)

test_init <- function(){
  list(norm_tau = 1, sigma = 0.1)
}

test_fit <- rstan::sampling(single_state_history, 
                            chains=1,
                            data=test_data,
                            init=test_init)

However, if I run it, I get:
Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can't start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

If I fix tau = Tmean in the code (throwing out the norm_tau * bit), sampling works and the numbers for Predictions match those of History exactly (so the code does compute the same thing). It makes no difference whether I init norm_tau to 1 (as I do in the code above) or to "random". It makes no difference whether I restrict norm_tau (in the code above, to be between 0.9 and 1.1), just state <lower=0>, or let it be unrestricted and use exp(norm_tau). I have also printed out values during the sampling: all finite and within a reasonable range, or matching the History values exactly when initializing with norm_tau == 1.

To recap, I am stuck, as I cannot understand what sort of diagnostics I should look at and what I could possibly be doing wrong. Any help is highly appreciated!


I wonder if the constraint on Predictions is getting violated somehow.

Can you do:

real Predictions[N];

instead of:

real<lower=0, upper=1> Predictions[N];

Stan handles constraints on parameters by reparameterizing the calculation behind the scenes. In transformed parameters, though, if the constraint is violated, it won't be able to evaluate the log density (or gradient).
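For example, real<lower=0> sigma; in the parameters block is internally sampled as log(sigma) on the unconstrained scale (with the Jacobian correction added automatically), so that constraint can never be violated; the bounds on Predictions, by contrast, are only checked after the transformed parameters block has run.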

Thanks for the suggestion, but unfortunately this didn't help; same problem. :(

I've tried to exclude the temporary variables completely (see below), but that doesn't help either… So, this has something to do with how each Prediction depends on the previous value, but why would that work if norm_tau is a constant rather than a parameter? Or, alternatively, why does sampling the parameter break this? BTW, if I include other parameters, so that Predictions[iT] = a + b * ... with a ~ normal(0, 1) and b ~ normal(0, 1), then having these two parameters is okay. It is only the tau that breaks everything.

transformed parameters{
  real Predictions[N];

  Predictions[1] = 0;
  for(iT in 2:N){
    if (IsOn[iT-1] > 0){
      Predictions[iT] =  1 - exp(-(Duration[iT-1] - Tmean * norm_tau * log(1 - Predictions[iT-1])) / (Tmean * norm_tau));
    }
    else {
      Predictions[iT] = exp(-(Duration[iT-1] - Tmean * norm_tau * log(Predictions[iT-1])) / (Tmean * norm_tau));
    }
  } 
}

Moving everything directly into the model block does not work either:

model{
  norm_tau ~ normal(1, 0.1);
  sigma ~ exponential(1);
  {
    real current_history = 0;
    for(iT in 1:N){
      History[iT] ~ normal(current_history, sigma);

      if (IsOn[iT] > 0){
        current_history = 1 - exp(-(Duration[iT] - (norm_tau * Tmean) * log(1 - current_history)) / (norm_tau * Tmean));
      }
      else {
        current_history = exp(-(Duration[iT] - (norm_tau * Tmean) * log(current_history)) / (norm_tau * Tmean));
      }
    }
  }
}

Sorry, I just saw this in my unreads. The title says Solved now. Did you figure out what the problem was in the end?

Yes, I missed the fact that I sometimes compute log(0), which is, of course, negative infinity. I missed it because it is inside an exponential, so exp(log(0)) gives me a nice 0 as the final value, which is why everything looked sane in the printouts. But, apparently, that breaks the gradient computation. I just had to ensure that I never explicitly evaluate exp(log(0)), as it gives me a trivial 0 anyhow.
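In case it helps anyone else, a minimal version of that guard in the transformed parameters loop could look something like this (a sketch, not necessarily the exact code; only the else branch ever sees a zero history here, since the if branch evaluates log(1 - 0) = 0 on the first iteration):

for(iT in 1:N){
  Predictions[iT] = current_history;

  if (IsOn[iT] > 0){
    current_history = 1 - exp(-(Duration[iT] - tau * log(1 - current_history)) / tau);
  }
  else {
    // a history of exactly 0 decays to 0 trivially; skipping the
    // computation keeps log(0) = -Inf out of the gradient
    current_history = current_history == 0
                        ? 0
                        : exp(-(Duration[iT] - tau * log(current_history)) / tau);
  }
}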
