Solved. Simple single-parameter model, yet: Gradient evaluated at the initial value is not finite

Update:
Silly me! It is the log(current_history) when current_history == 0, which is of course -Inf, that was causing the problem. It gets "corrected" by the exponentiation, since exp(-Inf) = 0, which is why the final numbers were correct (and why R never complained). However, that negative infinity, even though it is exponentiated away to 0 later on, apparently still breaks the gradient computation in some (subtle?) way.
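(A rough way to see why the value is fine but the gradient is not: autodiff applies the chain rule operation by operation, so for a composition like exp(log(x)) it evaluates the derivative as exp(log(x)) * (1 / x). At x = 0 that is exp(-Inf) * Inf = 0 * Inf = NaN, even though the value itself, exp(log(0)) = exp(-Inf) = 0, is perfectly finite.)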

Original post:
I am new to Stan and I am trying to program a simple history integrator. And I am REALLY STUCK! Any help, hint, or suggestion is highly appreciated. I suspect that it is because of the local variables that I use in the loop, but I don't have the slightest clue what I am doing so terribly wrong. :(

Here is the small data frame I use for the example.

library(tibble)

test_df <- tibble(
  Duration = c(3.181, 2.553, 2.552, 2.976, 3.094, 3.517, 4.764, 2.894, 2.718, 1.435, 2.105, 2.799, 5.398, 2.847, 3.481, 3.058, 4.023, 2.599, 2.552, 2.341),
  IsOn = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Tmean = 2.5213
)

The history starts in the 0 state. Whenever the state IsOn, it grows as a negative exponential towards 1; whenever it is not IsOn, it decays as a negative exponential towards 0. The growth/decay speed is controlled via a single parameter norm_tau, so that the actual time scale is tau = norm_tau * Tmean, to match the average duration. Here is a simple implementation of it in R for norm_tau = 1:

analytic_history <- function(df, norm_tau){
  df$History <- NA

  tau <- norm_tau * df$Tmean[1]
  current_history <- 0
  
  for(iT in 1:nrow(df)){
    df$History[iT] <- current_history

    if (df$IsOn[iT]){
      # grow towards 1 as a negative exponential
      current_history <- 1 - exp(-(df$Duration[iT] - tau * log(1 - current_history)) / tau)
    }
    else {
      # decay towards 0 as a negative exponential
      current_history <- exp(-(df$Duration[iT] - tau * log(current_history)) / tau)
    }
  }
  
  df
}

analytical_df <- analytic_history(test_df, 1)
analytical_df

As a first simple step, I wanted to implement the same algorithm in Stan and, as a test, let it recover the norm_tau parameter. Here is the same algorithm implemented in Stan:

data{
  int<lower=1> N;
  real<lower=0> Duration[N];
  real<lower=0,upper=1> IsOn[N];
  real<lower=0,upper=1> History[N];
  real Tmean;
}
parameters{
  real<lower=0.9, upper=1.1> norm_tau;
  real<lower=0> sigma;
}
transformed parameters{
  real<lower=0, upper=1> Predictions[N];
  {
    real tau = norm_tau * Tmean;
    real current_history = 0;
   
    for(iT in 1:N){
      Predictions[iT] = current_history;

      if (IsOn[iT] > 0){
        current_history =  1 - exp(-(Duration[iT] - tau * log(1 - current_history)) / tau);
      }
      else {
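        // NB: on the very first iteration current_history == 0, so
        // log(current_history) on the next line is -Inf (see the update above)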
        current_history = exp(-(Duration[iT] - tau * log(current_history)) / tau);
      }
    } 
  }
}
model{
  norm_tau ~ normal(1, 0.1);
  sigma ~ exponential(1);
  History ~ normal(Predictions, sigma);
}

And here is the sampling call:

test_data <- list(
  N = nrow(analytical_df),
  Duration = analytical_df$Duration,
  IsOn = analytical_df$IsOn,
  History = analytical_df$History,
  Tmean = analytical_df$Tmean[1]
)

test_init <- function(){
  list(norm_tau = 1, sigma = 0.1)
}

test_fit <- rstan::sampling(single_state_history, 
                            chains=1,
                            data=test_data,
                            init=test_init)

However, if I run it, I get:
Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can't start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

If I fix tau = Tmean in the code (throwing out the norm_tau * bit), sampling works and the numbers for Predictions match those of History exactly (so the code does compute the same thing). It makes no difference whether I init norm_tau to 1 (as I do in the code above) or to "random". It makes no difference whether I restrict norm_tau (in the code above, to be between 0.9 and 1.1), just state <lower=0>, or let it be unrestricted and use exp(norm_tau). I have also printed out values during the sampling: all finite and within a reasonable range, or matching the History values exactly when initializing with norm_tau == 1.

To recap, I am stuck, as I cannot understand what sort of diagnostics I should look at and what I could possibly be doing wrong. Any help is highly appreciated!


I wonder if the constraint on Predictions is getting violated somehow.

Can you do:

real Predictions[N];

instead of:

real<lower=0, upper=1> Predictions[N];

Stan handles constraints on parameters by reparameterizing the calculation behind the scenes. In transformed parameters, though, if the constraint is violated, it won't be able to evaluate the log density (or gradient).
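For example, real<lower=0> sigma; in the parameters block is internally sampled as log(sigma) on the unconstrained scale (with the Jacobian correction added automatically), so that constraint can never be violated; the bounds on Predictions, by contrast, are only checked after the transformed parameters block has run.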

Thanks for the suggestion, but unfortunately this didn't help; same problem. :(

I've tried to exclude the temporary variables completely (see below), but that doesn't help either… So, this has something to do with how each Prediction depends on the previous value, but why would that work if norm_tau is a constant rather than a parameter? Or, alternatively, why does sampling the parameter break this? BTW, if I include other parameters, so that Predictions[iT] = a + b * ... with a ~ normal(0, 1) and b ~ normal(0, 1), then having these two parameters is okay. It is only the tau that breaks everything.

transformed parameters{
  real Predictions[N];

  Predictions[1] = 0;
  for(iT in 2:N){
    if (IsOn[iT-1] > 0){
      Predictions[iT] =  1 - exp(-(Duration[iT-1] - Tmean * norm_tau * log(1 - Predictions[iT-1])) / (Tmean * norm_tau));
    }
    else {
      Predictions[iT] = exp(-(Duration[iT-1] - Tmean * norm_tau * log(Predictions[iT-1])) / (Tmean * norm_tau));
    }
  } 
}

Moving everything directly into the model block does not work either:

model{
  norm_tau ~ normal(1, 0.1);
  sigma ~ exponential(1);
  {
    real current_history = 0;
    for(iT in 1:N){
      History[iT] ~ normal(current_history, sigma);

      if (IsOn[iT] > 0){
        current_history = 1 - exp(-(Duration[iT] - (norm_tau * Tmean) * log(1 - current_history)) / (norm_tau * Tmean));
      }
      else {
        current_history = exp(-(Duration[iT] - (norm_tau * Tmean) * log(current_history)) / (norm_tau * Tmean));
      }
    }
  }
}

Sorry, I just saw this in my unreads. The title says Solved now. Did you figure out what the problem was in the end?

Yes, I missed the fact that I sometimes compute log(0), which is, of course, negative infinity. I missed it because it is inside an exponential, so exp(log(0)) gives me a nice 0 as the final value, which is why everything looked sane in the printouts. But, apparently, that breaks the gradient computation. I just had to ensure that I never explicitly evaluate exp(log(0)), as it gives me a trivial 0 anyhow.
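In case it helps anyone else, a minimal version of that guard in the transformed parameters loop could look something like this (a sketch, not necessarily the exact code; only the else branch ever sees a zero history here, since the if branch evaluates log(1 - 0) = 0 on the first iteration):

for(iT in 1:N){
  Predictions[iT] = current_history;

  if (IsOn[iT] > 0){
    current_history = 1 - exp(-(Duration[iT] - tau * log(1 - current_history)) / tau);
  }
  else {
    // a history of exactly 0 decays to 0 trivially; skipping the
    // computation keeps log(0) = -Inf out of the gradient
    current_history = current_history == 0
                        ? 0
                        : exp(-(Duration[iT] - tau * log(current_history)) / tau);
  }
}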
