Initialization fails when using a skew normal distribution in the likelihood

lex · October 16, 2019, 2:02pm

Hello! This is my model. When I try to sample from it I get this error:
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

If I replace skew_normal with normal, sampling works.
Could you please help me fix this problem?

functions{
  real multiplicative_impact(
    real value, real feature, real impact_coef, real impact_bias) {
    // scale value by exp(impact_coef * feature + impact_bias)
    return exp(impact_coef * feature + impact_bias) * value;
  }
}
data {
  int<lower=0> N;
  real x1[N];
  real x2[N];
  real x3[N];

  real y[N];
}
parameters {
  real coef1;
  real coef2;
  real<lower=0, upper=1> vote_coef;

  real impact_coef;
  real impact_bias;

  real<lower=0> omega;
  real alpha;
}
model {
  real mu;  // linear predictor (avoids reusing the built-in name mean)
  real impact_mean;
  coef1 ~ normal(0, 1);
  coef2 ~ normal(0, 1);
  vote_coef ~ normal(0.5, 0.05);
  impact_coef ~ normal(0, 1);
  impact_bias ~ normal(0, 1);
  omega ~ cauchy(0, 0.05);
  alpha ~ normal(0, 1);
  for (n in 1:N) {
    mu = vote_coef * coef1 * x1[n] + (1 - vote_coef) * coef2 * x2[n];
    impact_mean = multiplicative_impact(mu, x3[n], impact_coef, impact_bias);
    y[n] ~ skew_normal(impact_mean, omega, alpha);
  }
}

Guido_Biele · October 16, 2019, 4:34pm

As you probably know, the error message indicates that the data are impossible given the model and parameters tried out during initialization. This suggests that your priors could be improved.

You could do a prior predictive check to investigate the data broadly implied by your priors.

To do this, comment out the line with skew_normal in your code and add a generated quantities block in which you calculate impact_mean as in the model block and use skew_normal_rng to generate hypothetical y. Run the model for 1000 or so iterations and then plot iteration-wise histograms (I like to overlay partially transparent histograms) to check whether the data implied by your model and parameters are broadly consistent with data that are plausible given your domain expertise.
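The same prior predictive check can also be sketched outside Stan. The Python below is an illustrative sketch, not code from the thread: it draws every parameter from the priors in the model block (omega from a half-Cauchy, since it is declared `real<lower=0>`), computes impact_mean the same way, and draws y from a skew normal using Azzalini's representation. The predictor values `x1`, `x2`, `x3` are hypothetical placeholders; substitute your own data and pass `sims` to whatever plotting library you use for the overlaid histograms.

```python
import math
import random

def half_cauchy(rng, scale):
    # |Cauchy(0, scale)| via the inverse CDF of the Cauchy distribution
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))

def skew_normal_draw(rng, xi, omega, alpha):
    # Azzalini's representation: delta*|Z0| + sqrt(1 - delta^2)*Z1 ~ SN(0, 1, alpha)
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return xi + omega * (delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1)

def multiplicative_impact(value, feature, impact_coef, impact_bias):
    # same formula as the Stan function (math.exp raises OverflowError past ~709)
    return math.exp(impact_coef * feature + impact_bias) * value

def prior_predictive(x1, x2, x3, n_draws=1000, seed=1):
    """Draw parameters from the priors, then one simulated y vector per draw."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        coef1, coef2 = rng.gauss(0, 1), rng.gauss(0, 1)
        vote_coef = rng.gauss(0.5, 0.05)
        while not 0.0 <= vote_coef <= 1.0:  # respect <lower=0, upper=1>
            vote_coef = rng.gauss(0.5, 0.05)
        impact_coef, impact_bias = rng.gauss(0, 1), rng.gauss(0, 1)
        omega = half_cauchy(rng, 0.05)  # lower=0 plus cauchy(0, 0.05) prior
        alpha = rng.gauss(0, 1)
        y_sim = []
        for a, b, c in zip(x1, x2, x3):
            mu = vote_coef * coef1 * a + (1 - vote_coef) * coef2 * b
            impact_mean = multiplicative_impact(mu, c, impact_coef, impact_bias)
            y_sim.append(skew_normal_draw(rng, impact_mean, omega, alpha))
        sims.append(y_sim)
    return sims

# hypothetical predictor values, for illustration only
x1 = [0.2, -1.0, 0.5, 1.3]
x2 = [1.1, 0.4, -0.7, 0.0]
x3 = [0.0, 0.5, 1.0, 2.0]
sims = prior_predictive(x1, x2, x3)

pooled = sorted(v for row in sims for v in row)
print("5%:", pooled[int(0.05 * len(pooled))], "95%:", pooled[int(0.95 * len(pooled))])
```

If the simulated range is far wider or narrower than plausible y values, that points at the priors (or at the exp in multiplicative_impact) rather than at the sampler.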

Hope this helps :-)

lex · October 16, 2019, 6:10pm

I find it strange that everything works with the normal distribution. If I am not mistaken, the skew normal is a generalization of the normal distribution: with alpha == 0 it reduces to the normal distribution.
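For reference, the skew normal density is (2/ω)·φ(z)·Φ(αz) with z = (y − ξ)/ω, where φ and Φ are the standard normal density and CDF. Because Φ(0) = 1/2, setting α = 0 gives exactly the normal density, not merely a proportional one. A small Python check (the helper names are ours, not Stan's; Φ is written via `math.erfc`):

```python
import math

def normal_lpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def skew_normal_lpdf(y, xi, omega, alpha):
    # log((2 / omega) * phi(z) * Phi(alpha * z)), with Phi(x) = 0.5 * erfc(-x / sqrt(2))
    z = (y - xi) / omega
    log_Phi = math.log(0.5 * math.erfc(-alpha * z / math.sqrt(2)))
    return math.log(2) + normal_lpdf(y, xi, omega) + log_Phi

for y in (-3.0, 0.0, 0.3, 2.5, 10.0):
    # with alpha = 0, log 2 + log Phi(0) = 0, so the difference is zero up to rounding
    print(y, skew_normal_lpdf(y, 0.3, 1.7, 0.0) - normal_lpdf(y, 0.3, 1.7))
```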

nhuurre · October 16, 2019, 6:58pm

It’s also possible that Stan’s skew normal implementation just has a bug. The initial guess is usually not very good and that multiplicative_impact function can easily push impact_mean very far from y. Far in the left tail of skew_normal the calculation underflows and the sampler cannot start.
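The only term that separates the skew normal from the normal is log Φ(α(y − ξ)/ω). The sketch below evaluates it directly as log(0.5·erfc(−x/√2)), which we assume is roughly how the library computes it, and compares it with the leading left-tail (Mills ratio) approximation. Somewhere below about x = −37 the CDF is smaller than the smallest positive double, so the direct form returns −inf, while a normal log density stays a finite quadratic however far y is from the mean.

```python
import math

def naive_log_Phi(x):
    # log of the standard normal CDF, computed directly; underflows deep in the left tail
    p = 0.5 * math.erfc(-x / math.sqrt(2))
    return math.log(p) if p > 0.0 else -math.inf

def asymptotic_log_Phi(x):
    # leading terms of the left-tail expansion, valid for x << 0
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2 * math.pi)

for d in (-10.0, -20.0, -30.0, -40.0, -50.0):
    print(d, naive_log_Phi(d), asymptotic_log_Phi(d))
```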

Here is an approximation that should be a bit more stable:

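functions {
  // Skew normal log density that switches to an asymptotic approximation
  // deep in the left tail, where skew_normal_lpdf underflows to -inf
  real robust_skew_normal_lpdf(real y, real xi, real omega, real alpha) {
    real diff = alpha * (y - xi) / omega;
    if (diff > -35) {
      return skew_normal_lpdf(y | xi, omega, alpha);
    } else {
      // for diff << 0, log(2 * Phi(diff)) is about -0.5 * diff^2 - log(-diff) - 0.2266
      return normal_lpdf(y | xi, omega) - 0.5 * square(diff) - log(-diff) - 0.2266;
    }
  }
}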
model {
  ...
    y[n] ~ robust_skew_normal(impact_mean, omega, alpha);
}
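To try the branch numerically outside Stan, here is a line-by-line Python port of robust_skew_normal_lpdf. It is a sketch for checking behaviour only: the direct `skew_normal_lpdf` helper below is our own stand-in for Stan's function, not Stan's implementation.

```python
import math

def normal_lpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def skew_normal_lpdf(y, xi, omega, alpha):
    # direct evaluation; returns -inf once Phi(alpha * z) underflows to 0
    z = (y - xi) / omega
    p = 0.5 * math.erfc(-alpha * z / math.sqrt(2))
    log_Phi = math.log(p) if p > 0.0 else -math.inf
    return math.log(2) + normal_lpdf(y, xi, omega) + log_Phi

def robust_skew_normal_lpdf(y, xi, omega, alpha):
    # same branch point (-35) and constant (0.2266) as the Stan function
    diff = alpha * (y - xi) / omega
    if diff > -35:
        return skew_normal_lpdf(y, xi, omega, alpha)
    return normal_lpdf(y, xi, omega) - 0.5 * diff * diff - math.log(-diff) - 0.2266

for y in (-30.0, -35.5, -100.0, -1000.0):
    print(y, skew_normal_lpdf(y, 0.0, 1.0, 1.0), robust_skew_normal_lpdf(y, 0.0, 1.0, 1.0))
```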

lex · October 18, 2019, 9:58am

Thank you, now I have no problems with initialization. But in the sampling report I get n_eff == 1 and Rhat == nan for every parameter (only lp__ has Rhat = 2.0e5). Maybe this is because I posted a simplified version of my program for easier understanding, and something goes wrong in my full script. Could you please explain how you found the constants -35 and 0.2266, so that I can adapt them to make my program run properly?

nhuurre · October 18, 2019, 11:37am

That -35 was found by running something like

data {}
model {}
generated quantities {
  vector[100] s;
  for (i in 1:100)
    s[i] = skew_normal_lpdf(-i| 0, 1, 1);
}

for one iteration and observing that everything beyond -37 or so was negative infinity. I chose 0.2266 because that made the function graph look approximately continuous at the branch point.
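The 0.2266 can also be recovered analytically. For d → −∞ the Mills ratio expansion gives Φ(d) = φ(d)/(−d)·(1 − 1/d² + 3/d⁴ − …), so

log(2Φ(d)) ≈ −d²/2 − log(−d) + [log 2 − ½·log(2π) + log(1 − 1/d²)].

At d = −35 the bracket is about −0.2266, which matches the constant chosen by eye. A short Python check (helper names are ours):

```python
import math

def tail_constant(d):
    # log 2 - 0.5*log(2*pi) + log(1 - 1/d^2): the offset that makes the
    # asymptotic branch match log(2 * Phi(d)) at the branch point d
    return math.log(2) - 0.5 * math.log(2 * math.pi) + math.log1p(-1.0 / (d * d))

def exact_log_2Phi(d):
    # 2 * Phi(d) = erfc(-d / sqrt(2)); still representable in double precision at d = -35
    return math.log(math.erfc(-d / math.sqrt(2)))

d = -35.0
c = -tail_constant(d)
approx = -0.5 * d * d - math.log(-d) - 0.2266
print(c)  # about 0.2266
print(exact_log_2Phi(d) - approx)  # gap between the two branches at d = -35
```

Dropping the log(1 − 1/d²) term gives 0.2258, which is the limit as the branch point moves further left.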

Sounds like all the chains are getting stuck badly. Does it warn about divergent transitions?

lex · October 18, 2019, 12:14pm

I can't find such a warning in the output.
These are all the unique messages:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Gradient evaluation took 0.000146 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.46 seconds.
Adjust your expectations accordingly!

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Exception: normal_lpdf: Location parameter is inf, but must be finite! (in ‘unknown file name’ at line 14)
(in ‘unknown file name’ at line 155)

nhuurre · October 18, 2019, 12:21pm

Which interface are you using? RStan or PyStan? The interface should have a diagnose() function you can use on the fit object.
Can you extract the parameter values from the fit?

lex · October 18, 2019, 2:22pm

PyStan. I updated to 2.19, and now my program (with your corrections) sometimes fails at initialization and sometimes runs with these messages (including information about divergences):

WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:951 of 2000 iterations ended with a divergence (47.5 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
WARNING:pystan:Chain 2: E-BFMI = 7.3e-05
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model

When it runs, I can extract the parameter values.

nhuurre · October 18, 2019, 2:39pm

Ugh, 951? Well, the suggestion to reparameterize is probably a good one.
You said earlier that it works if you replace skew-normal with a normal. Was that just the simplified model?

lex · October 18, 2019, 3:21pm

No, it is the same complex model. It works properly when I use the normal distribution at the end, but when I replace it with skew_normal everything fails. I use this distribution as the final step of my model.

lex · October 18, 2019, 3:23pm

The Student-t distribution also works fine. I am interested in distributions with a large 95th (or 90th, it does not matter) percentile of the target samples, because I need to estimate an effective upper bound on the target value.

The Student-t works better than the normal, but I thought the skew normal should be better still when alpha is positive.
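To see how positive alpha moves the upper percentile, one can compare the 95th percentile of the standard skew normal for a few alpha values with location and scale held fixed. This is a Monte Carlo sketch (function names and sample size are our choices); in a fitted model ξ and ω would adapt too, so it only illustrates how positive alpha shifts mass into the right tail.

```python
import math
import random

def skew_normal_draw(rng, alpha):
    # standard skew normal SN(0, 1, alpha) via delta*|Z0| + sqrt(1 - delta^2)*Z1
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1

def upper_quantile(alpha, p=0.95, n=100_000, seed=0):
    # empirical p-quantile from n sorted draws
    rng = random.Random(seed)
    draws = sorted(skew_normal_draw(rng, alpha) for _ in range(n))
    return draws[int(p * n)]

for alpha in (0.0, 1.0, 5.0):
    print(alpha, upper_quantile(alpha))
```

For alpha = 0 the estimate should be close to the normal value of about 1.645; for large alpha it approaches the half-normal value of about 1.96.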

nhuurre · October 18, 2019, 3:29pm

What happens if alpha in the skew normal is data instead of a parameter? After all, a small alpha should be nearly equivalent to a normal distribution…

lex · October 19, 2019, 11:26am

Everything works properly for alpha = 0. But when I set alpha = 0.1, initialization fails. This is true for both the standard skew_normal and your approach. The largest value that works without errors is about alpha = 0.06 (to two decimal places), but even that value does not work every time.
