Initialization failed till usage of skew normal distribution in likelyhood

Hello! This is my model. When I try to sample from it I get this error:
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

If I replace skew_normal with normal, then sampling works.
Could you help to fix that problem please?

functions{
  real multiplicative_impact(
    real value, real feature, real impact_coef, real impact_bias) {
    return exp(impact_coef * real_feature + impact_bias) * value;
  }
}
data {
  int<lower=0> N;
  real x1[N];
  real x2[N];
  real x3[N];

  real y[N];
}
parameters {
  real coef1;
  real coef2;
  real<lower=0, upper=1> vote_coef;

  real impact_coef;
  real impact_bias;

  real<lower=0> omega;
  real alpha;
}
model {
  real mean;
  real impact_mean;
  coef1 ~ normal(0, 1);
  coef2 ~ normal(0, 1);
  vote_coef ~ normal(0.5, 0.05);
  impact_coef ~ normal(0, 1);
  impact_bias ~ normal(0, 1);
  omega ~ cauchy(0, 0.05);
  alpha ~ normal(0, 1);
  for (n in 1:N) {
    mean = vote_coef * coef1 * x1[n] + (1 - vote_coef) * coef2 *  x2[n];
    impact_mean = multiplicative_impact(mean, x3[n], impact_coef, impact_bias);
    y[n] ~ skew_normal(impact_mean, omega, alpha);
  }
}
1 Like

As you probably know, the error message indicates that the data are impossible given the model and parameters tried out during initialization. This suggests that your priors could be improved.

You could do a prior predictive check to investigate the data broadly implied by your priors.

To do this, comment out the line with skew_normal from your code and add a generated quantities block in which you calculate impact_mean as in the model block and use skew_normal_rng to generate hypothetical y. Run the model for 1000 or so iterations and then plot iteration-wise histograms (I like to overlay a partially transparent histograms ) to check if the data implied by your model and parameters are broadly consistent with data that are plausible (given domain expertise).

Hope this helps :-)

1 Like

I find it strange that everything works with normal distribution. If I am not mistaken skew normal is more general case of normal distribution in other words skew normal is normal up to a constant when alpha == 0

It’s also possible that Stan’s skew normal implementation just has a bug. The initial guess is usually not very good and that multiplicative_impact function can easily push impact_mean very far from y. Far in the left tail of skew_normal the calculation underflows and the sampler cannot start.

Here, this should be a bit more stable approximation

functions {
  real robust_skew_normal_lpdf(real y, real xi, real omega, real alpha) {
    real diff = alpha * (y - xi)/omega;
    if (diff > -35) {
      return skew_normal_lpdf(y| xi, omega, alpha);
    } else {
      return normal_lpdf(y| xi, omega) - 0.5*square(diff) - log(-diff) - 0.2266;
    }
  }
}
model {
  ...
    y[n] ~ robust_skew_normal(impact_mean, omega, alpha);
}
3 Likes

Thank you, now I have no problems with initialization. But in sampling report I have n_eff == 1 and Rhat == nan for each parameter (only for lp_ Rhat = 2.0e5). May be it is connected with the fact that I show here simplified version of my program for easier understanding, and for my full script something goes wrong way. Could you please tell me the algorithm of finding constants -35 and 0.2266, may be I can adapt them for my program, to run it properly?

That -35 was found by running something like

data {}
model {}
generated quantities {
  vector[100] s;
  for (i in 1:100)
    s[i] = skew_normal_lpdf(-i| 0, 1, 1);
}

for one iteration and observing that everything beyond -37 or so was negative infinity. I chose 0.2266 because that made the function graph look approximately continuous at the branch point.

Sounds like all the chains are getting stuck badly. Does it warn about divergent transitions?

I can not find it in traceback.
There are all unique traceback messages:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Gradient evaluation took 0.000146 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.46 seconds.
Adjust your expectations accordingly!

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: Exception: normal_lpdf: Location parameter is inf, but must be finite! (in ‘unknown file name’ at line 14)
(in ‘unknown file name’ at line 155)

Which interface are you using? RStan or PyStan? The interface should have a diagnose() function you can use on the fit object.
Can you extract the parameter values from the fit?

PyStan, I updates to 2.19 and now my program (corrected with your approaches) sometimes falls with initialization fail, sometimes runs with this messages(here is information about divergence):

WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:951 of 2000 iterations ended with a divergence (47.5 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
WARNING:pystan:Chain 2: E-BFMI = 7.3e-05
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model

When it runs I can extract parameter values

Ugh, 951? Well, the suggestion to reparameterize is probably a good one.
You said earlier that it works if you replace skew-normal with a normal. Was that just the simplified model?

No, it is the same complex model, it works properly when at the end I use normal distribution, but when I replace it on skew_normal everything fails. And I use this distribution as final step in my model.

Student distribution also works fine, I am interested in using distributions with big 95 %(or 90 does not matter) percentile of target samples. I do it because I need to estimate efficient upper bound of target value.

Student works better than normal, but I thought that skew normal must be better if alpha is positive value.

What happens if alpha in the skew normal is data instead of a parameter? After all, small alpha should be equivalent to normal distribution…

Everything works properly for alpha = 0. But when I set alpha = 0.1 I get intialization failed. It is true for standard skew_normal and your approach. Maximal value for working without errors is alpha=0.06 with the accuracy up to one hundredth. But this value is also does not work every time.