PyStan error with optimizing

I’m getting this error intermittently (with simulated data, as I change sample sizes, etc.). What does it mean?

[screenshot of the error]

Can you check the terminal (where you started Jupyter)?

Initial log joint probability = -2.26469e+09
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
       1  -2.26469e+09             0   4.55166e+09       0.001       0.001       12   
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

Optimization terminated with error: 
  Line search failed to achieve a sufficient decrease, no more progress can be made

But then running the Jupyter cell again worked:

Initial log joint probability = -1.49048e+07
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

       8      -6237.88    0.00378842     0.0742458           1           1       41   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

Also, in case it helps: when I give pystan.optimizing an explicit seed, it doesn’t seem to fail. If I don’t give it a seed, it fails roughly a third of the time (observed by repeatedly re-running the same Jupyter cell).
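
For reference, the two calls look roughly like this (a minimal sketch; sm and stan_data are the compiled model and data dict from the full code I post below, and 42 is just an example seed value):

# fails roughly 1 run in 3 for me: Stan draws new random initial values each time
op = sm.optimizing(data=stan_data)

# has not failed for me so far: the seed fixes the random initialization
op = sm.optimizing(data=stan_data, seed=42)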

You probably have a hard model that fails for some initial values. It’s hard to say more without the model code.
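
If bad random inits are the culprit, one workaround is to supply explicit initial values instead of letting Stan draw them uniformly on the unconstrained scale. A rough sketch, assuming PyStan 2’s optimizing accepts an init dict (sm and data are placeholder names for your compiled model and data dict):

# start the optimizer at fixed, plausible values rather than random draws
inits = {'alpha': 0.0, 'beta': 0.0, 'sigma': 1.0}
op = sm.optimizing(data=data, init=inits)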

Here’s the code. The data is simulated, and the seed ensures the data are reproducible. (The error itself isn’t reproducible.)

import numpy as np
import pystan

ocode = """
data {
    int<lower=0> N; // number of input points
    vector[N] x;
    vector[N] y;
    int<lower=0> K; // number of predicted values
    vector[K] x_pred; // values we want predictions for
}
parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
}
model {
    y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
    vector[K] yhat;
    for (k in 1:K) {
        yhat[k] = normal_rng(alpha + beta * x_pred[k], sigma);
    }
}
"""

sm = pystan.StanModel(model_code=ocode, model_name='ols_with_prediction')

# actual (i.e. unobservable) model parameters
alpha = 1
beta = 10
sigma = 300

# simulate the regression model
N = 1000  # number of points to simulate
x_mean = 0  # point in domain around which data will be centered
x_variance = 100
df = 20

# first, simulate the x points 
np.random.seed(42)
x_values = x_mean + np.random.standard_t(df=df, size=N) * x_variance

# now, simulate the y points
y_values = np.random.normal(loc=alpha + beta * x_values, scale=sigma)

# calculate values for predictions
K = 5
variance_scale = 2
x_pred_values = x_mean + np.random.standard_t(df=df, size=K) * x_variance * variance_scale
x_pred_values = np.sort(x_pred_values)
x_pred_values

# assemble data to pass to stan
stan_data = {
    'x':x_values,
    'y':y_values,
    'N':N,
    'K':K,
    'x_pred':x_pred_values,
}

### Point estimates and point predictions

op = sm.optimizing(data=stan_data)
op

Just wondering whether this code is helpful for isolating the problem. I’ve noticed the error only happens when I don’t pass a seed.
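
If it’s useful for reproducing the issue, something like this loop should estimate the failure rate (a rough sketch; it assumes a failed optimization surfaces as an exception rather than returning silently, which matches what I see in the notebook):

n_runs = 30
failures = 0
for _ in range(n_runs):
    try:
        sm.optimizing(data=stan_data)  # no seed, so initialization is random each run
    except Exception:                  # count runs where the optimizer gives up
        failures += 1
print(f"{failures}/{n_runs} runs failed without an explicit seed")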