About Pystan Convergence Issue

Dear all,

I generated a hierarchical model data X by using some parameter true \theta^* then I fitted pystan on a hierarchical model using the generated data X.

I initialized the starting param (\theta_{stan}) to be the true parameter \theta^*, but after the pystan model converges, I had the parameter converging to some wrong value \theta_{stan}.

When I calculate the log_prob by using the fit.log_prob() function, I found the true param \theta^* indeeds have higher log prob but still the model converges to a low log prob param \theta_{stan}

Does anyone knows why? It seems that if the pystan is using the log_prob as it target maximization objective, it should find some \theta_{stan} close to the true \theta^*.

Here is the convergence diagnosis, the true sigma2 is 0.0155868 and true v[0] is 1.76456398

Here is the log_prob comparison


hmc/nuts do not maximize against log_prob, but our optimize method does.

See for example 15.1 Hamiltonian Monte Carlo | Stan Reference Manual


Thanks for the reply. I just tried sm.optimizing(data=…,init=init_list)

But it seems still that it found the sub-optimal points, isn’t that wired as the \theta^* is my init_list?

Sorry, hard to tell without a model (stan code)

How did you define the init?


here my stan code:

ppca_code = """
data { 
    int D; //number of dimensions
    int N; //number of data
    int Q; //number of principle components
    vector[D] x[N]; //data
    real a_vj; // w_j prior 
    real epsilon;// w_j mean
    real a_sigma2; // sigma2 prior 
    real beta_sigma2;// sigma2 mean


transformed data {
    matrix[D,D] S;
    S = x[1] * x[1]';
    for (n in 2:N){
    S += x[n] * x[n]';
    S = S/N;
parameters {
    ordered[Q] v; // v_j
    real<lower=0> sigma2; //data sigma2
    matrix[Q,D] W; //projection matrix
model {
    matrix[D,D] C; //covaraince matrix
    for(j in 1:Q){
        v[j] ~ inv_gamma(a_vj, epsilon * (a_vj -1));
        W[j] ~ multi_normal(rep_vector(0,D), v[j] * diag_matrix(rep_vector(1, D)));
    sigma2 ~ inv_gamma(a_sigma2, beta_sigma2);
    C = crossprod(W)+ sigma2 * diag_matrix(rep_vector(1, D));

    target += -  N/2 *(log_determinant (C) + trace( C\S));

generated quantities {
    vector[D] y_hat[N]; //predictive
    for (n in 1:N) {
        y_hat[n] = multi_normal_rng(rep_vector(0,D), crossprod(W)+ sigma2 * diag_matrix(rep_vector(1, D)));


And here is how I initialize the param:

init_list = []
for i_ in range(n_chains):
    temp_dict = {

         'v': sorted(v_star_list),
        'sigma2': sigma2_star,
         "W": W_star.T


And after I ran

fit_standard = sm.optimizing(data=ppca_dat_standard,init=init_list)

I have the fit_standard found some value that is different from my init_list, which is the true \theta^* that has higher log_prob

btw, I just wanted to be a bit more specific about optimize