Dear all,
I generated data X from a hierarchical model using a true parameter \theta^*, then fitted the same hierarchical model to X with PyStan.
I initialized the starting parameter (\theta_{stan}) to the true parameter \theta^*, but after the PyStan model converged, the parameter ended up at some wrong value \theta_{stan}.
When I compute the log_prob with the fit.log_prob() function, I find that the true parameter \theta^* indeed has a higher log prob, yet the model still converges to the lower-log-prob parameter \theta_{stan}.
Does anyone know why? It seems that if PyStan uses log_prob as its maximization objective, it should find some \theta_{stan} close to the true \theta^*.
Here is the convergence diagnosis (the true sigma2 is 0.0155868 and the true v[0] is 1.76456398):
Here is the log_prob comparison:
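For reference, the comparison is done roughly like the sketch below (placeholder names: v_hat, sigma2_hat, W_hat stand for whatever the fit converged to; fit.log_prob expects parameters on the unconstrained scale, so I go through fit.unconstrain_pars):

# Sketch: compare log_prob at the true parameters vs. at the fitted ones.
# v_star_list, sigma2_star, W_star are the true theta*; the *_hat values are
# placeholders for what PyStan converged to.
theta_star = {'v': sorted(v_star_list), 'sigma2': sigma2_star, 'W': W_star.T}
theta_stan = {'v': v_hat, 'sigma2': sigma2_hat, 'W': W_hat}

lp_star = fit.log_prob(fit.unconstrain_pars(theta_star))
lp_stan = fit.log_prob(fit.unconstrain_pars(theta_stan))
print(lp_star, lp_stan)   # theta* comes out higher, yet the fit does not settle there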
Hi,
HMC/NUTS does not maximize log_prob, but our optimize method does.
See for example 15.1 Hamiltonian Monte Carlo | Stan Reference Manual
and this discussion of the typical set:
tl;dr: The typical set (at some level of coverage) is the set of parameter values for which the log density (the target function) is close to its expected value. As has been much discussed, this is not the same as the posterior mode. In a d-dimensional unit normal distribution with a high value of d, the 99% typical set looks like an annulus or sphere or doughnut, corresponding to the points whose distance from the origin is approximately \sqrt{d}. This intuition is important because it helps un…
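A quick NumPy illustration of that \sqrt{d} intuition (just a sketch, not Stan): draws from a d-dimensional unit normal concentrate at radius about \sqrt{d}, where the log density is far below its value at the mode.

import numpy as np

d = 100                                    # number of dimensions
rng = np.random.default_rng(0)
x = rng.standard_normal((10_000, d))       # 10,000 draws from a d-dimensional unit normal

norms = np.linalg.norm(x, axis=1)
print(norms.mean(), np.sqrt(d))            # typical distance from the origin is about sqrt(d)

logp_draws = -0.5 * (x ** 2).sum(axis=1)   # log density up to a constant
print(logp_draws.mean(), 0.0)              # roughly -d/2, far below the mode's value of 0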
Thanks for the reply. I just tried sm.optimizing(data=…, init=init_list),
but it still seems to find a sub-optimal point. Isn't that weird, given that \theta^* is my init_list?
Sorry, it's hard to tell without the model (Stan code).
How did you define the init?
Sure,
here is my Stan code:
ppca_code = """
data { 
    int D; //number of dimensions
    int N; //number of data
    int Q; //number of principal components
    vector[D] x[N]; //data
    real a_vj; // w_j prior 
    real epsilon;// w_j mean
    real a_sigma2; // sigma2 prior 
    real beta_sigma2;// sigma2 mean
}
transformed data {
    matrix[D,D] S;
    S = x[1] * x[1]';
    
    for (n in 2:N) {
        S += x[n] * x[n]';
    }
    S = S/N;
    
}
parameters {
    ordered[Q] v; // v_j
    real<lower=0> sigma2; //data sigma2
    matrix[Q,D] W; //projection matrix
}
model {
    matrix[D,D] C; //covariance matrix
    for (j in 1:Q) {
        v[j] ~ inv_gamma(a_vj, epsilon * (a_vj - 1));
        W[j] ~ multi_normal(rep_vector(0, D), v[j] * diag_matrix(rep_vector(1, D)));
    }

    sigma2 ~ inv_gamma(a_sigma2, beta_sigma2);
    
    C = crossprod(W)+ sigma2 * diag_matrix(rep_vector(1, D));
    
    target += -N / 2.0 * (log_determinant(C) + trace(C \ S)); // 2.0 avoids integer division
}
generated quantities {
    vector[D] y_hat[N]; //predictive
    for (n in 1:N) {
        y_hat[n] = multi_normal_rng(rep_vector(0,D), crossprod(W)+ sigma2 * diag_matrix(rep_vector(1, D)));
    }
}
"""
 
And here is how I initialize the parameters:
init_list = []
for i_ in range(n_chains):
    temp_dict = {
        'v': sorted(v_star_list),
        'sigma2': sigma2_star,
        'W': W_star.T,
    }
    init_list.append(temp_dict)
 
And after I ran
fit_standard = sm.optimizing(data=ppca_dat_standard, init=init_list)
 
fit_standard found values different from my init_list, even though the init_list is the true \theta^*, which has the higher log_prob.
              btw, I just wanted to be a bit more specific about optimize
Technically, our optimizer finds penalized maximum likelihood estimates. They’re not MAP estimates because we throw away the Jacobian in the implicit priors on the transformed parameters. We’ll probably be adding a proper MAP estimator soon.
Andrew likes to think of all these methods as approximate Bayes. The penalized max likelihood gives you a Laplace approximation from which you can gauge uncertainty in the same way as variational inference, with the difference being centering on the …
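In PyStan you can look at both objectives at the same point; a sketch, assuming a fitted fit object and a dict theta_dict of constrained parameter values (if I recall the interface correctly, log_prob takes an adjust_transform flag that adds or drops the Jacobian term):

# Sketch: the sampler's target vs. the optimizer's objective at the same point.
# `fit` is a StanFit object, `theta_dict` a dict of constrained parameter values.
upar = fit.unconstrain_pars(theta_dict)

lp_sampler   = fit.log_prob(upar, adjust_transform=True)    # log density HMC/NUTS samples from
lp_optimizer = fit.log_prob(upar, adjust_transform=False)   # penalized likelihood optimizing() maximizes

print(lp_sampler, lp_optimizer)   # they differ by the log Jacobian of the constraining transforms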