Dear all,

I generated data X from a hierarchical model using some true parameter \theta^*, then fitted the same hierarchical model in PyStan on the generated data X.

I initialized the starting parameters to the true parameter \theta^*, but after the PyStan model converged, the parameters had moved to some wrong value \theta_{stan}.

When I compute the log probability with the fit.log_prob() function, I find that the true parameter \theta^* indeed has a higher log prob, yet the model still converges to a lower-log-prob parameter \theta_{stan}.

Does anyone know why? If PyStan were using log_prob as its maximization objective, it should find some \theta_{stan} close to the true \theta^*.

Here is the convergence diagnosis; the true sigma2 is 0.0155868 and the true v[0] is 1.76456398.

Here is the log_prob comparison

Hi,

HMC/NUTS does not maximize log_prob; it samples from the posterior. Our optimize method does maximize it.

See for example 15.1 Hamiltonian Monte Carlo | Stan Reference Manual

and
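To see why a sampler's draws typically sit below the mode in log_prob, here is a toy illustration in plain NumPy (not the OP's model, just a standard Gaussian): typical draws live in the typical set, whose log density is below the log density at the mode by about d/2 nats in d dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
draws = rng.standard_normal((100_000, d))  # stand-in for "posterior" draws

def logpdf(x):
    # standard d-dimensional normal log density
    return -0.5 * np.sum(x * x, axis=-1) - 0.5 * d * np.log(2 * np.pi)

lp_mode = logpdf(np.zeros(d))     # log density at the mode
lp_draws = logpdf(draws).mean()   # average log density of the draws

assert lp_mode > lp_draws
assert np.isclose(lp_mode - lp_draws, d / 2, rtol=0.05)
```

So a chain that has converged perfectly will still report draws whose log_prob is below that of the posterior mode (or of \theta^*); only the optimizer targets the mode itself.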

Thanks for the reply. I just tried sm.optimizing(data=…, init=init_list)

But it still seems to find a sub-optimal point. Isn't that weird, since \theta^* is my init_list?

Sorry, it's hard to tell without the model (Stan code).

How did you define the init?

Sure,

here is my Stan code:

ppca_code = """
data {
  int D;                 // number of dimensions
  int N;                 // number of data points
  int Q;                 // number of principal components
  vector[D] x[N];        // data
  real a_vj;             // v_j prior shape
  real epsilon;          // v_j prior mean
  real a_sigma2;         // sigma2 prior shape
  real beta_sigma2;      // sigma2 prior scale
}
transformed data {
  matrix[D, D] S;        // scatter matrix
  S = x[1] * x[1]';
  for (n in 2:N) {
    S += x[n] * x[n]';
  }
  S = S / N;
}
parameters {
  ordered[Q] v;          // v_j
  real<lower=0> sigma2;  // data noise variance
  matrix[Q, D] W;        // projection matrix
}
model {
  matrix[D, D] C;        // covariance matrix
  for (j in 1:Q) {
    v[j] ~ inv_gamma(a_vj, epsilon * (a_vj - 1));
    W[j] ~ multi_normal(rep_vector(0, D), v[j] * diag_matrix(rep_vector(1, D)));
  }

  sigma2 ~ inv_gamma(a_sigma2, beta_sigma2);

  C = crossprod(W) + sigma2 * diag_matrix(rep_vector(1, D));

  // note: N / 2 with int N is integer division in Stan and silently
  // truncates, so write 0.5 * N (or N / 2.0) instead
  target += -0.5 * N * (log_determinant(C) + trace(C \ S));
}
generated quantities {
  vector[D] y_hat[N];    // posterior predictive draws
  for (n in 1:N) {
    y_hat[n] = multi_normal_rng(rep_vector(0, D), crossprod(W) + sigma2 * diag_matrix(rep_vector(1, D)));
  }
}
"""


And here is how I initialize the parameters:

init_list = []
for i_ in range(n_chains):
    temp_dict = {
        'v': sorted(v_star_list),
        'sigma2': sigma2_star,
        'W': W_star.T,
    }
    init_list.append(temp_dict)
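One thing worth checking with this init: optimizing() runs a single optimization, and as far as I know PyStan 2 expects init to be one dict (or a callable), not a per-chain list. A sketch with placeholder values standing in for v_star_list, sigma2_star, and W_star (these are not the OP's values):

```python
import numpy as np

# placeholder shapes and values, hypothetical stand-ins for the OP's variables
Q, D = 2, 3
v_star_list = [2.0, 0.5]
sigma2_star = 0.1
W_star = np.zeros((D, Q))

init_dict = {
    'v': sorted(v_star_list),   # 'ordered[Q]' requires strictly increasing values
    'sigma2': sigma2_star,
    'W': W_star.T,              # Stan declares W as matrix[Q, D]
}
# pass the dict itself rather than a list of dicts, e.g.:
# fit_standard = sm.optimizing(data=ppca_dat_standard, init=init_dict)

assert list(init_dict['v']) == sorted(init_dict['v'])
assert init_dict['W'].shape == (Q, D)
```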



And after I ran

fit_standard = sm.optimizing(data=ppca_dat_standard, init=init_list)


fit_standard found values different from my init_list, even though the init was the true \theta^*, which has the higher log_prob.
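One more pitfall to rule out when comparing log_prob values: in PyStan 2, fit.log_prob() takes parameters on the *unconstrained* scale (and by default applies the Jacobian adjustment), so passing constrained values of sigma2 and v directly makes the comparison apples to oranges; map them with fit.unconstrain_pars(...) first. A sketch of the transforms involved for this model's parameters (my reconstruction from the Stan reference manual, not PyStan source):

```python
import numpy as np

def unconstrain_positive(sigma2):
    # real<lower=0> is mapped to the log scale
    return np.log(sigma2)

def unconstrain_ordered(v):
    # ordered[Q]: first element kept, the rest become log-differences
    v = np.asarray(v, dtype=float)
    return np.concatenate([[v[0]], np.log(np.diff(v))])

def constrain_ordered(u):
    # inverse transform: cumulative sums of exponentiated differences
    u = np.asarray(u, dtype=float)
    return np.concatenate([[u[0]], u[0] + np.cumsum(np.exp(u[1:]))])

v = np.array([0.5, 1.76456398, 3.0])
assert np.allclose(constrain_ordered(unconstrain_ordered(v)), v)
assert np.isclose(unconstrain_positive(np.exp(1.0)), 1.0)
```

Because of the Jacobian terms these transforms introduce, the adjusted log_prob at \theta^* can legitimately differ from the unadjusted target density there.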

btw, I just wanted to be a bit more specific about optimize