Hi everyone,
I am fitting a complex nonlinear financial model, the Log-Periodic Power Law Singularity (LPPLS) model, and I have found that successful chain convergence depends heavily on the initial values of the chains. With a lucky choice of initial values, multiple chains converge with relatively fast sampling. With a different set of initial values, or random initial values, I mostly get slow sampling and non-convergence. Even the same lucky initial values that work for one dataset fail to produce convergence when applied to a different dataset.
I suspect this is not a sampling issue specific to my model, but a general issue for complex nonlinear models. For such models, the likelihood often has multiple local maxima in parameter space, and hence the log posterior density to be evaluated has multiple modes. Sampling can therefore be very sensitive to the initial values: given different sets of initial values, the samplers of different chains may get stuck around different local maxima. (I am not sure whether a single chain can get stuck around more than one local maximum, and if it can, is that why we see a multimodal posterior distribution, assuming no prior-data conflict?)
I wonder how we should deal with this issue, which seems quite common for complex nonlinear models. In frequentist inference for such models, a cost function (from an ML or LS estimator) with multiple local minima is typically attacked by running an iterative optimizer from many different sets of initial values, which produces a collection of locally optimal solutions. A seemingly common approach is then to select, among these local solutions, the one that gives the smallest value of the cost function. This sounds reasonable to me provided the number of starting points is reasonably large (say, more than 5), even though we are not guaranteed to find the global optimum.
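To make this concrete, here is roughly the multi-start procedure I have in mind, sketched in Python with cmdstanpy (I am not tied to this interface; the file name lppls.stan, the toy data, and the init ranges are all just placeholders):

import numpy as np
from cmdstanpy import CmdStanModel

# Toy data just so the sketch runs end to end; substitute your own series.
rng = np.random.default_rng(1)
N = 200
t = np.arange(1.0, N + 1.0)
tc_true = N + 30.0
lnp = 1.0 - 0.03 * (tc_true - t) ** 0.6 + 0.02 * rng.normal(size=N)
data = {"N": N, "t": t.tolist(), "p": np.exp(lnp).tolist()}

model = CmdStanModel(stan_file="lppls.stan")  # the program given below

def random_inits(r):
    # Random starting values inside the declared parameter constraints.
    return {
        "A": r.uniform(0.1, 5.0), "B": r.uniform(-2.0, -0.01),
        "C1": r.uniform(-0.9, 0.9), "C2": r.uniform(-0.9, 0.9),
        "tc": float(t.max()) + 1.0 + r.uniform(1.0, 100.0),
        "m": r.uniform(0.05, 0.95), "omega": r.uniform(2.0, 20.0),
        "sigma": r.uniform(0.01, 0.5),
    }

best = None
for s in range(20):  # 20 random starts
    try:
        mle = model.optimize(data=data, inits=random_inits(rng), seed=1000 + s)
    except RuntimeError:
        continue  # some starts fail to converge at all; skip them
    lp = mle.optimized_params_dict["lp__"]
    if best is None or lp > best[0]:
        best = (lp, mle.optimized_params_dict)

print("best log posterior found:", best[0])
print("corresponding estimates:", best[1])

The init ranges only need to respect the declared constraints, since Stan rejects initial values outside them.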
I wonder whether, in the Bayesian paradigm, we should similarly select the best chain out of multiple non-converged chains that each represent, in a sense, a different locally optimal solution. If that is the way to go, should the selection criterion be the chain with the maximum log posterior density among the parallel chains?
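Concretely, by "selecting the best chain" I mean something like the following, which runs chains from dispersed starting points and compares their lp__ levels (again a cmdstanpy sketch; random_inits is the helper from the sketch above):

# Run chains from dispersed starting points and compare their lp__ levels.
fit = model.sample(
    data=data,
    chains=4,
    inits=[random_inits(rng) for _ in range(4)],  # one init dict per chain
    seed=2000,
)
df = fit.draws_pd()  # includes 'chain__' and 'lp__' columns
print(df.groupby("chain__")["lp__"].agg(["mean", "max"]))
# When chains sit in different modes, fit.summary() also shows R-hat >> 1.

Is picking the chain with the highest lp__ defensible here, or does that only give me a mode-conditional answer rather than the full posterior?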
I would really appreciate it if anyone could give me some insights!
Alan
For your information, the LPPLS model is written as (the parameters are A, B, C_1, C_2, m, t_c, and \omega):
\ln p_t = A + B(t_c - t)^m + C_1 (t_c - t)^m \cos\big(\omega \ln(t_c - t)\big) + C_2 (t_c - t)^m \sin\big(\omega \ln(t_c - t)\big) + \varepsilon_t,
and the Stan code is:
data {
  int<lower=0> N;        // number of data points / trading days
  vector[N] t;           // time index of trading days
  vector<lower=0>[N] p;  // asset price
}
parameters {
  real<lower=0> A;
  real<upper=0> B;
  real<lower=-1, upper=1> C1;
  real<lower=-1, upper=1> C2;
  real<lower=max(t) + 1> tc;  // critical time, after the last observation
  real<lower=0, upper=1> m;
  real<lower=0> omega;
  real<lower=0> sigma;        // standard deviation of the noise of log price
}
model {
  // priors
  A ~ normal(0, 5);
  B ~ normal(0, 2.5);
  C1 ~ normal(0, 0.5);
  C2 ~ normal(0, 0.5);
  tc ~ normal(100, 50);
  m ~ normal(0.5, 0.2);
  omega ~ normal(10, 3);
  sigma ~ normal(0, 2.5);
  // likelihood: ln p_t is normal with the LPPLS mean, so p_t is lognormal
  for (n in 1:N) {
    real dt = tc - t[n];  // positive by the constraint on tc
    real dtm = pow(dt, m);
    real phase = omega * log(dt);
    p[n] ~ lognormal(A + B * dtm + C1 * dtm * cos(phase) + C2 * dtm * sin(phase), sigma);
  }
}