I am using algebra_solver to find a root used in approximating a term in my log likelihood. Unfortunately, when I started running four chains, two chains failed at the initial value because the norm of the algebraic function after running max_num_steps was still much higher than the function tolerance (1e-6). The other two chains having been moving extremely slowly (about 74 minutes for 1000 iterations using a data set of 1000 data points). I know the interval (which is a function of the parameters) in the real line where the root falls, so I suspect that a specialized root finding algorithm, such as toms748 from boost, can potentially be more much efficient.
First of all, I am wondering whether it is possible to incorporate external optimizers into my Stan program.
I have attached part of the error message from running the sampler for your reference:
SAMPLING FOR MODEL 'external' NOW (CHAIN 3).
Chain 3: Unrecoverable error evaluating the log probability at the initial value.
Chain 3: Exception: Exception: Exception: Exception: algebra_solver: the norm of the algebraic function is: 0.414458 but should be lower than the function tolerance: 1e-06. Consider decreasing the relative tolerance and increasing the max_num_steps. (in 'ellipsoid_gauss_likelihood.stan' at line 67; included from 'model44093be5caf1_external' at line 1)
(in 'ellipsoid_gauss_likelihood.stan' at line 117; included from 'model44093be5caf1_external' at line 1)
(in 'ellipsoid_gauss_likelihood.stan' at line 137; included from 'model44093be5caf1_external' at line 1)
(in 'model44093be5caf1_external' at line 44)
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] " Exception: Exception: Exception: Exception: algebra_solver: the norm of the algebraic function is: 0.414458 but should be lower than the function tolerance: 1e-06. Consider decreasing the relative tolerance and increasing the max_num_steps. (in 'ellipsoid_gauss_likelihood.stan' at line 67; included from 'model44093be5caf1_external' at line 1)"
Tagging @yizhang and @wds15 whom I believe have some experience with the algebra solver. One thing I did for a project was to write my own Newton solver from scratch as a function in the functions block. It worked quite well.
There is actually a Newton solver available, but in Stan math only at the moment. It’s the kinsol solver function. @charlesm93 is planning to expose this to users, but probably not straight away. If you are ok with a bit of c++ code to write, then you can use that one.
I doubt using Newton would help you much here, as Newton could stuck for the same reason dogleg(the algorithm behind algebra_solver) is stuck, and dogleg could actually perform better when the trust region limit is known, which is your case. Since you’re considering TOMS 748, is your problem 1d? If so could you share the equation? What’s the initial parameters to Stan for the equation(the ones that fail to converge)?
it’s just easier to try to constrain the interval or get a better prior before switching solvers.
Recognizing, per @yizhang’s comment, that this may not be a solution, I wrote a branch exposing the Newton solver. See branch feature/issue-311-newton_solver on Stanc3: https://github.com/stan-dev/stanc3. This is long overdue, especially considering how straightforward it is and I should have a PR very soon.
(@wds15 no variadic arguments yet. But this is next on deck)
I wonder how large the size of \theta is, because even with size(\theta)=3, we are looking at an equivalent problem of root finding of a 7-degree polynomial. I see two possible issues here. Numeric wise, the root finding process of high-order polynomial can be very sensitive to sampled \theta. Though both dogleg & Newton’s method are able to solve polynomial zero easily, the initial point has to be close enough. So I suspect the bound given is not sufficiently informative to get the iterations started. Statistics wise, large \theta size + a single output(root) can make \theta unidentifiable. The observed slow iterations and convergence failure may be the result of these issues.
Thank you very much for your detailed response. As of now I notice that the chains that mix well always take a long time to run whereas the chains that mix poorly finish much sooner. I am wondering whether this is related to the algebra_solver?