vincent-picaud Developer
October 10
I am asking this to understand the Stan library design better and to know where one could dig in for improvements.
These questions reflect my current understanding of the approach; maybe I am totally wrong.
1/ Is the only place where deterministic optimization is used the log likelihood optimization (the “optimize” method of CmdStan)?
That’s where L-BFGS is applied. It’s deterministic
other than initialization.
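A minimal sketch of that idea outside Stan, using scipy's L-BFGS on a made-up log density (none of this is Stan code; the function names are mine):

```python
import numpy as np
from scipy.optimize import minimize

# Toy unnormalized log density standing in for a model's log posterior.
def log_density(theta):
    return -0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)

def neg_log_density(theta):
    return -log_density(theta)

# The only random ingredient is the initialization; rerunning L-BFGS
# from the same theta0 gives the same mode every time.
rng = np.random.default_rng(seed=0)
theta0 = rng.uniform(-2.0, 2.0, size=2)

result = minimize(neg_log_density, theta0, method="L-BFGS-B")
print(result.x)  # close to [1.0, -2.0]
```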
2/ In Stan, are all optimization problems unconstrained because the same changes of variables that transform the constrained parameter domain to R^n are used everywhere (“optimize”, “variational” and “sampling”)?
Not quite. You can define a constrained problem and you can
still optimize it, but it will require initialization within
support. This can sometimes work with sampling, but is more
stable with optimization.
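For reference, the change of variables the question refers to is the standard one (this is the textbook construction, not a quote of Stan's internals): a positive-constrained parameter $\sigma > 0$ is represented on the unconstrained scale as $y = \log \sigma$, and the log density picks up the log Jacobian of the inverse transform,

$$
\log p_Y(y) = \log p_\sigma\!\left(e^{y}\right) + y,
\qquad \left|\frac{d\sigma}{dy}\right| = e^{y},
$$

so “optimize”, “variational” and “sampling” can all work with $y \in \mathbb{R}$ and no explicit constraint handling.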
3/ All in all, is deterministic unconstrained optimization not a critical part of Stan, because the variational and sampling methods are much more important?
I think that depends on who you ask. I think most uses are
for sampling, the second most for optimization, and the third most
for variational inference. I think this is largely due to our
users (mostly statisticians) and due to our lack of understanding
of variational inference.
The non-deterministic part of variational inference is in the
Monte Carlo calculation of the gradient, not in mini-batching. That is, it’s
not a stochastic gradient in the data-subsampling sense.
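To make that concrete, here is a rough numpy sketch of the kind of Monte Carlo gradient estimate being described, for a mean-field normal approximation with the reparameterization trick (in the spirit of ADVI; the function and its interface are mine, not Stan's). The randomness is entirely in the draws of eps, and grad_log_p is evaluated on the full data set, so no mini-batching is involved:

```python
import numpy as np

def elbo_grad_estimate(grad_log_p, mu, omega, n_draws=10, rng=None):
    """Monte Carlo ELBO gradient for q(theta) = N(mu, diag(exp(omega))^2),
    using the reparameterization theta = mu + exp(omega) * eps, eps ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_draws, mu.size))
    theta = mu + np.exp(omega) * eps                  # reparameterized draws from q
    g = np.array([grad_log_p(t) for t in theta])      # gradient of the log joint at each draw
    grad_mu = g.mean(axis=0)                          # d ELBO / d mu
    grad_omega = (g * eps * np.exp(omega)).mean(axis=0) + 1.0  # +1 per coordinate from the entropy of q
    return grad_mu, grad_omega
```

Increasing n_draws lowers the variance of the estimate at the cost of more gradient evaluations per iteration.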
4/ ELBO maximization is performed by stochastic gradient, and this part is more critical and difficult (having a general approach for auto-tuning…), yet there is still no need for constrained methods?
Not in the release version of Stan. There was an experimental
version used for the paper, but I’m pretty sure the stochastic
version isn’t built into Stan or accessible from the interfaces.
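For what it's worth, the auto-tuning part usually comes down to an adaptive step-size sequence for the stochastic gradient ascent. Here is a sketch in the spirit of the scheme described in the ADVI paper (the constants and names are illustrative, not Stan's defaults):

```python
import numpy as np

def stochastic_gradient_ascent(grad_estimate, x0, n_iters=1000,
                               eta=0.1, tau=1.0, alpha=0.1):
    """Stochastic gradient ascent with an RMSprop-like per-coordinate step size."""
    x = np.asarray(x0, dtype=float)
    s = None  # running average of squared gradients
    for t in range(1, n_iters + 1):
        g = grad_estimate(x)                        # noisy gradient estimate
        s = g ** 2 if s is None else alpha * g ** 2 + (1.0 - alpha) * s
        rho = eta * t ** -0.5 / (tau + np.sqrt(s))  # decaying, adaptively scaled step size
        x = x + rho * g                             # free step in R^n: no projection needed
    return x
```

Because each update is a free step in R^n, the unconstraining transform is what removes any need for constrained (projected) methods here.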
In other words, to contribute to Stan, would you be happier with another stochastic gradient implementation than with another deterministic one?
We’d be happy with anything that makes any of our systems faster
or more robust. Or adds new systems we haven’t thought about.
Andrew and Dustin are working on max marginal likelihood (which may
itself involve some stochastic components much like variational inference),
and Andrew and Aki and a whole crew of others are working on expectation
propagation, which is like variational inference, but optimizes the
reversed form of KL divergence. Neither of these has any Stan code
yet as far as I know; certainly nothing has been merged into the develop
branch of the repos.
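For readers who haven't seen the “reversed form” phrasing before: roughly speaking, variational inference minimizes $\mathrm{KL}(q \,\|\, p)$, while expectation propagation works with the other direction, $\mathrm{KL}(p \,\|\, q)$, minimized locally site by site:

$$
\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{q}\!\left[\log q(\theta) - \log p(\theta \mid y)\right],
\qquad
\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p}\!\left[\log p(\theta \mid y) - \log q(\theta)\right].
$$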