vincent-picaud Developer

October 10

Just trying to understand more of the Stan library design, and to make it easier to know where one could dig in for improvements.

These questions reflect my current understanding of the approach; maybe I am totally wrong.

1/ Is log likelihood optimization (the “optimize” method of CmdStan) the only place where deterministic optimization is used?

That’s where L-BFGS is applied. It’s deterministic other than initialization.
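To illustrate the point about determinism (a toy sketch using SciPy's L-BFGS, not Stan's own C++ implementation): given the same initialization, every run of the optimizer on a fixed log density returns the same mode.

```python
# Toy sketch (not Stan code): deterministic mode-finding with L-BFGS.
# We minimize the negative log density of a standard normal; apart from
# the choice of starting point, nothing in the run is random.
import numpy as np
from scipy.optimize import minimize

def neg_log_density(theta):
    # standard normal negative log density, up to a constant
    return 0.5 * np.sum(theta ** 2)

def neg_log_density_grad(theta):
    return theta

init = np.array([3.0, -2.0])  # only the initialization varies between runs
result = minimize(neg_log_density, init, jac=neg_log_density_grad,
                  method="L-BFGS-B")
print(result.x)  # converges to the mode at the origin
```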

2/ Are all optimization problems in Stan unconstrained, because the same changes of variables that transform the ELBO’s domain of definition to R^n are used everywhere (“optimize”, “variational” and “sampling”)?

Not quite. You can define a constrained problem and you can still optimize it, but it will require initialization within support. This can sometimes work with sampling, but is more stable with optimization.
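The change-of-variables idea can be sketched with a toy example (this is an assumed illustration, not Stan's actual transform code): a positive-constrained parameter sigma > 0 is mapped to the unconstrained y = log(sigma), and the log Jacobian of the inverse transform is added so the density on the unconstrained scale stays correct.

```python
# Toy sketch of Stan's constrained-to-unconstrained transform idea.
# sigma > 0 becomes y = log(sigma) in R, and we add the log Jacobian
# log |d sigma / d y| = log exp(y) = y to the log density.
import numpy as np

def log_density_constrained(sigma):
    # toy log density on sigma > 0: Exponential(1), up to a constant
    return -sigma

def log_density_unconstrained(y):
    sigma = np.exp(y)   # inverse transform back to the constrained scale
    log_jacobian = y    # log |d sigma / d y| = y
    return log_density_constrained(sigma) + log_jacobian

# Any real y is valid, so the optimizer or sampler never has to
# handle the sigma > 0 constraint itself.
print(log_density_unconstrained(0.0))  # sigma = 1: -1 + 0 = -1.0
```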

3/ All in all, is deterministic unconstrained optimization not a critical part of Stan, because the variational and sampling methods are much more important?

I think that depends on who you ask. I think most uses are for sampling, second most for optimization, and third most for variational inference. I think this is largely due to our users (mostly statisticians) and due to our lack of understanding of variational inference.

The non-deterministic part of variational inference is in the calculation of the gradient, not in mini-batching. That is, it’s not a stochastic gradient in the data-subsampling sense.
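That distinction can be sketched with a toy reparameterization-style estimator (an assumed illustration, not Stan's ADVI code): the gradient is noisy because it is a Monte Carlo average over draws from the approximating distribution q, while the full data set enters every evaluation.

```python
# Toy sketch: the variational gradient is stochastic through Monte Carlo
# draws from q, not through mini-batching -- the full data is used each time.
import numpy as np

rng = np.random.default_rng(0)

def elbo_grad_mu(mu, data, n_draws=10):
    # q is Normal(mu, 1); reparameterize theta = mu + eps, eps ~ Normal(0, 1)
    eps = rng.standard_normal(n_draws)
    theta = mu + eps
    # model: data ~ Normal(theta, 1), theta ~ Normal(0, 1);
    # d/d theta log joint = sum(data - theta) - theta, and d theta/d mu = 1
    grads = np.sum(data[None, :] - theta[:, None], axis=1) - theta
    return grads.mean()  # Monte Carlo average over the q draws

data = np.array([1.0, 2.0, 0.5])
# two calls differ only through the random draws, never through the data
print(elbo_grad_mu(0.0, data), elbo_grad_mu(0.0, data))
```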

4/ ELBO maximization is performed by stochastic gradient, and this part is more critical and difficult (having a general approach for auto-tuning…), but there is still no need for constrained methods?

Not in the release version of Stan. There was an experimental version used for the paper, but I’m pretty sure the stochastic version isn’t built into Stan or accessible from the interfaces.

In other words, to contribute to Stan, would you be happier with another stochastic gradient implementation than with another deterministic one?

We’d be happy with anything that makes any of our systems faster or more robust, or adds new systems we haven’t thought about.

Andrew and Dustin are working on max marginal likelihood (which may itself involve some stochastic components, much like variational inference), and Andrew and Aki and a whole crew of others are working on expectation propagation, which is like variational inference but optimizes the reversed form of KL divergence. Neither of these has any Stan code yet as far as I know; certainly nothing merged into the develop branch of the repos.
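The “reversed form of KL divergence” remark can be made concrete with a toy univariate Gaussian example (an assumed illustration, nothing to do with the EP code being discussed): variational inference minimizes KL(q || p) while expectation propagation works with KL(p || q), and the two directions generally give different values.

```python
# Toy sketch: KL divergence between Gaussians is asymmetric, so the
# "reversed" KL that expectation propagation targets differs from the
# KL that variational inference minimizes.
import math

def kl_gauss(mu0, s0, mu1, s1):
    # closed-form KL( N(mu0, s0^2) || N(mu1, s1^2) )
    return math.log(s1 / s0) + (s0 ** 2 + (mu0 - mu1) ** 2) / (2 * s1 ** 2) - 0.5

p = (0.0, 1.0)   # stand-in for the "true" posterior
q = (0.5, 2.0)   # stand-in for the approximation
print(kl_gauss(*q, *p))  # VI direction:  KL(q || p)
print(kl_gauss(*p, *q))  # EP direction:  KL(p || q)
```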