Quadratic optimizer

Hi,

Shosh and I are working on a quadratic optimizer with constraints (it’s a Karush–Kuhn–Tucker problem: it generalizes the Lagrangian optimization problem by allowing some of the constraints to be inequalities). I’m attaching the spec for a prototype function, in case anyone has feedback.
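
For concreteness, the general form we have in mind is the standard QP below (the notation here is mine, not necessarily the spec’s):

\underset{x}{\min} \frac{1}{2} x^\top H x + v^\top x, subject to Ax + b = 0 and Cx + d \ge 0.

The KKT conditions attach multipliers \mu \ge 0 to the inequality constraints, with complementary slackness \mu_i (Cx + d)_i = 0; that slackness condition is what generalizes the plain Lagrangian setting.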

I would also like to hear from users who might be interested in such a function.
quadratic_program_gradient_doc.pdf (183.3 KB)

2 Likes

That sounds great.

Is this something you’re thinking about adding to Stan, like the algebraic solver? If so, it’ll need an issue with a spec of the function being built (which can point to external references).

The name in the language will have to be something like quad_solver or quadratic_solver as it looks like we’re going down the route of having multiple solvers.

When we get lambda abstractions with closures, it’ll get much easier to drop in lightweight functions, so I wouldn’t worry too much about that.

P.S. I’d write that signature matrix H(real[] theta, real[] delta) if those are two real arrays. In the manual, I’m using (real[], real[]):matrix as the notation for a function type that takes two real array arguments and returns a matrix.
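
For example, such a functor might look like this on the C++ side (the struct name and the toy 2 x 2 body are placeholders, not anything from the spec):

#include <vector>
#include <Eigen/Dense>

// Sketch of a functor with type (real[], real[]) : matrix.
// The name and the diagonal 2x2 example are placeholders.
struct h_functor {
  template <typename T>
  Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>
  operator()(const std::vector<T>& theta,
             const std::vector<double>& delta) const {
    Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic> H(2, 2);
    H << theta[0], 0,
         0, theta[1];
    return H;
  }
};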

1 Like

I’ll edit the document to use the appropriate signatures. And yes, theta and delta need only be arrays, though we may want to have them as vectors; I’ll tinker with that. I was thinking of calling the function quadratic_optimizer, since it’s a slightly different problem than solving equations.

The GitHub issue is here: https://github.com/stan-dev/math/issues/888.

1 Like

There is one technical difficulty. Computing the sensitivities (the Jacobian matrix of the solution with respect to the parameters) requires knowing which elements of the solution vector are zero and which are non-zero.

Unfortunately, a numerical optimizer may return something like 1e-16 instead of 0. I’m not sure then how to flag an element as zero or non-zero, other than by introducing a tuning parameter, say \alpha, and a condition like if (x[i] < alpha).
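
Something like this, for instance (alpha is a made-up tolerance, and I use fabs so that small negative values like -1e-16 also count as zero):

#include <cmath>
#include <vector>
#include <Eigen/Dense>

// Flag which components of the solution are treated as exact zeros,
// using a made-up tolerance alpha.
std::vector<bool> zero_pattern(const Eigen::VectorXd& x, double alpha) {
  std::vector<bool> is_zero(x.size());
  for (int i = 0; i < x.size(); ++i)
    is_zero[i] = std::fabs(x(i)) < alpha;  // 1e-16-scale noise counts as zero
  return is_zero;
}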

1 Like

Sounds like you need a sparse vector type.

1 Like

Have we used sparse vector types in Stan?

I finished writing the “evaluation” function and wrote some basic unit tests. Somehow, the inequality constraint doesn’t always seem to kick in. @shoshievass, maybe you could take a look and see if you can reproduce the error in R.

The branch is feature/issue-888-quadratic_optimizer, and the files of interest are:

  • stan/math/rev/mat/functor/quadratic_optimizer.hpp
  • stan/math/rev/mat/functor/eiquadprog.hpp
  • test/unit/math/rev/mat/functor/quadratic_optimizer_test.hpp

The error is exposed in the last lines of the test file:

  // test the inequality constraint.
  theta << 0, 1;
  x = quadratic_optimizer(fh(), fv(), fa_0(), fb(), theta, delta, n);
  std::cout << x << "\n \n";

All the elements of x should be non-negative, but x(1) = -1. I think the code is readable (though it’s all C++), but I can write out the math if needed.

I think it’s a numerical error issue.

The result I got from R is this: -3.143751e-16 2.000000e+00 0.000000e+00

Code below:

library(quadprogpp)

# Objective: min 0.5 * t(x) %*% G %*% x + t(g0) %*% x
G = matrix(c(2.1, 0.0, 1.0,
             1.5, 2.2, 0.0,
             1.2, 1.3, 3.1), 3, 3, byrow=T)
g0 = c(6, 1, 1)

# Equality constraints (eiquadprog convention): t(CE) %*% x + ce0 = 0
CE = matrix(c(1, 2, -1), 3, 1)
ce0 = c(-4)

# Inequality constraints: t(CI) %*% x + ci0 >= 0
CI = matrix(c(1, 0, 0, -1,
              0, 1, 0, -1,
              0, 0, 1,  0), 3, 4, byrow=T)
ci0 = matrix(c(0, 0, 0, 10), 4, 1)

x = QP.Solve(G, g0, CI, ci0, CE, ce0)

I actually had another example in mind:

library(quadprogpp)

n = 2
theta = c(0, 1)

# Objective: 0.5 * x1^2 + 0.5 * x2^2 + theta[2] * x2
G = diag(n)
g0 = c(0, theta[2])

# Inequality constraints t(CI) %*% x + ci0 >= 0, i.e. x >= 0
CI = diag(n)  # identity matrix
ci0 = c(0, 0)

# dummy equality constraints - not needed (setting trivial conditions)
CE = matrix(rep(0, n * n), n, n)
ce0 = c(0, 0)

x = QP.Solve(G, g0, CI, ci0, CE, ce0)

The optimizer returns x = (0, -1) (in both R and C++), but it seems to me the conditions set by CI and ci0 should force x to have only non-negative elements. The problem is indeed:

\underset{x_1, x_2}{\min} \frac{1}{2} x_1^2 + \frac{1}{2} x_2^2 + x_2

subject to x_1 \ge 0 and x_2 \ge 0; unless I messed something up?
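
Working through the KKT conditions by hand (my algebra, worth double-checking), the solution should be x = (0, 0): stationarity in x_2 gives

x_2 + 1 - \mu_2 = 0, \quad \mu_2 \ge 0, \quad \mu_2 x_2 = 0.

If x_2 > 0, then \mu_2 = 0 and x_2 = -1, a contradiction; so x_2 = 0 with \mu_2 = 1, and similarly x_1 = 0.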

Oh I see. Maybe the optimizer fails when you have dummy constraints like this? I think it may cause an error because the all-zero matrix is rank deficient (its rank is lower than its number of rows, so it has a nontrivial null space), which might break the linear algebra in the quadprog algorithm. I had similar errors when trying something like this earlier. Maybe we can check that the CE and CI matrices have full rank?
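
E.g., assuming we can call into Eigen (the function name is made up):

#include <stdexcept>
#include <Eigen/Dense>

// Made-up guard: reject an equality-constraint matrix that is rank
// deficient (such as the all-zero dummy CE above) before calling the solver.
void check_full_rank(const Eigen::MatrixXd& CE) {
  Eigen::FullPivLU<Eigen::MatrixXd> lu(CE);
  if (lu.rank() < CE.cols())  // one column per equality constraint
    throw std::invalid_argument("equality constraint matrix is rank deficient");
}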

That is likely to induce a discontinuity in the overall function, which is not surprising given the nature of these optimization problems, which push parameters towards boundaries (fundamentally a discrete phenomenon). This will cause issues going forward, because functions with nontrivial discontinuities are problematic for Hamiltonian Monte Carlo.

Yeah, this crossed my mind. My next move is to build a unit test which showcases this issue.

Here’s a simple example to illustrate the differentiation problem:

\underset{x_1, x_2}{\min} \frac{1}{2} x_1^2 + \frac{1}{2} x_2^2 + \theta x_2, subject to \mathbf{x} \ge 0 and x_1 = \phi.

x_1 is determined by the equality constraint.
Then x_2 = -\theta if \theta \le 0, else x_2 = 0.

The partial derivative \partial x_2 / \partial \theta does not exist at \theta = 0.
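
Written out piecewise:

\partial x_2 / \partial \theta = -1 for \theta < 0, and \partial x_2 / \partial \theta = 0 for \theta > 0,

so the two one-sided limits at \theta = 0 disagree.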

More generally, any phase transition (a point where x_i flips between zero and non-zero) will cause an issue. This can be seen clearly in the semi-analytical expression @shoshievass worked out. The problem seems mathematically ill-posed, but still: haven’t we already dealt with a comparable issue?

I know something similar came up in a PK/PD model with bolus dosing and lag times, but I don’t think we worked out a fix.

I don’t think it is mathematically ill-posed. It is a well-posed optimization problem that can have a boundary solution for some value of \theta. That can be difficult to deal with from a numerical perspective and even more so when embedded within NUTS.

Do you think there might be a path forward?

I think the issue is that QP.Solve() doesn’t accept zero matrices for the equality constraints (this is something we could catch with linear algebra in the C++ code; the error handling is pretty bad here).

However, you can exclude the equality constraints:

x = QP.Solve(G, g0, CI, ci0)

produces (0,0).

If you find what you believe to be a solution on the boundary, can you automatically re-run it with that parameter fixed to the boundary value?
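
Roughly a two-pass loop, say (solve and solve_fixed are placeholder names standing in for the actual QP calls, not real functions):

#include <cmath>
#include <functional>
#include <vector>
#include <Eigen/Dense>

// After a first solve, pin any coordinate that lands within alpha of the
// boundary to exactly zero, and re-solve the reduced problem.
Eigen::VectorXd refine_boundary(
    const std::function<Eigen::VectorXd()>& solve,
    const std::function<Eigen::VectorXd(const std::vector<int>&)>& solve_fixed,
    double alpha) {
  Eigen::VectorXd x = solve();  // first pass
  std::vector<int> boundary;
  for (int i = 0; i < x.size(); ++i)
    if (std::fabs(x(i)) < alpha)
      boundary.push_back(i);
  return boundary.empty() ? x : solve_fixed(boundary);  // second pass
}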

1 Like

I’m not sure I understand the idea. Could you please elaborate on what exactly we’d re-run, and why that would help with the Jacobian with respect to the model parameters that cause the solution to flip on and off the boundary?

Also, I can’t think of a reparametrization that would maintain the optimality of the quadratic program in an unconstrained formulation of the problem, but maybe there is one? @betanalpha, have you seen anything along these lines?

Maybe we can do a transformation and solve the optimization problem in an unconstrained space: say, set x = e^\lambda and solve for \lambda, with no inequality constraints.

The issue is that I don’t see how that kind of transformation would preserve optimality on the unconstrained space…
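
For instance, applying it to the small example above (my algebra, worth double-checking):

\underset{\lambda}{\min} \frac{1}{2} e^{2\lambda_1} + \frac{1}{2} e^{2\lambda_2} + e^{\lambda_2}

The original solution x = (0, 0) corresponds to \lambda_1, \lambda_2 \to -\infty, so the transformed problem has the right infimum but no attained minimum.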

What if we did something MH-like and rejected draws that led to negative values? From experience playing around with the unconstrained problem, it might be hard to draw unrejected parameters, but maybe not impossible?