Quadratic optimizier


#3

I’ll edit the document to have the appropriate signatures. And yes, theta and delta need only be arrays, though we may want to have them as vectors. I’ll tinker on that. I was thinking of calling the function quadratic_optimizer – it’s a slightly different problem than solving equations.

The gitHub issue is here https://github.com/stan-dev/math/issues/888.


#4

There is one technical difficulty. Computing the sensitivities (the Jacobian matrix of the solution w.r.t to parameters) requires we know which elements in the solution vector are zero and which ones are non-zero.

Unfortunately, a numerical optimizer may return something like 1e-16 instead of 0. I’m not sure then how to flag an element as zero or non-zero, other than by introducing a tuning parameter, say \alpha, and a condition if (x[i] < alpha).


#5

Sounds like you need sparse vector type.


#6

Have we used sparse vector types in Stan?


#7

I finished writing the “evaluation” function and wrote some basic unit tests. Somehow, the inequality constraint doesn’t seem to be (always) kicking in. @shoshievass, maybe you could take a look and see if you can reproduce the error in R.

The branch is feature/issue-888-quadratic_optimizer, and the files of interest are:

  • stan/math/rev/mat/functor/quadratic_optimizer.hpp
  • stan/math/rev/mat/functor/eiquadprog.hpp
  • test/unit/math/rev/mat/functor/quadratic_optimizer_test.hpp

The error is exposed in the last lines of the test file:

  // test the inequality constraint.
  theta << 0, 1;
  x = quadratic_optimizer(fh(), fv(), fa_0(), fb(), theta, delta, n);
  std::cout << x << "\n \n";

All the elements of x should be greater than 0, but x(1) = -1. I think the code is readable (but it’s all C++) but I can write out the math if needed.


#8

I think it’s a numerical error issue.

The result I got from R is this: -3.143751e-16 2.000000e+00 0.000000e+00

Code below:

library(quadprogpp)

G = matrix( c( 2.1, 0.0, 1.0,
             1.5, 2.2, 0.0,
             1.2, 1.3, 3.1), 3, 3, byrow=T)

g0 = c(6, 1, 1)

CE = matrix(c(1, 2, -1), 3, 1)

ce0 = c(-4)

CI = matrix( c(1, 0, 0, -1,
       0, 1, 0, -1,
       0, 0, 1,  0), 3, 4, byrow=T)

ci0 = matrix( c(0, 0, 0, 10), 4, 1)

x = QP.Solve(G, g0, CI, ci0, CE, ce0)

#9

I actually had another example in mind:

library(quadprogpp)

n = 2
theta = c(0, 1)

G = diag(n)
g0 = c(0, theta[2])

CI = diag(n)  # identity matrix
ci0 = c(0, 0)

# dummy - not needed (setting trivial conditions)
CE = matrix(rep(0, n*n), n, n)
ce0 = c(0, 0)

x = QP.Solve(G, g0, CI, ci0, CE, ce0)

The optimizer returns x = (0, -1) (in both R and C++) but it seems to me the conditions set by CI and ci0 should force x to only have positive elements. The problem is indeed:

\mathrm{min} \frac{1}{2} x_1^2 + \frac{1}{2} x_2^2 + x_2

subject to x_1 \ge 0 and x_2 \ge 0; unless I messed something up?


#10

Oh I see. Maybe the optimizer fails when you have dummy constraints like this? I think it may cause an error because it creates an underdetermined matrix (sorry if I’m getting the linear algebra term wrong - a matrix with null space from lower than the number of rows), which might break the linear algebra in the quadprog algorithm. I had some similar errors when trying something similar earlier. Maybe we can check that the CE and CI matrices have full dims?


#11

That is likely to induce a discontinuity in the overall function, which is not unsurprising given the nature of these optimization problems that try to parameters towards boundaries (which is fundamentally a discrete problem). This will cause issues going forwards because functions with any nontrivial discontinuities will be problematic for Hamiltonian Monte Carlo.


#12

Yeah, this crossed my mind. My next move is to build a unit test which showcases this issue.


#13

Here’s a simple example to illustrate the differentiation problem:

\underset{x_1, x_2}{\mathrm{min}}\frac{1}{2} x_1^2 + \frac{1}{2} x_2^2 + \theta x_2 subject to \bf{x} \ge 0 and x_1 = \phi.

x_1 is solved by the linear constraint.
Then x_2 = - \theta if \theta \ge 0, else x_2 = 0.

The limit for the partial derivative \partial x_2 / \partial \theta does not exist at \theta = 0.

More generally, any phase transition (point where x_i flips from being zero to non-zero) will cause an issue. This can be seen clearly by looking at the semi-analytical expression @shoshievass worked out. The problem seems mathematically ill-posed, but still, haven’t we already dealt with a comparable issue?

I know something similar came up in a PK/PD model with bolus dosing and lag times, but I don’t think we worked out a fix.


#14

I don’t think it is mathematically ill-posed. It is a well-posed optimization problem that can have a boundary solution for some value of \theta. That can be difficult to deal with from a numerical perspective and even more so when embedded within NUTS.


#15

Do you think there might be a path forward?


#16

I think the issue is that QP.solve(.) doesn’t accept zero matrices for the equality constraints (this is something that can be solved w/ linear algebra in the C++ code - the error handling is pretty bad here).

However, you can exclude the equality constraints:

x = QP.Solve(G, g0, CI, ci0)

produces (0,0).


#17

If you find what you believe to be a solution on the boundary, can you automatically re-run it with that parameter fixed to the boundary value?


#18

I’m not sure I understand the idea. Could you please elaborate on what exactly we’d re-run and why that would help w/ the Jacobian wrt model parameters that cause the solution to flip from hitting the boundary?

Also, I can’t think of a reparametrization that would maintain the optimality of the quadratic program in an unconstrained formulation of the program but maybe there is one? @betanalpha have you seen anything along these lines?


#19

Maybe we can do a transformation and solve the optimization problem in an unconstrained space. Say, have the solution be x = e^\lambda and solve for \lambda, with no inequality constraints.


#20

The issue is - I don’t see how that kind of transformation would preserve optimality on the unconstrained space…

What if we did something MH-y and rejected draws that led to negative numbers? From experience playing around w/ the unconstrained problem, it might be hard to draw unrejected params, but maybe not impossible?


#21

What if we did some sort of ‘finite differencing’ approach re: the state change? E.g. Recompute the algorithm a number of times to check for state changes? It would take a lot of time, but maybe we could figure out a sophisticated way to do this less-than-always but just enough to avoid getting stuck in regions of state on the boundary condition?


#22

These kinds of quadratic programming models are fundamentally not smooth at the boundaries. The boundaries can’t be moved to infinity because you want to put values there, and you can’t approximate the gradients because they do not exist. One way you can tell that this kind of approach can’t work is that it would implicitly allow you to do variable selection, which is known to be an issue.