Quadratic optimizer

Great – thanks. This is super helpful.

One more question. It seems like a function that gets exposed to Stan must be restricted in the types it can accept and return, so that you can declare those types in the Stan model file. What types would those be? The reason I ask is that the function solve_quadprog in eiquadprog is pretty much exactly what I would want to expose to Stan, but I have no idea whether the types it accepts and returns (e.g. const MatrixXd & G and inline double) would be accepted by Stan.

It can be made to work, but it usually entails getting past at least one error message. One thing I usually do is to try (and fail) to compile a Stan program that has

functions {
  matrix eigquadprog(matrix G, real theta); // not defined
}

with

stan_model("my_file.stan", allow_undefined = TRUE, verbose = TRUE)

Then, look at the generated C++ code that it prints for your eigquadprog function, which in this case is

template <typename T0__, typename T1__>
Eigen::Matrix<typename boost::math::tools::promote_args<T0__, T1__>::type, Eigen::Dynamic, Eigen::Dynamic>
eigquadprog(const Eigen::Matrix<T0__, Eigen::Dynamic, Eigen::Dynamic>& G,
            const T1__& theta, std::ostream* pstream__);

and that is the function signature that you need to define in C++. The pstream__ argument is added for you so that you can print debug output and stuff.
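
If you then supply a C++ definition with that exact signature, everything lines up. As a stand-in, a definition might look like the following (a minimal sketch – the body just scales G by theta so that it compiles; the promoted return type and the pstream__ handling are the parts that matter):

#include <boost/math/tools/promotion.hpp>
#include <Eigen/Dense>
#include <ostream>

template <typename T0__, typename T1__>
Eigen::Matrix<typename boost::math::tools::promote_args<T0__, T1__>::type,
              Eigen::Dynamic, Eigen::Dynamic>
eigquadprog(const Eigen::Matrix<T0__, Eigen::Dynamic, Eigen::Dynamic>& G,
            const T1__& theta, std::ostream* pstream__) {
  typedef typename boost::math::tools::promote_args<T0__, T1__>::type T_ret;
  if (pstream__)  // pstream__ is how you print debug output from inside
    *pstream__ << "eigquadprog called" << std::endl;
  // Stand-in body: promote both arguments and return theta * G.
  return G.template cast<T_ret>() * T_ret(theta);
}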


OK, so, when I compile the Stan model

functions {
  
  vector solve_quadprog(matrix G, vector g0, matrix CE, vector ce0, 
    matrix CI, vector ci0, vector x);
  
}

I get the following signature in the generated C++ code.

template <typename T0__, typename T1__, typename T2__, typename T3__, typename T4__, typename T5__, typename T6__>
Eigen::Matrix<typename boost::math::tools::promote_args<T0__, T1__, T2__, T3__, typename boost::math::tools::promote_args<T4__, T5__, T6__>::type>::type, Eigen::Dynamic, 1>
solve_quadprog(const Eigen::Matrix<T0__, Eigen::Dynamic, Eigen::Dynamic>& G,
               const Eigen::Matrix<T1__, Eigen::Dynamic, 1>& g0,
               const Eigen::Matrix<T2__, Eigen::Dynamic, Eigen::Dynamic>& CE,
               const Eigen::Matrix<T3__, Eigen::Dynamic, 1>& ce0,
               const Eigen::Matrix<T4__, Eigen::Dynamic, Eigen::Dynamic>& CI,
               const Eigen::Matrix<T5__, Eigen::Dynamic, 1>& ci0,
               const Eigen::Matrix<T6__, Eigen::Dynamic, 1>& x, std::ostream* pstream__);

This does not match the signature in the eiquadprog function (naturally), which is:


inline double solve_quadprog(const MatrixXd & G,  const VectorXd & g0,  
                      const MatrixXd & CE, const VectorXd & ce0,  
                      const MatrixXd & CI, const VectorXd & ci0, 
                      VectorXd& x)

I unfortunately don’t have the ability to tell how much of a lift it would be to edit the C++ in eiquadprog so that it complies with the Stan types. However, I would be willing to pay for a consulting engagement for someone with the know-how to interface this function with Stan. I think the roadmap is pretty clear (?) based on this thread – mainly it is to create a new function with the type signature I pasted above that wraps the solve_quadprog function linked to above. Given that I’m in a “don’t know what I don’t know” situation, I’m sure there’s complexity I’m overlooking. Let me know if you’re interested! If you’d rather chat privately, I’m at tom@gradientmetrics.com. Thank you.
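
To make the roadmap concrete, here is roughly what I imagine the wrapper looks like in the all-data case, where every templated type resolves to double (a sketch only – validating and templating this is exactly the part I’d want help with; it assumes eiquadprog.hpp and its solve_quadprog are visible on the include path):

#include <Eigen/Dense>
#include <ostream>
#include "eiquadprog.hpp"  // assumed location of the eiquadprog header

// eiquadprog returns the objective value and writes the minimizer into its
// in-out argument; the Stan declaration wants the minimizer back as the
// return value. The trailing vector x in the Stan signature becomes an
// initial value here.
inline Eigen::Matrix<double, Eigen::Dynamic, 1>
solve_quadprog(const Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>& G,
               const Eigen::Matrix<double, Eigen::Dynamic, 1>& g0,
               const Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>& CE,
               const Eigen::Matrix<double, Eigen::Dynamic, 1>& ce0,
               const Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>& CI,
               const Eigen::Matrix<double, Eigen::Dynamic, 1>& ci0,
               const Eigen::Matrix<double, Eigen::Dynamic, 1>& x,
               std::ostream* pstream__) {
  Eigen::VectorXd sol = x;  // eiquadprog's in-out solution argument
  double f = solve_quadprog(G, g0, CE, ce0, CI, ci0, sol);  // 7-arg eiquadprog overload
  if (pstream__)
    *pstream__ << "solve_quadprog objective: " << f << std::endl;
  return sol;
}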


It just occurred to me that the experimental results @shoshievass and I got when fitting the model with a quadratic solver are not in this thread, but in another one. So I’m linking this here: Non-smooth posterior and KKT problem

As stated, I’m happy to help but I want to make sure we’re not running the same experiment again.

Hey everyone! Because of the hard work of @rok_cesnovar, we were able to write a C++ function that can be included with vanilla Stan models using the normal procedure for incorporating external C++ functions. The example R code is here (apologies, we haven’t cleaned up the repo); it shows how to compile the Stan model (which declares the function but does not define it) along with the externally defined C++ function.

Thanks for everyone’s help here nailing down the steps to solve it, and for @rok_cesnovar’s help working with the C++ code.


A few quick comments:

  • this only implements solve_quadprog in such a way that it’s callable from Stan models (the previous efforts in this thread were aimed at implementing the quadratic optimizer with its /rev implementation)
  • a heads up on licensing for those who might come to this thread at some later point: the solve_quadprog function is based on uQuadProg++ (as was the initial implementation from this thread), which is GPL licensed (the files contain the license details). I don’t think that makes a difference if you use this with RStan and intend to redistribute, but it might change your licensing plans if you intend to redistribute with CmdStan.

@rok_cesnovar @tvladeck I’m testing the quad_solver and the following model compiles. However, I want to pass in G from transformed parameters (see the commented line in the transformed parameters block). When I do that (and comment G out of the data block) I get the following error. Is it possible to pass in G from transformed parameters with a slight modification to the solver code?

Error in compileCode(f, code, language = language, verbose = verbose) : 
  Compilation ERROR, function(s)/method(s) not created! file244c1a327cbb.cpp:6:36: warning: ISO C99 requires whitespace after the macro name
 #define STAN__SERVICES__COMMAND_HPP#include <boost/integer/integer_log2.hpp>
                                    ^
In file included from file244c1a327cbb.cpp:42:0:
C:/Users/spinkney/Code/misc/quadprog/quadprog.hpp: In instantiation of 'typename boost::math::tools::promote_args<T0__, T1__, T2__, T3__, T4__, typename boost::math::tools::promote_args<T5__, T6__>::type>::type model244c379a130b_test_namespace::solve_quadprog_1(const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<RhsScalar, -1, 1>&, const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<T_job_param, -1, 1>&, const Eigen::Matrix<T_V, -1, -1>&, const Eigen::Matrix<T_m0, -1, 1>&, Eigen::Matrix<T_m0, -1, 1>&) [with T0__ = stan::math::var; T1__ = double; T2__ = double; T3__ = double; T4__ = double; T5__ = double; T6__ = stan::math::var; typename boost::math::tool
In addition: Warning message:
In system(cmd, intern = !verbose) :
  running command 'C:/PROGRA~1/R/R-36~1.3/bin/x64/R CMD SHLIB file244c1a327cbb.cpp 2> file244c1a327cbb.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection
Here is the model:

functions {
  
  vector solve_quadprog(matrix G, vector g0, matrix CE, vector ce0,
                        matrix CI, vector ci0);
  
}

data {
    int<lower=1> K;                 // num predictors
    int<lower=0> T;                
    int<lower=0> J;                 
    vector[K] x;
    matrix[J, K] z;
    vector[K] v_prior;
    
    matrix[K, 1] CE;      
    vector[1] ce0;         
    matrix[K, 2 * K] CI;   
    vector[2 * K] ci0;    
    matrix[K, K] G;
}
transformed data {
  vector[K] g0 = -2 * x;
}
parameters {
    simplex[J] weight;
    real<lower=0> sigma; 
    positive_ordered[K] v_weights;
}
transformed parameters {
    vector[K] w_x_optim;
    row_vector[J] w_optim;
    // matrix[K, K] G = diag_matrix(2 * v_weights);
    
    w_x_optim = solve_quadprog(G, g0, CE, ce0, CI, ci0);
    w_optim = w_x_optim' / z';
}
model {
  sigma ~ normal(0.01, 0.05);
  weight ~ normal(w_optim, sigma);
}

The problem is that the underlying function expects matrices of doubles (data in Stan terms) but matrix parameters are matrices of stan::math::var.

I think there is a simple fix to this that should work, I can add that, not a big deal.
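
Roughly, what I have in mind is along these lines (an untested sketch; solve_quadprog_double here stands for the existing double-only routine):

#include <stan/math.hpp>  // stan::math::value_of, Eigen, boost promote_args
#include <ostream>

// Accept var or double inputs, demote everything to double with value_of,
// and call the double-only solver. Caveat: value_of drops the autodiff
// information, so the solution is treated as constant with respect to any
// parameters feeding into G – no gradients flow through the solver.
template <typename T0__, typename T1__, typename T2__, typename T3__,
          typename T4__, typename T5__>
Eigen::Matrix<typename boost::math::tools::promote_args<T0__, T1__, T2__, T3__,
                  typename boost::math::tools::promote_args<T4__, T5__>::type>::type,
              Eigen::Dynamic, 1>
solve_quadprog(const Eigen::Matrix<T0__, Eigen::Dynamic, Eigen::Dynamic>& G,
               const Eigen::Matrix<T1__, Eigen::Dynamic, 1>& g0,
               const Eigen::Matrix<T2__, Eigen::Dynamic, Eigen::Dynamic>& CE,
               const Eigen::Matrix<T3__, Eigen::Dynamic, 1>& ce0,
               const Eigen::Matrix<T4__, Eigen::Dynamic, Eigen::Dynamic>& CI,
               const Eigen::Matrix<T5__, Eigen::Dynamic, 1>& ci0,
               std::ostream* pstream__) {
  typedef typename boost::math::tools::promote_args<T0__, T1__, T2__, T3__,
      typename boost::math::tools::promote_args<T4__, T5__>::type>::type T_ret;
  using stan::math::value_of;
  Eigen::VectorXd x(G.cols());
  solve_quadprog_double(value_of(G), value_of(g0), value_of(CE), value_of(ce0),
                        value_of(CI), value_of(ci0), x);  // hypothetical double-only core
  return x.cast<T_ret>();
}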


Lifesaver! Thank you


Would you know about how long this may take? I know that you’re busy on a lot of things and I really appreciate the help (plus this is an experimental feature!). I just need to set expectations for people I’m building this model for and let them know the delay and approximately for how long.

If there is anything I can do to help, let me know! Thanks again!

I am afraid I spoke too quickly and this is a bit more work than I thought. I won’t have time to dig into this until Thursday afternoon/Friday.

Thanks for letting me know and, as always, don’t hesitate to let me know if you need anything from me to aid in ⛏️ and 🔧 the code

@spinkney Have you tested whether the posterior is continuous? If there is a discontinuity, you won’t be able to sample with HMC. This is what happened with the model @shoshievass and I tried to fit.

I spent some time working out conditions under which the derivative exists – and it’s not something you get in general when you have an inequality constraint. I haven’t made this public yet, but here are some results.

You want to solve f(u, \psi) = 0 for u and propagate derivatives through the solution u = u(\psi) with respect to \psi. Mathematically this is only possible if \mathrm d u / \mathrm d \psi exists.

You also have the inequality constraints, g_i(x) \le 0 for i = 1, ..., K, and the equality constraint h(x) = 0. The Karush-Kuhn-Tucker (KKT) conditions provide necessary conditions for an optimal solution and tell you that either an inequality doesn’t matter, i.e. remove it and you still get the same solution, or g_i(x) = 0. You then have boundary cases where the inequality doesn’t matter but g_i(x) = 0 nevertheless.

Formally, let u be the solution to the constrained optimization problem with K inequality constraints. Let u^{-i} be the solution to an identical optimization problem, without the i^\mathrm{th} constraint. We say u is on a boundary with respect to the i^\mathrm{th} inequality if u = u^{-i} and g_i(u) = 0.

Next introduce the Lagrange multiplier for the i^\mathrm{th} inequality, \mu_i.

Lemma: We are at a boundary point with respect to the i^\mathrm{th} inequality constraint if and only if \mu_i = 0 satisfies the KKT stationarity condition.

And here’s the result to diagnose discontinuity.

Theorem: Let \nu contain all elements of \mu for which g_i(u) = 0. Suppose \mu_i = 0 solves the stationarity condition and g_i(u) = 0. In addition, suppose the numerical evaluation, through automatic differentiation, of the gradient \partial \nu_i / \partial \psi is non-zero. Then the Jacobian \mathrm d u / \mathrm d \psi does not exist. On the other hand, if \partial \nu_i / \partial \psi is zero for all elements of \nu, then \mathrm d u / \mathrm d \psi exists.

Typically, you avoid these problems if varying \psi doesn’t change whether or not the inequality affects the model – in which case you can rewrite the model without the constraint. On the other hand, if you have discontinuity boundaries, even when these have measure 0, you need to compute Hamiltonian trajectories through them and that will be an issue.

Do let me know if I misunderstood your problem setup, but I do believe the above applies.

PS: I do not mean to discourage experimentation. But having done something similar and thought about the problem, I did want to share.


I believe the posterior should be continuous, as I’m solving a nested optimization problem that results in an optimal set of weights producing a convex combination of the “true” output (which is continuous). I hope the inequality constraints won’t be a problem, as the region of exploration should be all around a positive continuous hull with a derivative existing at the degenerate case.

This is pretty cool and lets us make sure we have a well-formulated problem that Stan can handle.
If all my \mu are affine functions, doesn’t this imply that the Jacobian is 1? So I think I may be good, unless I’m missing something.

Do you have more info on the types of problems where this works? Like, is it always the case that affine constraint functions work (the “linearity constraint qualification” on the Wikipedia KKT page)? It seems like the class of problems where this works is much smaller than the class where it doesn’t.

If you (inverse) transform U with V = a * U + b then the Jacobian is just dV/dU = a. If a is a constant then we don’t need to add it to Stan, but if a is a parameter, then we have to include the change-of-variables adjustment target += log(a);

Thanks @Bob_Carpenter, in my case a is just constant (data).

Exactly!

How do you define your \mu?

In the theorem I stated, \mu_i is the Lagrange multiplier corresponding to the i^\mathrm{th} inequality constraint. You typically don’t deal with it explicitly; rather, it is handled by the quadratic optimizer under the hood.

Here is a toy example to illustrate my point.

The goal is to minimize F(x) = x_1^2 + x_2^2 + x_1
subject to h(x) = 1 - x_1 = 0
and g(x) = x_1 - x_2 - \psi \le 0

In this example, \psi can be interpreted as a model parameter.

We can work this out analytically. From the equality constraint, x_1 = 1. We then minimize F(x) by making x_2 as close to 0 as possible, subject to x_2 \ge 1 - \psi. There are two cases.

Case 1: \psi > 1, then x_2 = 0, and the inequality constraint does not matter.
Case 2: \psi \le 1, then x_2 = 1 - \psi.

Let’s now differentiate the solution, x, with respect to \psi. \mathrm d x_1 / \mathrm d \psi = 0. For x_2 we have

\begin{eqnarray*}
\mathrm d x_2 / \mathrm d \psi & = & 0, \ \ \psi > 1 \\
 & = & -1, \ \ \psi < 1 \\
 & & \mathrm{does\ not\ exist\ at}\ \psi = 1
\end{eqnarray*}

You would get the derivative if as you approached the boundary \psi = 1, the derivative of x_2 went to 0. This is what the theorem tells you, but as far as I can tell, this is a difficult condition to enforce – and things get more intricate when you have multiple constraints.

Now here’s the solution using the Lagrangian multipliers. We start by introducing the Lagrangian

\mathcal L(x, \omega, \mu) = F(x) + \omega h(x) + \mu g(x)

and solve for stationarity

\nabla_{x, \omega, \mu} \mathcal L = 0

So instead of solving for x, you solve for the augmented variable \tilde u = (x, \omega, \mu). You have some additional constraints: (i) primal feasibility, g(\tilde u) \le 0, and (ii) complementary slackness, \mu g(\tilde u) = 0. Plug in the analytical solutions, and you’ll find that they work.

Now the equation f := \nabla \mathcal L = 0 gives you an implicit function theorem for the derivatives, namely \mathrm d \tilde u / \mathrm d \psi = - [\partial f / \partial \tilde u]^{-1} \partial f / \partial \psi. This is the term you need to control at the boundary. I hope this sheds some light, even though I omitted several steps.
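
To spell out a couple of the omitted steps for this toy example (my own algebra, worth double-checking), the stationarity conditions are

\begin{eqnarray*}
\partial \mathcal L / \partial x_1 & = & 2 x_1 + 1 - \omega + \mu = 0 \\
\partial \mathcal L / \partial x_2 & = & 2 x_2 - \mu = 0 \\
\partial \mathcal L / \partial \omega & = & 1 - x_1 = 0
\end{eqnarray*}

For \psi < 1 the inequality is active, so g = 0 gives x_2 = 1 - \psi and the second equation gives \mu = 2 x_2 = 2(1 - \psi); for \psi > 1, complementary slackness forces \mu = 0 and hence x_2 = 0. At \psi = 1 we have \mu = 0 and g(u) = 0 simultaneously – a boundary point in the sense of the lemma – and on the active branch \partial \mu / \partial \psi = -2 \neq 0, which is exactly the non-zero-gradient condition in the theorem, so \mathrm d u / \mathrm d \psi does not exist there. This matches the case analysis for \mathrm d x_2 / \mathrm d \psi above.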

So my problem is from Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects (Journal of Economic Literature, forthcoming), where I want to minimize the following with respect to W = (w_2, \ldots, w_{J+1})'.

\| \mathbf{X_1} - \mathbf{X_0 W} \| = \left( \sum_{h=1}^k \nu_h (X_{h1} - w_2 X_{h2} - \cdots - w_{J+1} X_{hJ+1})^2 \right)^{1/2}

subject to h(w) = w_2 + \cdots + w_{J+1} = 1
and 0 \leq w_j \leq 1 for j \in \{2, \ldots, J+1\}

I’ve written the Stan program such that Stan estimates \nu, which gets fed into the quadratic optimizer to produce \mathbf{W}. It seems that there should always be a solution to this problem that satisfies the constraints.
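
For reference, here is how I’m mapping this onto the form solve_quadprog expects, assuming eiquadprog’s conventions \frac{1}{2} x' G x + g_0' x with CE' x + ce_0 = 0 and CI' x + ci_0 \ge 0 (my own expansion – someone correct me if I’ve botched it). With V = \mathrm{diag}(\nu), expanding the squared norm gives

\| \mathbf{X_1} - \mathbf{X_0 W} \|^2 = \mathbf{W}' \mathbf{X_0}' V \mathbf{X_0} \mathbf{W} - 2 \mathbf{X_1}' V \mathbf{X_0} \mathbf{W} + \mathrm{const},

so G = 2 \mathbf{X_0}' V \mathbf{X_0} and g_0 = -2 \mathbf{X_0}' V \mathbf{X_1}. The sum-to-one constraint is a single equality column (CE = \mathbf{1}, ce_0 = -1), and the box constraints 0 \le w_j \le 1 give the inequality columns CI = [\, I \ \ {-I} \,] with ci_0 = (\mathbf{0}', \mathbf{1}')', which is why CI and ci0 above have twice the dimension of the solved vector.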

Hi @rok_cesnovar, first let me know if I should start a new thread or hop over to github to discuss this.

I can attempt to tackle updating the code to allow matrices of stan::math::var. I’m a noob with stan/math and C++, so I need a bit of guidance. Could you give me an outline of what files or things I would need to change or add?

I took a look at the adding-functions-to-Stan wiki and I’m not sure if the “bit more work than I thought” comment is because you need to fork the stan/math repo, add the “things”, and then compile Stan locally, or if we can get away with continuing the current process by adding changes or files in the local directory that contains the quadprog.hpp file and passing includes in the R call that compiles the .stan file.