Quadratic optimizier

What if we did some sort of ‘finite differencing’ approach re: the state change? E.g. Recompute the algorithm a number of times to check for state changes? It would take a lot of time, but maybe we could figure out a sophisticated way to do this less-than-always but just enough to avoid getting stuck in regions of state on the boundary condition?

These kinds of quadratic programming models are fundamentally not smooth at the boundaries. The boundaries can’t be moved to infinity because you want to put values there, and you can’t approximate the gradients because they do not exist. One way you can tell that this kind of approach can’t work is that it would implicitly allow you to do variable selection, which is known to be an issue.

1 Like

Hello! I know this is a stale discussion but I found it as it is directly applicable to a problem I’m working on.

I am working on a state space model that is using a Kalman Filter. The issue is that we want to

  1. Constrain the states \alpha in “real” space to be above or below zero, and
  2. Use a QR decomposition as the predictors are highly correlated

To do 1 with 2 we need to ensure that R^{-1} \alpha > 0. If \gamma is our updated state, we need to replace it with \alpha s.t.

we minimize (\alpha - \gamma)^2
subject to R^{-1}\alpha > 0

which is a quadratic program.

I have looked through the code here, and am aware that you can include custom C++ code (e.g., here)

I am wondering if it would be as “trivial” as extracting the two files

stan/math/rev/mat/functor/eiquadprog.hpp and

and includeing them along with the stan code when it is compiled. Or, is this something where you have to compile the branch with this feature included and to use CmdStan? If the latter, I may have to try some other route, as I think that may be beyond me. Thank you!

Where did those two files come from? They’re not on the develop branch of Stan (also, the distinction among mat, arr, and scal has been collapsed on the develop branch.

To add a new function, you need to have a differentiable C++ function (either fully templated or specialized with analytic gradients) that only calls existing Stan functions (that includes all the C++ library functions). Then you need to declare the signature. You can do that with CmdStan or RStan.

It looks like @charlesm93 was the one who opened the issue, so maybe he’ll know more.

Apologies, I should have included some links. It is on this branch (issue-888-quadratic_optimizer), and here are the links to the two files:



I really appreciate your guidance; unfortunately my C++ skills are pretty much nil so it’s very hard for me to say whether or not the code at the links satisfies these constraints.

To add a new function, you need to have a differentiable C++ function (either fully templated or specialized with analytic gradients) that only calls existing Stan functions (that includes all the C++ library functions). Then you need to declare the signature. You can do that with CmdStan or RStan.

My guess is that it does not, since it appears that this implementation requires functions as arguments, which doesn’t seem to be supported in stan outside of the algebra solver

Then you need to declare the signature. You can do that with CmdStan or RStan.

I’m guessing (but just want to confirm) that you mean that you declare the signature inside of the stan program, and use this recipe for includeing the externally-defined C++ function.

Thanks again for the help here!

1 Like

Hi! I wrote these functions with @shoshievass a while ago, and we were able to interface them with Stan.

There is however a major problem: these kinds of quadratic programs typically introduce a discontinuity in the posterior distribution. So you can’t run Stan’s HMC, because it’ll get stuck at a boundary and ultimately it won’t give you a representative sample, unless your parameters are always on one side of the boundary (i.e. the boundary doesn’t matter).

I’m happy to help you, but I first need to make sure you won’t run into the above described issue. I strongly suggest going through the conversation (if you haven’t done so already). It may also be helpful to share a sketch Stan file to demonstrate how you would use the quadratic solver.

1 Like

Hi Charles, thank you so much for responding!

The model is a modified state space time series, and uses a Kalman Filter (actually a customized filter, but for all intents and purposes you can just think of it as a Kalman Filter) to compute the likelihood and to estimate the hidden states.

The issue is that we know that the states have to be strictly positive (or in some cases, strictly negative, but let’s ignore that).

The likelihood of the Kalman Filter is not conditional on the states, but rather the observed data and the system matrices that are fed in (y, S, and H below) (I’ve renamed the standard Q to S to avoid conflict with the concept of QR decomposition below). So, in contrast to your situation (I think), what’s being constrained here is not a parameter but rather a hidden state inside of the function that produces the likelihood. The parameters Q and S do not have these kinds of hard constraints, other than what is typical for a covariance matrix.

The sketch of the stan model is below. I’ve actually already built this model, but without a QR decomposition, and it seems to perform well at least as far as I can tell. To constrain the states in “normal” space, all I have to do is to take the max of the states and 0 at each recursion.

The issue is that my predictors (in Z) are highly correlated, and I’d like to apply the QR decomposition to Z before feeding it into the model. But when I do that, the constraints get more complex; it’s not simply that \alpha \ge 0 but rather R^{-1}\alpha \ge 0. This is where the quadratic program comes in. If at a recursion step we’re given an estimate of the states \gamma, we want to replace it with \alpha s.t. we minimze (\gamma - \alpha)^2 subject to R^{-1}\alpha \ge 0, which is the quadratic program.

functions {
  real custom_ssm_lpdf(y, Z, H, S, a1, P1) { 
    // constraining the states happens in here
data {
  vector[p] y[n];
  matrix[p, m] Z[n]; 
  vector[m] a1;
  matrix[m, m] P1;
parameters {
  cov_matrix[p] H;
  cov_matrix[m] S;
model {
  H ~ ...;
  Q ~ ...;
  y ~ custom_ssm_lpdf(Z, H, S, a1, P1);

I haven’t internalized your problem yet, but I can explain what @charlesm93 and I encountered in more detail. In our setting, the constrained objects were outcome variables (i.e. data).

We were modeling (essentially) the solution to a weighted portfolio optimization problem: each individual in our data chose a vector of bids (essentially portfolio weights) to maximize a risk-return tradeoff, which amounted to a quadratic program that is parameterized by the individual’s latent ‘type’.

The weighted sum of the bids had to equal a constant (which was data) – this is an easy linear constraint. However, each individual bid had to be non-negative (since, well…bids can’t be negative inr real life in this setting). In our data, the bids were always non-negative, of course.

The troublesome part was that depending on different parameters of our problem, the non-negativity constraint may bind on different components – not in making those components negative, but in making those components zero. This is where the non-existence of the Jacobian came in and would throw off the sampler.

From my brief read of your problem, it sounds like you might encounter a similar issue. For what it’s worth, I think there are solutions that can be found – I’ve made some progress in my own problem on understanding when and where the boundary constraint should bind in terms of the problem parameters, though I haven’t revisited the estimation problem as of yet.

1 Like

Thank you so much @shoshievass and @charlesm93 for (1) stepping in to help, especially after so much time has passed, and (2) to explain in such detail why this may not be such a good idea.

I think, regardless, I’d like to give it a shot, if only to scratch this itch. Probably it’ll go wrong in all the same ways.

So, to @charlesm93, you said that you were

were able to interface them with Stan.

I am curious about the details here. This is likely to be an ill-posed question (as I don’t know enough about how C++ programs compile) but here goes:

Stan allows the inclusion of external C++ functions, which as @Bob_Carpenter puts it:

[must be] differentiable … (either fully templated or specialized with analytic gradients) that only calls existing Stan functions (that includes all the C++ library functions). Then you need to declare the signature.

You “declare the signature” inside the stan functions block.

The process for including external C++ files is spelled out here. I am wondering if you could apply this process to the files you’ve defined, or if that’s just the wrong way to think about it.

Where I’m getting tripped up (again because I have little background on how C++ programs work) is whether or not the way you did it, the quadratic_optimizer function was compiled “inside of” stan, or rather that what you did was, or would have been, equivalent to compiling a stan model along with just those files.

I hope that makes sense but understand that it may not!

The process for including external C++ files is spelled out here

This looks cool. I wasn’t aware of this feature, so you could give it a try, and report whether it works. You can link to the C++ file we wrote, which gives you a differentiable function. I wonder if this will work on higher-order functions.

I am curious about the details here.

Long story short, the details are tricky, because we used a higher-order function, i.e. a function that takes other functions as arguments. You can’t directly use the functions we wrote because the math and stan repos have been refactored. That said, the math is pretty much done; for the stan part, you’ll want to edit the stanc3 compiler. You can follow the example of the 1d integral. It’ll probably be easier to use cmdStan instead of RStan.

Where I’m getting tripped up (again because I have little background on how C++ programs work) is whether or not the way you did it, the quadratic_optimizer function was compiled “inside of” stan, or rather that what you did was, or would have been, equivalent to compiling a stan model along with just those files.

The quadratic optimizer was compiled inside of Stan.

I think, regardless, I’d like to give it a shot, if only to scratch this itch.

I’m happy to help, but not interested in putting in the coding effort to repeat the experiment Shosh and I already did. There is a larger (and difficult) problem at hand here that involves working on HMC and its utility across boundaries; see the discussion above.

@charlesm93 you’re so kind to offer to help, but given what the situation is here, and the situation with my ability with C++ (zero), this doesn’t appear very tractable. I just don’t have much to offer in terms of getting these functions (a) updated to the current status of stan and/or (b) interfaced with stan, either by being compiled within stan or written to be included with a stan model.

I’ll think about this some more, but appreciate all the help on this thread. I’ll of course keep everyone updated if I happen to get things working.

What would help in determining the (short term) feasibility of the project is detailing the content of the custom_ssm_lpdf function. Specifically write where the function quadratic_optimizer() comes into play.

Hi Thomas,

C++ with zero experience (yeah count me in too) sounds really hard, but it is not really that hard after you have some idea how things work in Stan.

There are couple of things you might consider.

First add a new function in to a new .hpp file and then use external C++ example to include it into the Stan model. I think if you add include statement + add these files (quadratic_optimizer.hpp, eiquadprog.hpp) in the same folder with your external .hpp they should work as intended.

Stan model is compiled at the building phase. So first .stan is translated to .hpp file (and external c++ is actually injected here + path information), then C++ compiler gathers everything together (Stan math files etc) and compiles a binary which is used for sampling.

To add a new function to Stan language (which you don’t need to inject as external C++) is a bit more involved procedure where you would need to update a bunch of files manually. But tbh I’m not really too familiar with this process.

There is a problem with this: the stan_model includes are not included at the top of the file, but are instead included within the model namespace. This makes writing simple functions easier, but if you try to include an additional header from boost or Stan it fails with a long list of incomprehensible messages :-( I didn’t really find a workaround for that.

So user would need to manually edit the model hpp file?

In PyStan I have done so that I translate .stan to .hpp, then save it to a file, edit and read back to python and replace the hpp string in stanc output which can be given to StanModel instance.

I think this should be also possible with RStan.

I remember this not being easily possible in RStan as even when you modified the result of the stanc call and pass the result of stanc via stan_model(stanc_ret = my_ret), the C++ code my_ret is actually ignored and regenerated :-(. I wanted to file an issue for this, but then I didn’t really needed it and let it go.

I remember experimenting with closing and reopening the namespaces within the includes - something like

stan_model( ... 
  includes = paste0(
"} //end namespace
#include <boost/something/i/neeeded.hpp>
namespace ", 
stanc_result$model_cppname, "_namespace { // reopen namespace \n")

but don’t even remember if that worked in the end or not. Also it is a while since I tried this maybe a newer version of RStan lets you avoid those hacks.

If you #include "bar.hpp" in the foo.hpp C++ file that defines the foo function declared in the functions block of Stan language, it is supposed to work.

@charlesm93 absolutely. i have removed the parts of the filter that may be confusing and just kept this as an orthodox kalman filter (actually even a simpler one, since I don’t include a transition matrix or a state covariance selection matrix). The “constraining” happens near the end, after we update x after making an observation.

matrix conditional_dist_joint_normal(vector mean_state, vector mean_obs, vector obs, matrix var_state, matrix var_obs, matrix covar_state_obs) {
  // updates a joint normal distribution based on an observation
  int m = dims(mean_state)[1];
  matrix[m, m+1] ret;
  vector[m] mn;
  matrix[m, m] v;
  mn = mean_state + covar_state_obs * inverse(var_obs) * (obs - mean_obs);
  v = var_state - covar_state_obs * inverse(var_obs) * covar_state_obs';
  ret[,1] = mn;
  ret[,2:m+1] = v;

matrix kalman_filter(
  vector x1, // column vector
  matrix Px1, // m x m
  matrix Q, // m x m // process covariance
  matrix H, // observation variance
  matrix Z, // n x m // measurement equation
  matrix y, // observations (assumes dimension = 1)
  matrix Rinv, // not used, assumed to be needed to constrain states based on QR factorization
  vector lb) { // lower bound on constraint

    int n = dims(y)[1];
    int p = dims(y)[2];
    int m = dims(Z)[2];
    int n_cols_return = (m+p+1);
    int n_rows_return = (n+m);
    matrix[n_rows_return, n_cols_return] ret = rep_matrix(0, n_rows_return, n_cols_return); // this is the big matrix we'll return
    matrix[n, m] states;
    matrix[n, p] predictions;
    matrix[n, 1] loglik;
    vector[m] x;
    matrix[m, m] Px;
    row_vector [m] Zt;
    vector[p] mean_obs;
    matrix[p, p] var_obs;
    matrix[m, m+1] upd_x;
    x  = x1;
    Px = Px1;
    for(t in 1:n) {
      Zt = Z[t, ];
      if(t > 1) {
        x = x;
        Px = Px + Q;
        predictions[t, ] = to_row_vector(Zt * x);
      // likelihood
      mean_obs = Zt * x;
      var_obs  = Zt * Px * Zt' + H;
      loglik[t, 1] = normal_lpdf(y[t, 1] | mean_obs[1], sqrt(var_obs[1, 1]));
      upd_x  = conditional_dist_joint_normal(x, mean_obs, y[t, 1], Px, var_obs, Px * Zt');
      x  = upd_x[, 1];
      Px = upd_x[, 2:m+1];
      // this is the hypothetical function that would find a solution 
      // xhat such that
      // we minimize (x-xhat)^2, subject to
      // Rinv * xhat >= lb
      // x  = constrain_a(x, Rinv, lb);
      // record states
      states[t, ] = x';
  ret[1:n, 1:1]  = loglik;
  ret[1:n, 2:(p+1)] = predictions;
  ret[1:n, (p+2):(m+p+1)] = states;
  ret[(n+1):(n+m), (p+2):(m+p+1)] = Px;;

1 Like

So if I understand everything right, here’s the procedure:

  1. I create a new file quadratic_optimizer_exposed_to_stan.hpp
  2. In that file, I #include other_necessary_files.hpp (and presumably, those can #include still more files, etc)
  3. In the quadratic_optimizer_exposed_to_stan.hpp, I write a function that uses only existing stan functions and those that have been #included via the procedure above
  4. In the stan model file, I declare the function signature
  5. I use this procedure during model compilation to incorporate the function defined in quadratic_optimizer_exposed_to_stan.hpp into my stan model

However, because of step 4, I don’t believe simply applying these steps to the exisitng code would work, since that function accepts functors as arguments, and this type doesn’t exist in stan. If I am right about that, this means that regardless, a new function needs to be written to go down this route.

Additionally, in step 3, I am curious what defines the universe of callable functions. Is it all the functions defined in Stan, plus all the libraries that Stan depends on? For example, would an external function like this be able to call Eigen functions, like eiquadprog?

If so, would it be possible to essentially write a function that was a thin layer over Eigen::solve_quadprog to have it accept and return stan types? (I’m getting very far ahead of my skis now)

1 Like

That is true because the function signature you declare in step (4) cannot involve a function.

Yes, in C++, there are no limits on which C++ functions can be called, particularly if you #include their implementations yourself. In particular, all Eigen functions defined for dense matrices can be called without needing to #include more headers, but you have to prefix them with Eigen:: because they are in the Eigen namespace rather than the Stan one.

1 Like