Using StanHeaders in my own R package

I have been using Rcpp and RcppArmadillo in my R packages for a while, and now I’d like to use some automatic differentiation (AD). I need some help/guidance as I try to understand how to use StanHeaders.

I am able to reproduce the gradient example from the StanHeaders vignette and include it in a new, otherwise empty R package.

For those who are interested, this is my Makevars file (it works even on Windows):

CXX14="$(BINPREF)g++ $(M_ARCH) -std=c++1y"
CXX14FLAGS="-O3 -Wall" 
CXX14STD=-std=gnu++1y
CXX11FLAGS=-O3 -std=gnu++11
PKG_CXXFLAGS = -std=c++14

This is the LinkingTo line in my DESCRIPTION file:

LinkingTo: Rcpp (>= 0.12.7), RcppEigen (>= 0.3.3.3.0), StanHeaders (>= 2.12.0), BH (>= 1.66.0)

I added a new .cpp file, and pasted the sample code from the vignette, calling the function gradExample:

#include <stan/math.hpp>  // include Stan Math before the Rcpp headers
#include <Rcpp.h>
#include <RcppEigen.h>
using namespace Rcpp;

// [[Rcpp::export]]
Eigen::VectorXd gradExample(Eigen::VectorXd x, Eigen::VectorXd a) {  // gradient by AD using Stan
    double fx;
    Eigen::VectorXd grad_fx;
    using stan::math::dot_self;
    stan::math::gradient([&a](auto x) { return dot_self( (x - a).eval() ); },
                         x, fx, grad_fx);
    return grad_fx;
}

It works fine, and just as expected.

As the next toy example, I wanted to move from scalars to vectors, so I decided to compute the gradient with respect to x of the multivariate normal log density. I know this is just a toy example to learn how StanHeaders works. I call the function MultiNormalLpdfG.

// [[Rcpp::export]]
Eigen::VectorXd MultiNormalLpdfG(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd sig) {

  double fx;
  Eigen::VectorXd grad_fx;

  using stan::math::multi_normal_lpdf;
  stan::math::gradient([&mu, &sig](auto x) { return multi_normal_lpdf(x, mu, sig); },
                       x, fx, grad_fx);

  return grad_fx;
}

Again, this works fine. It is a lot faster than numDeriv::grad, and I also checked that the results agree.

library(numDeriv)
library(microbenchmark)
library(mvtnorm)

microbenchmark(
  grad = grad(func = function(x) dmvnorm(x, c(0, 0), diag(2) + .5, log = TRUE), c(1:2)),
  ad   = MultiNormalLpdfG(1:2, c(0, 0), diag(2) + .5)
)

So as long as I can put my target function into a lambda expression, using only Stan Math functions, I should be fine.
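For instance, the same pattern should carry over to other compositions of Stan Math functions (a quick sketch along the same lines, untested; logSumExpGrad is a made-up name):

// [[Rcpp::export]]
Eigen::VectorXd logSumExpGrad(Eigen::VectorXd x) {  // gradient of log_sum_exp by AD
  double fx;
  Eigen::VectorXd grad_fx;
  stan::math::gradient([](auto v) { return stan::math::log_sum_exp(v); },
                       x, fx, grad_fx);
  return grad_fx;
}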

However, what if I need/want to create functions, and then call these functions? AD can be tricky, so I am curious about what’s possible here.
For my toy example, I create a function MultiNormalLpdftemp, which simply calls multi_normal_lpdf but returns the result as a double.

// [[Rcpp::export]]
double MultiNormalLpdftemp(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd sig) {

  double out;
  using stan::math::multi_normal_lpdf;
  out = multi_normal_lpdf(x, mu, sig);

  return(out);
}

Now, what doesn’t work:
(1) A lambda calling the custom function.
MultiNormalLpdfG2 does not compile. I am calling MultiNormalLpdftemp instead of multi_normal_lpdf. I tried it with and without to_var, but it does not compile.

Eigen::VectorXd MultiNormalLpdfG2(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {

  double fx;
  Eigen::VectorXd grad_fx;
  using stan::math::to_var;
  stan::math::gradient([&mu,&sig](auto x) { return  (to_var(MultiNormalLpdftemp(x,mu,sig).eval())) ; },
                       x, fx, grad_fx);

  return grad_fx;
}

(2) Calling the custom function directly.
I also tried using .grad() and .adj(); however, Eigen::VectorXd does not provide .adj(), and stan::math::var does not seem to work with vectors.

// [[Rcpp::export]]
Eigen::VectorXd MultiNormalLpdfG3(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  using stan::math::to_vector;
  //  stan::math::var  x = to_vector(x_);
  Eigen::VectorXd  x = to_vector(x_);
  
  // function itself
  stan::math::var lp = MultiNormalLpdftemp(x,mu,sig);
  
  // Gradient
  lp.grad();


  //stan::math::recover_memory();
  
  return x.adj();
}

I’d be really glad for hints/pointers/help. I intentionally kept the examples super minimal.


For this

Eigen::VectorXd MultiNormalLpdfG2(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {

  double fx;
  Eigen::VectorXd grad_fx;
  using stan::math::to_var;
  stan::math::gradient([&mu,&sig](auto x) { return  (to_var(MultiNormalLpdftemp(x,mu,sig).eval())) ; },
                       x, fx, grad_fx);

  return grad_fx;
}

you don’t want to to_var the result of a call to a function. Rather, you want to promote some of the inputs to vars and call the function, which is a bit closer to your second example. Keep following the last example in the StanHeaders vignette, and you might get something like this, which compiles:

// [[Rcpp::depends(BH)]]
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(StanHeaders)]]
// [[Rcpp::plugins(cpp14)]]
#include <stan/math.hpp>
#include <Rcpp.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
auto MultiNormalLpdfG3(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  stan::math::vector_v  x = stan::math::to_var(x_);
  
  // function itself
  stan::math::var lp = stan::math::multi_normal_lpdf(x, mu, sig);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y;
  for (int i = 0; i < x.rows(); i++) y[i] = x.coeffRef(i);
  lp.grad(y, gradient);
  
  return gradient;
}

Thanks.

It compiles, but it crashes, at least on my Windows machine.
I am trying to track down why.

EDIT:

  for (int i = 0; i < x.rows(); i++) y[i] = x.coeffRef(i);

appears to be causing the code to crash when run.

So, because grad()'s first argument needs to be a std::vector<stan::math::var>, you copy the vector x into the vector of vars element-wise.
(1) I think this should work, so I am puzzled as to why it crashes.
(2) Isn’t there a better way anyway? x and y end up holding the same vars, so why can’t we use a pointer of some sort?

Thanks again, bgoodri.
I had to modify your code for it to work. The size of y had to be set when initializing it; otherwise y[i] writes into an empty std::vector. See the working code below:

// [[Rcpp::export]]
std::vector<double> MultiNormalLpdfG3(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  stan::math::vector_v  x = stan::math::to_var(x_);
  
  // function itself
  stan::math::var lp = stan::math::multi_normal_lpdf(x, mu, sig);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y(x.rows());
  for (int i = 0; i < x.rows(); i++) y[i] = x(i);
  lp.grad(y, gradient);
  
  return gradient;
}

However, the main goal was to use MultiNormalLpdftemp (as a placeholder for some custom function) instead of multi_normal_lpdf.

I modified the code accordingly:


// [[Rcpp::export]]
std::vector<double> MultiNormalLpdfG4(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  stan::math::vector_v  x = stan::math::to_var(x_);
  
  // function itself
  stan::math::var lp = MultiNormalLpdftemp(x, mu, sig);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y(x.rows());
  for (int i = 0; i < x.rows(); i++) y[i] = x(i);
  lp.grad(y, gradient);
  
  return gradient;
}

However, I am getting a “cannot convert const stan::math::var to double in assignment” error.

I am also still a little confused by the mix of vectors and vectors of vars.


If it is this,

// [[Rcpp::export]]
double MultiNormalLpdftemp(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {  
  
  double out;
  using stan::math::multi_normal_lpdf;
  out=multi_normal_lpdf(x,mu,sig);
    
  return(out);
}

then, yeah, that won’t work because out is double and the result of multi_normal_lpdf is a var if any of its inputs are of var type. Just declare out as a stan::math::var.

I can rewrite the function like this:

stan::math::var MultiNormalLpdftemp(Eigen::VectorXd x, Eigen::VectorXd mu, Eigen::MatrixXd sig) {

  stan::math::var out;
  using stan::math::multi_normal_lpdf;
  out = multi_normal_lpdf(x, mu, sig);

  return(out);
}

However, the first argument is still an Eigen::VectorXd.

Then this does not compile (cannot convert const stan::math::var to double in assignment):

// [[Rcpp::export]]
std::vector<double> MultiNormalLpdfG4(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd sig) {

  stan::math::vector_v x = stan::math::to_var(x_);

  // function itself
  stan::math::var lp = MultiNormalLpdftemp(x, mu, sig);

  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y(x.rows());

  for (int i = 0; i < x.rows(); i++) y[i] = x(i);
  lp.grad(y, gradient);

  return gradient;
}

and changing stan::math::var lp = MultiNormalLpdftemp(x, mu, sig); to stan::math::var lp = MultiNormalLpdftemp(x_, mu, sig); gives the wrong result.

Right. If you declare out as a var, then some input to a function like multi_normal_lpdf must be of var type in order for it to know to take derivatives. What do you mean the answer is wrong?

Thanks.
OK, so what I gather is that (1) the output of the function that’s to be differentiated needs to be a stan::math::var, and (2) the input that I want to differentiate with respect to needs to be some stan::math type: stan::math::var for scalar inputs, or stan::math::vector_v for vector inputs.

I still don’t have perfect clarity over all the classes and types, as it’s hard working through the API documentation. The elegant code and liberal use of ‘auto’ take some time to get used to, but it’s starting to make more sense now.

So what I wrote then, was the following:

stan::math::var MultiNormalLpdftemp(stan::math::vector_v x, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {  
  
  stan::math::var out;
  using stan::math::multi_normal_lpdf;
  out=multi_normal_lpdf(x,mu,sig);
  
  return(out);
}

// [[Rcpp::export]]
std::vector<double> MultiNormalLpdfG4(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  stan::math::vector_v  x = stan::math::to_var(x_);
  
  // function itself
  stan::math::var lp = MultiNormalLpdftemp(x, mu, sig);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y(x.rows());
  
  for (int i = 0; i < x.rows(); i++) y[i] = x(i);
  lp.grad(y, gradient);
  
  return gradient;
}

This works, and gradients are correct.
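As an aside, I suppose the helper could also be templated so that one definition serves both double and var inputs (a sketch of what I have in mind, untested; MultiNormalLpdftempT is a made-up name):

// Templated helper: T can be double or stan::math::var, so the same code
// serves plain evaluation and autodiff.
template <typename T>
T MultiNormalLpdftempT(const Eigen::Matrix<T, Eigen::Dynamic, 1>& x,
                       const Eigen::VectorXd& mu,
                       const Eigen::MatrixXd& sig) {
  return stan::math::multi_normal_lpdf(x, mu, sig);
}

(A function template can’t be exported with Rcpp::export directly, so the exported wrappers would instantiate it with concrete types.)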

I still have some questions:

  • Is my understanding (at least mostly) correct?

  • Should I put stan::math::recover_memory(); in there?

  • What are the boundaries of what I can do within MultiNormalLpdftemp? Going back and forth between double and stan::math::var breaks AD. The code below compiles, but the gradient is wrong.

stan::math::var MultiNormalLpdftemp3(stan::math::vector_v x, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {  
  
  stan::math::var out;
  using stan::math::multi_normal_lpdf;
  double outtemp = multi_normal_lpdf(x,mu,sig).val();
  out = stan::math::to_var(outtemp);
  
  return(out);
}

// [[Rcpp::export]]
std::vector<double> MultiNormalLpdfG6(Eigen::VectorXd x_, Eigen::VectorXd mu, Eigen::MatrixXd  sig) {
  
  stan::math::vector_v  x = stan::math::to_var(x_);
  
  // function itself
  stan::math::var lp = MultiNormalLpdftemp3(x, mu, sig);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> y(x.rows());
  
  for (int i = 0; i < x.rows(); i++) y[i] = x(i);
  lp.grad(y, gradient);
  
  return gradient;
}
  • Speed/Pointers (see below) - is there a faster way to do this?
> microbenchmark(g4=MultiNormalLpdfG4(1:2,3:4,diag(2)),g=MultiNormalLpdfG(1:2,3:4,diag(2)))
Unit: microseconds
 expr    min     lq     mean median      uq    max neval cld
   g4 21.321 24.323 27.77077 27.476 30.0290 48.647   100   b
    g  8.408  9.009 10.82857  9.909 10.2105 51.950   100  a 

and again, thanks for your patience

Yes, in particular, you have to have var input in order to have var output and you have to have var output in order to differentiate with respect to the var input.

I wouldn’t if you are going to call that function over and over again. Maybe write another function that just frees up the memory that you can call when you are done.
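Something like this, for instance (a minimal sketch; the exported name is just an example):

// [[Rcpp::export]]
void recoverStanMemory() {
  // frees the autodiff arena that has accumulated across gradient calls
  stan::math::recover_memory();
}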

The MultiNormalLpdftemp3 function does not make sense. You call multi_normal_lpdf with var input but immediately call the .val() method, which returns the double. At this point, all the gradient information is lost and calling to_var at the end doesn’t go back in time to restore the gradient.
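Schematically:

stan::math::var v = stan::math::multi_normal_lpdf(x, mu, sig); // node on the AD tape, linked to x
double d = v.val();      // just the numeric value; the tape links are not carried along
stan::math::var out = d; // a brand-new constant node with no connection to x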


Thanks so much. I think I understand it much better now.

Just to be sure, let’s take a look at another quick toy example. Let’s suppose I wanted the gradient of the multinomial logit (MNL) log-likelihood with respect to beta, and let’s suppose I am too lazy to implement the analytical solution.

I wrote this, based on the bayesm function:

// my target function: MNL log-likelihood
stan::math::var llmnl(stan::math::vector_v beta, Eigen::VectorXd y, Eigen::MatrixXd X){

  int n = y.rows();
  int j = X.rows() / n;
  stan::math::vector_v Xbeta = X * beta;

  stan::math::vector_v xby(n);
  stan::math::vector_v denom(n);

  for (int i = 0; i < n; i++) {
    xby[i] = 0.0;
    denom[i] = 0.0;
    for (int p = 0; p < j; p++) denom[i] = denom[i] + exp(Xbeta[i*j + p]);
    xby[i] = Xbeta[i*j + int(y[i]) - 1];
  }

  return(sum(xby - log(denom)));
}

// access to the function value itself
// [[Rcpp::export]]
double llmnltest(Eigen::VectorXd beta_, Eigen::VectorXd y, Eigen::MatrixXd X){
  stan::math::vector_v beta = stan::math::to_var(beta_);
  return(llmnl(beta, y, X).val());
}


//gradient
// [[Rcpp::export]]
std::vector<double> llmnlG(Eigen::VectorXd beta_, Eigen::VectorXd y, Eigen::MatrixXd X) {
  
  stan::math::vector_v beta = stan::math::to_var(beta_);
  
  // function itself
  stan::math::var lp = llmnl(beta, y, X);
  
  // Gradient
  std::vector<double> gradient;
  std::vector<stan::math::var> yg(beta.rows());
  
  for (int i = 0; i < beta.rows(); i++) yg[i] = beta(i);
  lp.grad(yg, gradient);
  
  return gradient;
}

I compared it to the numerical solution:

library(numDeriv)
grad(func = function(x){bayesm::llmnl(x,simout$y,simout$X)}, simout$beta)
llmnlG(simout$beta,simout$y,simout$X)

The results appear to agree (based on some simulated data example).

So it seems to work quite nicely. I guess the important thing to remember is that anything that’s not a constant needs to be some kind of var type (scalar, vector, or matrix) to allow for AD.

What can I do to reduce overhead and copies?
Any other comments?

Pass anything bigger than a scalar by reference instead of by value.
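For example, only the signatures need to change; the body stays as you wrote it:

stan::math::var llmnl(const stan::math::vector_v& beta,
                      const Eigen::VectorXd& y,
                      const Eigen::MatrixXd& X) {
  // ... same body as above ...
}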

Well, using reference captures and a lambda expression, it runs about 14 times faster.

// [[Rcpp::export]]
Eigen::VectorXd llmnlGfast(Eigen::VectorXd beta, Eigen::VectorXd y, Eigen::MatrixXd X){
  double fx;
  Eigen::VectorXd grad_fx;

  stan::math::gradient([&y, &X](auto beta){
    int n = y.rows();
    int j = X.rows() / n;
    stan::math::vector_v Xbeta = X * beta;

    stan::math::vector_v xby(n);
    stan::math::vector_v denom(n);

    for (int i = 0; i < n; i++) {
      xby[i] = 0.0;
      denom[i] = 0.0;
      for (int p = 0; p < j; p++) denom[i] = denom[i] + exp(Xbeta[i*j + p]);
      xby[i] = Xbeta[i*j + int(y[i]) - 1];
    }

    return(sum(xby - log(denom))); },
    beta, fx, grad_fx);
  return grad_fx;
}

Interestingly, one could even accelerate optim with this:

microbenchmark(
  g   = optim(c(0, 0, 0, 0), llmnl, llmnlGfast, method = "BFGS",
              control = list(fnscale = -1), y = simout$y, X = simout$X),
  nog = optim(c(0, 0, 0, 0), llmnl, method = "BFGS",
              control = list(fnscale = -1), y = simout$y, X = simout$X)
)

yields

Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
    g 2.447735 2.535573 2.649400 2.587795 2.697065 3.793399   100
  nog 4.975157 5.110537 5.336379 5.254221 5.395941 8.608572   100

for some synthetic data (n=200)

However, I’d like to better understand advanced constructors of stan::math::var.

There is not all that much to understand. It is just a C++ class that holds a double-precision value and a sensitivity (adjoint) that let the autodiff work. For example, a minimal sketch of using the class directly:
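stan::math::var a = 2.0;          // holds value 2.0 and adjoint 0.0
stan::math::var b = a * a + 1.0;  // the multiply and add are recorded on the AD tape
b.grad();                         // reverse pass: propagates adjoints back to a
double deriv = a.adj();           // db/da = 2 * a = 4.0

There is more about reverse mode AD in Stan at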