Vectorized Autodiff with StanHeaders in Rcpp

This webpage demonstrates how one may use StanHeaders in Rcpp to perform algorithmic differentiation (AD). The illustration is for a single, scalar evaluation.

I would like to compute the gradient and Hessian each once and then evaluate them many times over my vector of (data) observations. The Stan manual notes that vectorized prior specifications are more computationally efficient due to AD, which I assume is precisely because of this opportunity to compute the derivative once and evaluate it many times. All to say: is it possible to compute a gradient and evaluate it at multiple data points using StanHeaders in Rcpp? Are you able to point me to any examples or provide a simple set-up from which I may build?

This question is motivated by this paper that corrects MCMC draws from a survey sampling weight-exponentiated pseudo posterior such that the credibility sets contract on frequentist confidence intervals. The correction employs the gradient and Hessian of the log likelihood.

That’s not it, but

Yes. Nowadays, the easiest thing is to define a lambda function in C++ that evaluates your objective function and then call stan::math::grad_hessian on it. There is a unit test at

Also, you should check that it actually works.
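
A minimal, self-contained sketch of that pattern would be something like the following (a toy objective, not the unit test itself; the include path and the exact grad_hessian signature should be checked against your StanHeaders version):

    #include <stan/math/mix.hpp>  // grad_hessian(), hessian(); older versions use stan/math/mix/mat.hpp
    #include <type_traits>
    #include <vector>
    #include <iostream>

    int main() {
      // Generic lambda: the functor calls it with plain doubles and with
      // Stan's (nested) autodiff scalars, so the scalar type is deduced.
      auto f = [](const auto& x) {
        typename std::decay_t<decltype(x)>::Scalar out = 0;
        for (int i = 0; i < x.size(); ++i)
          out += x(i) * x(i);  // toy objective: f(x) = sum_i x_i^2
        return out;
      };

      Eigen::VectorXd x(2);
      x << 1.0, 2.0;

      double fx;                        // f(x)
      Eigen::MatrixXd H;                // Hessian of f at x
      std::vector<Eigen::MatrixXd> gH;  // gradient of the Hessian (third derivatives)
      stan::math::grad_hessian(f, x, fx, H, gH);

      std::cout << "f(x) = " << fx << "\nHessian:\n" << H << std::endl;
      return 0;
    }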

Thank you for your answer. So my objective function is “lp”.

  1. May I define a function, assign it to “lp”, and then use grad_hessian() as a member function?
  2. “x” appears to be the parameters with respect to which the derivatives are computed. How would I pass in a vector of multiple data observations - say “n” observations - and evaluate the derivatives (score function and Hessian) for all of the observations from a single computation of the derivatives with respect to x?
    – I’d like to compute derivatives evaluated at a set of parameter values for multiple data points. My function, “f” (lp), is a function of both the parameters and the data.

No, or else I don’t understand the question

Your lambda function should be written to capture the data by reference, but it is a function of the parameter vector.

You earlier wrote that I should write a lambda function to evaluate my objective function and then call stan::math::grad_hessian on it. You further wrote that I should write the lambda function so that it captures the data by reference. Is the “g” function here an example of such a lambda function (though the data vector, a, is not passed by reference, and it evaluates the gradient of “f”)?

That’s a regular function. A generalized lambda function would look something like

auto obj = [&x](auto theta) {
  ...
};

Thanks for your continued responses, Ben. Is there a minimal example available for a lambda function? It seems like this lambda function is where it all comes together in terms of inputting data, x, parameters, theta, and evaluating grad_hessian() on the objective function, “f”.

Even better if it (the lambda function) is implemented in Rcpp. Regarding the possibility of an Rcpp implementation, I wonder if I may pass in the usual Eigen::VectorXd types, including the data, by reference, and then define a new variable for theta, declared as “auto”, by re-casting the input using stan::math::to_var().
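
For concreteness, here is a sketch of what I have in mind via Rcpp::sourceCpp(), using a Bernoulli-logit lp as the example. The function name lp_grad_hessian and the details are my own guess rather than tested code; the autodiff functor appears to promote the parameter vector itself, so no explicit stan::math::to_var() seems to be needed, and newer versions of StanHeaders may additionally require the compiler/linker flags described in the StanHeaders vignette.

    // [[Rcpp::depends(BH)]]
    // [[Rcpp::depends(RcppEigen)]]
    // [[Rcpp::depends(StanHeaders)]]
    // [[Rcpp::plugins(cpp14)]]
    #include <stan/math/mix.hpp>  // hessian(), grad_hessian(); include before Rcpp headers
    #include <RcppEigen.h>
    #include <type_traits>

    // Hypothetical exported function: value, gradient (score), and Hessian of a
    // Bernoulli-logit log likelihood at beta, with the data (y, X) held fixed.
    // [[Rcpp::export]]
    Rcpp::List lp_grad_hessian(Eigen::VectorXd beta,
                               const Eigen::Map<Eigen::VectorXd> y,   // 0/1 outcomes as numeric
                               const Eigen::Map<Eigen::MatrixXd> X) { // design matrix
      // The lambda captures the data by reference; its only argument is the
      // parameter vector, whose scalar type (double, var, fvar<...>) is
      // supplied by the autodiff functor.
      auto lp = [&y, &X](const auto& b) {
        typename std::decay_t<decltype(b)>::Scalar out = 0;
        for (int i = 0; i < y.size(); ++i) {
          typename std::decay_t<decltype(b)>::Scalar eta = 0;
          for (int j = 0; j < b.size(); ++j)
            eta += X(i, j) * b(j);                        // linear predictor
          out += y(i) * eta - stan::math::log1p_exp(eta); // Bernoulli-logit log density
        }
        return out;
      };

      double fx;             // lp evaluated at beta
      Eigen::VectorXd grad;  // score vector
      Eigen::MatrixXd H;     // Hessian
      // hessian() returns value, gradient, and Hessian in one call;
      // grad_hessian() would instead give the third derivatives.
      stan::math::hessian(lp, beta, fx, grad, H);

      return Rcpp::List::create(Rcpp::Named("lp") = fx,
                                Rcpp::Named("gradient") = grad,
                                Rcpp::Named("hessian") = H);
    }

From R, after sourceCpp()-ing the file, a call like lp_grad_hessian(betav, y, X) would then return the log likelihood together with its score and Hessian at betav for the full data set.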

It appears that Rstan has some of this functionality via the grad_log_prob() function for the stanfit object (see the rstan documentation on log_prob-methods, “log_prob and grad_log_prob functions”).

So the idea is that you create a custom lp in a .stan file and use Rstan to sample with 0 chains, which throws a warning but still creates the stanfit object. Then grad_log_prob(), an S4 method, can be called on that stanfit object. Since the Stan model is called with the full data as an argument (a vector for y and a matrix for X), this seems to partially resolve the question of applying the derivatives to a vector of input data.

I recently used some awkward string manipulations in R along with the deriv() function to create functions for the gradient and Hessian of lp for a logistic regression in a toy asymptotics example. I know this is a simple example that others have asked about (Automatic differentiation with stan math - #16 by bbbales2), but it demonstrates the usefulness of Stan and autodiff, which readily applies to a general lp. See the comparisons below.

One current limitation of this Rstan solution is that only the gradient of the lp, and not the Hessian, is provided. Is there a way to get the Hessian via Rstan, or is a more custom approach via Rcpp and StanHeaders the best option?

Another possible limitation: if we generated new data (y, X), we would currently need to create another stanfit object with that data as input. Is there a way to create an object like the one produced by grad_log_prob() that accepts both the parameters and the data as input?

R code:

    #compare deriv and Rstan for evaluating gradient and hessian
    #Logistic regression with d covariates using deriv#

    ##Define lp and calculate grad and hessian##

    #change d > 0 and repeat
    d <- 1

    xbeta <- paste("beta0 +", paste("beta", 1:d, "*", "x", 1:d, sep = "", collapse = " + "), sep = "")
    beta <- paste("beta", 0:d, sep = "")
    xd <- paste("x", 1:d, sep = "")
    frm.str <- paste("~ (y *log((1+ exp(-1*(",
    		xbeta,
    		")) )^-1) + (1-y)*log(1-(1+ exp(-1*(",
    		xbeta,
    		")) )^-1))", sep = "")

    arg <- c("y", xd, beta)

    #R function for gradient and hessian
    d1th <- deriv(as.formula(frm.str), namevec =beta, function.arg = arg, hessian = TRUE, func = TRUE)


    #Generate y and set beta

    n <- 100

    beta0 <- 1

    for(i in 1:d){
    	eval(parse(text = paste("x", i," <- rnorm(n,0,1)", sep = "") ) )
    	eval(parse(text = paste("beta", i," <- 1/i", sep = "") ) )
    }

    #linear predictor
    mu <- eval(parse(text = xbeta))
    theta <- plogis(mu)

    #generate y
    y1 <- rbinom(n = n, size = 1, prob = theta)

     

    #evaluate over a vector of y, x values, beta
    argv <-  paste(c("y1", paste("x", 1:d, sep = ""), paste("beta", 0:d, sep = "")), collapse = " , ")
    Dth <- eval(parse(text = paste("d1th(", argv ,")", collapse =" ") ) )

    #sum over observations to get total gradient and hessian
    lphat1 <- sum(Dth)
    Hhat1 <- colSums(attr(Dth, "hessian"), dims = 1)
    Ghat1 <- colSums(attr(Dth, "gradient"))

Output using deriv()

lphat1
[1] -44.42416
Ghat1
beta0 beta1
5.9741254 0.5186645

Using Rstan

#Reproduce in Rstan?
library(rstan)
mod  <- stan_model('logistic.stan')# compile stan code

y <- y1
X <- eval(parse(
	text = paste(c("cbind(1, ",paste("x", 1:d, sep = "", collapse =", "), " )"), collapse = "")
	))

k   <- dim(X)[2]
n   <- length(y)

out_stan  <- sampling(object = mod, data = list(y = as.vector(y), X = X, k = k, n = n),
                      pars = c("beta"),
                      chains = 0)

betav <- eval(parse(
		text = paste(c("c(", paste(beta, collapse =  ", "), ")"), collapse ="")  
		))

tfun <- function(beta) log_prob(out_stan, beta)
tgrfun <- function(beta) grad_log_prob(out_stan, beta)

tfun(betav)
tgrfun(betav)#compare to lphat1 and Ghat1

Output using Rstan

tgrfun(betav)#compare to lphat1 and Ghat1
[1] 5.9741254 0.5186645
attr(,"log_prob")
[1] -44.42416

Stan file (logistic.stan)

/* Logistic regression */
data {
  int<lower=1> n; // number of observations
  int<lower=1> k; // number of linear predictors
  int<lower=0, upper=1> y[n]; // response variable
  matrix[n, k] X; // design matrix of predictors
}

parameters{
  vector[k] beta; /* regression coefficients from linear predictor */
}

transformed parameters{
  vector[n] mu;
  mu    = X * beta; /* linear predictor */
} /* end transformed parameters block */

model{
  /*improper prior on beta in (-inf,inf)*/
  /* directly update the log-probability for sampling */
  target          += bernoulli_logit_lpmf(y | mu);
} /* end model{} block */

In the “Using Rstan” section of @mrwilli’s post above, we may use log_prob() on the stanfit object as the function input and compute the Hessian numerically in a couple of different ways. The approach of @mrwilli has the advantage of creating a Stan model object once and then using it multiple times. The computation of the gradient and Hessian remains at the R layer, relying on the underlying C++ code embedded in the rstan package.

library(nlme)
fdHess(betav, function(x)log_prob(out_stan, x))$Hessian ## compare to Hhat1
optimHess(betav, function(x)log_prob(out_stan, x)) ## compare to Hhat1


If you go this route (which is what rstan::optimizing does), you want to numerically differentiate the reverse-mode autodiff gradient to get the Hessian. So, call grad_log_prob and differentiate that numerically, rather than differentiating log_prob twice.