# Any guidance for how to use Stan math library autodiff? (new to C++)

I am implementing an HMC-within-Gibbs algorithm for a model of mine** and am coding it in R and C++ (via Rcpp). For the HMC part I started off using numerical differentiation to calculate the gradients, and then implemented autodiff using the C++ “autodiff” library (GitHub - autodiff/autodiff: automatic differentiation made easier for C++), as this seemed like the easiest one to use (I’m new to C++). However, I’ve found it to be only ~2x faster than numerical differentiation, and I was hoping for a bigger speed-up. So I am looking at trying other autodiff libraries, such as the Stan math library, to see whether they will be faster.

I was wondering what I would need to do to convert my function (below) for use with the Stan math library’s autodiff?

Are there any good guides out there for how to go about this for someone new to C++?

**Context: it’s a multivariate probit latent class model, and I got into this by noticing that Stan is quite inefficient at sampling models like this which involve data augmentation (i.e. creating latent continuous data from the observed ordinal or binary data).

I first focused on binary outcomes and implemented a Gibbs sampler, with a Metropolis-Hastings (MH) step for sampling the correlation matrices based on a method from a paper by Liu and Daniels (not a random-walk proposal).

Using base R and some Rcpp functions, I get ~5-15x better n_eff per second than Stan on a simulated dataset with N=1000 and 6 binary outcomes. It does, however, require many more iterations than Stan: Stan only needs 200 warmup + 200 sampling iterations, whereas the Gibbs/MH sampler needs many thousands. Even so, once runtime is taken into account it is still much more efficient, because each iteration is so much cheaper. In Stan I have used a parameterisation based on the one from @bgoodri, which I have found to be a lot more efficient than the one in the Stan manual.

To make it a fair comparison, I used the `rstan::Rhat` and `rstan::ess_bulk` functions to compute these diagnostics for both models.

However, when moving on to ordinal outcomes (or mixed ordinal and binary), things get more complicated. For my example dataset, Gibbs sampling is very inefficient for the cutpoint parameters, to the point where for some of them I only get a few n_eff even after many thousands of iterations. I tried several methods: Gibbs sampling directly from the full conditionals (as in Albert & Chib 1993), an MH proposal based on Cowles 1996 (which is meant to be more efficient), and reparameterising in terms of the log-differences of the cutpoints and sampling those. However, they are all inefficient for my example dataset. Stan, on the other hand, seems pretty efficient at sampling these parameters from the same dataset using the same priors. Hence the motivation for implementing an HMC-within-Gibbs sampler (i.e. using HMC for the cutpoints and Gibbs/MH for the rest of the parameters).

C++ code:

``````cpp
double fn_update_log_posterior_full(
    Eigen::Matrix<double, -1, 1> current_logdiffs_set_mat, // differentiating w.r.t. this (i.e. w.r.t. class c and test t, for now)
    std::vector<Eigen::Matrix<double, -1, -1>> current_cutpoints_set_full_all_tests_class,
    int test,
    int l_class,
    int n_bin_tests,
    Eigen::Matrix<double, -1, -1> first_cutpoint_mat,
    Eigen::Matrix<double, -1, -1> prev,
    std::vector<Eigen::Matrix<double, -1, -1>> Xbeta,
    Eigen::Matrix<double, -1, 1> class_ind, // NB: used below but missing from the signature as first posted
    double prior_ind_dir,
    Eigen::Matrix<double, -1, -1> y,
    double prior_densities) {

  int K = current_logdiffs_set_mat.rows() + 2;
  int N = y.rows();
  int n_class = prev.cols();
  int n_tests = y.cols();

  Eigen::Matrix<double, -1, -1> current_cutpoints_set_full =
      fn_calculate_cutpoints(current_logdiffs_set_mat, first_cutpoint_mat, K);

  std::vector<Eigen::Matrix<double, -1, -1>> current_cutpoints_set_full_all_tests_class_update =
      current_cutpoints_set_full_all_tests_class;

  for (int k = 1; k < K; k++) {
    current_cutpoints_set_full_all_tests_class_update[l_class](test - n_bin_tests, k) =
        current_cutpoints_set_full(k, 0);
  }

  double log_prob_out = 0.0;

  for (int n = 0; n < N; n++) {
    Eigen::Matrix<double, -1, -1> y1(n_tests, n_class);
    Eigen::Matrix<double, -1, 1> lp(n_class); // column vector
    Eigen::Matrix<double, -1, -1> bound_bin(n_class, n_tests);

    for (int t = 0; t < n_bin_tests; t++) {
      int index = y(n, t);
      for (int c = 0; c < n_class; c++) {
        bound_bin(c, t) = R::pnorm(-Xbeta[c](n, t), 0, 1, true, false);
        if (index == 1) {
          y1(t, c) = stan::math::log1m(bound_bin(c, t));
        } else { // y == 0
          y1(t, c) = log(bound_bin(c, t));
        }
      }
    }

    for (int t = n_bin_tests; t < n_tests; t++) {
      int index = y(n, t);
      for (int c = 0; c < n_class; c++) {
        if ((index != 1) && (index != K)) {
          double C_m_Xbeta_upper = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          double C_m_Xbeta_lower = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_upper = R::pnorm(C_m_Xbeta_upper, 0, 1, true, false);
          double prob_C_m_Xbeta_lower = R::pnorm(C_m_Xbeta_lower, 0, 1, true, false);
          y1(t, c) = log(prob_C_m_Xbeta_upper - prob_C_m_Xbeta_lower);
        } else if (index == 1) {
          double C_m_Xbeta_upper = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_upper = R::pnorm(C_m_Xbeta_upper, 0, 1, true, false);
          y1(t, c) = log(prob_C_m_Xbeta_upper);
        } else { // y == K
          double C_m_Xbeta_lower = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_lower = R::pnorm(C_m_Xbeta_lower, 0, 1, true, false);
          y1(t, c) = stan::math::log1m(prob_C_m_Xbeta_lower);
        }
      }
    }

    // the column sums only need computing once per n, not once per class
    Eigen::Matrix<double, 1, -1> y1_colsums = y1.colwise().sum(); // row vector
    for (int c = 0; c < n_class; c++) {
      lp(c) = y1_colsums(c) + log(prev(0, c));
    }

    log_prob_out += lp(class_ind(n));
  } // end of n loop

  log_prob_out += prior_densities;

  return log_prob_out;
}
``````

So I have rewritten my function to use `stan::math::var` for parameters and `double` for data, and it compiles fine.

Now I’m working on running it from R. To do this I need to either export it to R directly or create an intermediate function which can take inputs from R (e.g. by coding everything as doubles) and then convert them to `stan::math::var` to pass on to the log-posterior function.

Related to this question - is there a way to cast / convert from `stan::math::var` to `double`?

e.g. I’m trying to do this:

``````cpp
Eigen::Matrix<double, -1, -1> prev; // a function input (function to be exported to R using [[Rcpp::export]])

// convert to a stan::math::var matrix in order to use it with the Stan math library
Eigen::Matrix<stan::math::var, -1, -1> prev_var = prev.cast<stan::math::var>();
``````

However, I get an error saying “invalid static_cast from type ‘const stan::math::var’ to type ‘double’”.

I need a way to convert from `double` to `stan::math::var` so that I can call the function that computes the gradients from within R. Attempting to export a function using `stan::math::var` directly to R via `// [[Rcpp::export]]` doesn’t work.

Update:

I have got reverse-mode autodiff working now. However, it is much slower than it should be. Is there something obviously wrong with this function that is making it slower?

code:

``````cpp
// [[Rcpp::export]]
Eigen::Matrix<double, -1, -1> fn_update_log_posterior_full_grouped(
    std::vector<double> current_logdiffs_set_full_all_tests_class_vec_double,
    int K, // max ordinal category; for now K is the same for all ordinal tests
    int n_bin_tests,
    Eigen::Matrix<double, -1, -1> first_cutpoint_mat,
    Eigen::Matrix<double, -1, -1> prev,
    std::vector<Eigen::Matrix<double, -1, -1>> Xbeta,
    Eigen::Matrix<double, -1, 1> class_ind,
    double prior_ind_dir,
    Eigen::Matrix<double, -1, -1> y,
    double prior_densities) {

  int N = y.rows();
  int n_class = prev.cols();
  int n_ord_tests = y.cols() - n_bin_tests;
  int n_tests = y.cols();

  // promote the log-diffs (the parameters being differentiated) to var
  int total_log_diffs = current_logdiffs_set_full_all_tests_class_vec_double.size();
  std::vector<stan::math::var> current_logdiffs_set_full_all_tests_class_vec(total_log_diffs, 0.0);
  for (int i = 0; i < total_log_diffs; i++) {
    current_logdiffs_set_full_all_tests_class_vec[i] = current_logdiffs_set_full_all_tests_class_vec_double[i];
  }

  // convert vector of log-diffs to a 3-D array
  std::vector<Eigen::Matrix<stan::math::var, -1, -1>> current_logdiffs_set_full_all_tests_class =
      fn_convert_vec_of_logdiffs_to_3d_array_AD(current_logdiffs_set_full_all_tests_class_vec,
                                                K, n_class, n_ord_tests);

  // calculate cutpoints to use in the likelihood
  std::vector<Eigen::Matrix<stan::math::var, -1, -1>> current_cutpoints_set_full_all_tests_class =
      fn_calculate_cutpoints_all_tests_all_class_AD(current_logdiffs_set_full_all_tests_class,
                                                    first_cutpoint_mat, K, n_class, n_ord_tests);

  stan::math::var log_prob_out = 0.0;

  Eigen::Matrix<double, -1, -1> log_prev = stan::math::log(prev);

  for (int n = 0; n < N; ++n) {
    Eigen::Matrix<stan::math::var, -1, -1> y1(n_tests, n_class);
    Eigen::Matrix<stan::math::var, -1, 1> y1_colsums(n_class);

    for (int t = 0; t < n_bin_tests; ++t) {
      int index = y(n, t);
      for (int c = 0; c < n_class; ++c) {
        if (index == 1) {
          y1(t, c) = stan::math::log(1 - stan::math::Phi_approx(Xbeta[c](n, t)));
        } else { // y == 0
          y1(t, c) = stan::math::log(stan::math::Phi_approx(-Xbeta[c](n, t)));
        }
      }
    }

    for (int t = n_bin_tests; t < n_tests; ++t) {
      int index = y(n, t);
      for (int c = 0; c < n_class; ++c) {
        if ((index != 1) && (index != K)) {
          stan::math::var upper = current_cutpoints_set_full_all_tests_class[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          stan::math::var lower = current_cutpoints_set_full_all_tests_class[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          y1(t, c) = stan::math::log(stan::math::Phi_approx(upper) - stan::math::Phi_approx(lower));
        } else if (index == 1) {
          stan::math::var upper = current_cutpoints_set_full_all_tests_class[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          y1(t, c) = stan::math::log(stan::math::Phi_approx(upper));
        } else { // y == K
          stan::math::var lower = current_cutpoints_set_full_all_tests_class[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          y1(t, c) = stan::math::log(1 - stan::math::Phi_approx(lower));
        }
      }
    }

    y1_colsums = y1.colwise().sum();

    stan::math::var lp = y1_colsums(class_ind(n)) + log_prev(0, class_ind(n));
    log_prob_out += lp;
  } // end of n loop

  log_prob_out += prior_densities;

  double log_prob = log_prob_out.val();

  std::vector<stan::math::var> theta;
  for (int i = 0; i < total_log_diffs; i++) {
    theta.push_back(current_logdiffs_set_full_all_tests_class_vec[i]);
  }

  Eigen::Matrix<double, -1, -1> out_mat = mat_out_test(total_log_diffs, 2); // matrix of zeros

  for (int i = 0; i < total_log_diffs; i++) {
  }

  out_mat(0, 1) = log_prob;

  return out_mat;
}
``````

@CerulloE, lots to unpack there. I’m glad you got it working a bit.

> is there a way to cast / convert from `stan::math::var` to `double`?

Yes, there is. For most of the conversions you’re looking at, we’ve implemented `stan::math::to_var(...)`, which creates a promoted copy of its argument.

Note: this is a “promotion” and not a “cast.” We can’t just take a primitive `double` in C++ and cast it to a `stan::math::var` type. It has to be constructed somehow.

> Is there something obviously wrong with this function which is making it slower?

If this function is being used repeatedly, you’ll need to do some autodiff stack management. Specifically, you must call `stan::math::recover_memory();` after calling `log_prob_out.grad(...)`; otherwise the autodiff stack keeps growing with every call, and that will definitely slow down the function.

I wasn’t able to follow the different functions you have to see if they’re implemented the way we would in the math library. If you have a minimum working example, I can try to be a little more helpful. (What’s there now are parts of a complete thing and I can only inspect the code. It’s a lot more useful when there are examples that others can execute.)


Thank you @syclik! It’s very fast now.


@syclik I was wondering, is it also possible to find the 2nd-order derivatives / Hessian?

Hey, I was wondering how I convert the other way around (i.e. from `stan::math::var` back to `double`)?

Yes! There’s a `hessian` function that should work for most functions in the math library.


Just to give some background, we’re treating going from `double` to `stan::math::var` as a “type promotion.” Please see references like this one: Implicit conversions - cppreference.com

The other way around is sometimes referred to as “type demotion,” although most discussion of it concerns the rules for casting to other types.

The answer is to use the `.val()` method on the `stan::math::var`. So… something like:

``````cpp
stan::math::var x(1.2345);

double x_val = x.val();
``````

There are also functions that will pull values out of structures, too:

``````cpp
stan::math::value_of(...)
stan::math::value_of_rec(...)   // recursive
``````

Thank you! In general, how much longer does computing the Hessian take compared to first-order derivatives/gradients? (No worries if you aren’t sure, as I’ll find out soon once I’ve implemented it. :))

Hey @syclik

How would I modify the code above to compute Hessians rather than just the gradient vector?

I.e., what would I replace the following with:

``````cpp
std::vector<stan::math::var> theta;

for (int i = 0; i < total_log_diffs; i++) {
  theta.push_back(current_logdiffs_set_full_all_tests_class_vec[i]);
}
``````