I am implementing an HMC-within-Gibbs algorithm for a model I am working on** and am coding it in R and C++ (via Rcpp). For the HMC part I started with numerical differentiation to compute the gradients, and then implemented autodiff using the C++ “autodiff” library (GitHub - autodiff/autodiff: automatic differentiation made easier for C++), as it seemed like the easiest one to use (I’m new to C++). However, I’ve found it to be only ~2x faster than numerical differentiation and was hoping for a bigger speed-up, so I am looking at trying other autodiff libraries, such as the Stan Math library, to see whether they will be faster.

What would I need to do to convert my function (below) for use with the Stan Math library’s autodiff?

Are there any good guides out there on how to go about this for someone new to C++?

**Context - it’s a multivariate probit latent class model. I got into this after noticing that Stan is quite inefficient at sampling models like this, which involve data augmentation (i.e. creating latent continuous data from the observed ordinal or binary data).

I first focused on binary outcomes and implemented a Gibbs sampler, with a Metropolis-Hastings (MH) step for sampling the correlation matrices based on a method from a paper by Liu and Daniels (not a random-walk proposal).

Using base R and some Rcpp functions, I get ~5-15x better n_eff per second than Stan on a simulated dataset with N = 1000 and 6 binary outcomes. The Gibbs/MH sampler does require many more iterations than Stan - Stan needs only 200 warmup + 200 sampling iterations versus many thousands for the Gibbs/MH sampler - but since each iteration is so much cheaper, it is still much more efficient once time is taken into account. In Stan I used a parameterisation based on the one from @bgoodri, which I have found to be a lot more efficient than the one in the Stan manual.

I used the `rstan::Rhat` and `rstan::ess_bulk` functions from the rstan package to compute these diagnostics for both models, to make it a fair comparison.

However, when moving on to ordinal outcomes (or mixed ordinal and binary), things get more complicated. For my example dataset, Gibbs sampling is very inefficient for the cutpoint parameters, to the point where for some of them I only get a few n_eff even after many thousands of iterations. I tried several methods, ranging from Gibbs sampling directly from the full conditionals (as in Albert & Chib 1993), to an MH proposal based on Cowles 1996 (which is meant to be more efficient), to reparameterising in terms of the log-differences of the cutpoints and sampling those. All of them are inefficient for my example dataset. Stan, however, seems pretty efficient at sampling these parameters from the same dataset using the same priors. Hence the motivation for implementing an HMC-within-Gibbs sampler (i.e. HMC for the cutpoints and Gibbs/MH for the rest of the parameters).

C++ code:

```cpp
// NOTE: fn_calculate_cutpoints() and class_ind() (the sampled class-membership
// indicator for each observation) are defined elsewhere in my code.
double fn_update_log_posterior_full(
    Eigen::Matrix<double, -1, 1> current_logdiffs_set_mat, // differentiating w.r.t. this (i.e. w.r.t. class c and test t, for now)
    std::vector<Eigen::Matrix<double, -1, -1>> current_cutpoints_set_full_all_tests_class,
    int test,
    int l_class,
    int n_bin_tests,
    Eigen::Matrix<double, -1, -1> first_cutpoint_mat,
    Eigen::Matrix<double, -1, -1> prev,
    std::vector<Eigen::Matrix<double, -1, -1>> Xbeta,
    double prior_ind_dir, // not used in this function
    Eigen::Matrix<double, -1, -1> y,
    double prior_densities) {

  int K = current_logdiffs_set_mat.rows() + 2;
  int N = y.rows();
  int n_class = prev.cols();
  int n_tests = y.cols();

  // Build the proposed cutpoints for (l_class, test) from the log-differences.
  Eigen::Matrix<double, -1, -1> current_cutpoints_set_full =
      fn_calculate_cutpoints(current_logdiffs_set_mat, first_cutpoint_mat, K);
  std::vector<Eigen::Matrix<double, -1, -1>> current_cutpoints_set_full_all_tests_class_update =
      current_cutpoints_set_full_all_tests_class;
  for (int k = 1; k < K; k++) {
    current_cutpoints_set_full_all_tests_class_update[l_class](test - n_bin_tests, k) =
        current_cutpoints_set_full(k, 0);
  }

  double log_prob_out = 0.0;
  for (int n = 0; n < N; n++) {
    Eigen::Matrix<double, -1, -1> y1(n_tests, n_class); // per-test, per-class log-likelihood contributions
    Eigen::Matrix<double, -1, 1> lp(n_class);           // column vector
    Eigen::Matrix<double, -1, -1> bound_bin(n_class, n_tests);

    // Binary tests
    for (int t = 0; t < n_bin_tests; t++) {
      int index = y(n, t);
      for (int c = 0; c < n_class; c++) {
        bound_bin(c, t) = R::pnorm(-Xbeta[c](n, t), 0, 1, true, false);
        if (index == 1) {
          y1(t, c) = stan::math::log1m(bound_bin(c, t));
        } else { // y == 0
          y1(t, c) = log(bound_bin(c, t));
        }
      }
    }

    // Ordinal tests
    for (int t = n_bin_tests; t < n_tests; t++) {
      int index = y(n, t);
      for (int c = 0; c < n_class; c++) {
        if ((index != 1) && (index != K)) { // middle categories
          double C_m_Xbeta_upper = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          double C_m_Xbeta_lower = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_upper = R::pnorm(C_m_Xbeta_upper, 0, 1, true, false);
          double prob_C_m_Xbeta_lower = R::pnorm(C_m_Xbeta_lower, 0, 1, true, false);
          y1(t, c) = log(prob_C_m_Xbeta_upper - prob_C_m_Xbeta_lower);
        } else if (index == 1) { // lowest category
          double C_m_Xbeta_upper = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_upper = R::pnorm(C_m_Xbeta_upper, 0, 1, true, false);
          y1(t, c) = log(prob_C_m_Xbeta_upper);
        } else { // y == K (highest category)
          double C_m_Xbeta_lower = current_cutpoints_set_full_all_tests_class_update[c](t - n_bin_tests, index - 1) - Xbeta[c](n, t);
          double prob_C_m_Xbeta_lower = R::pnorm(C_m_Xbeta_lower, 0, 1, true, false);
          y1(t, c) = stan::math::log1m(prob_C_m_Xbeta_lower);
        }
      }
    }

    // Class-specific log-probabilities: sum the per-test contributions
    // and add the log class-membership probability.
    Eigen::Matrix<double, 1, -1> y1_colsums = y1.colwise().sum(); // row vector
    for (int c = 0; c < n_class; c++) {
      lp(c) = y1_colsums(c) + log(prev(0, c));
    }
    log_prob_out += lp(class_ind(n)); // class_ind(n): sampled class of observation n
  } // end of n loop

  log_prob_out += prior_densities;
  return log_prob_out;
}
```