Adding custom C++ function to R package using RStan

karimn · April 13, 2020, 3:10pm

Hi,

I’m trying write my own C++ function to add to my model that I’m putting in an R package. I followed the step-by-step guide and everything works great, but now that I’m trying to figure out where to put a new C++ function. I read the instructions on how to interface with external C++ code but I’m not sure how to apply that to package. (BTW, that RStan guide on how to interface with C++ seems to have an error: there’s a line that reads “The sinc.hpp file reads as” and then there is no code after it).

Some background on why I’m trying to do this: I’m want to create my own custom version of csr_matrix_times_vector() where the vector is actually a vector of log probabilities and I need to something like log_sum_exp() when I do the dot product step. I’m looking at the code for these functions on github and I think I understand what is going on but if anyone knows of some pitfalls I should avoid, please let me know.

Thanks,
Karim

bgoodri · April 13, 2020, 4:33pm

Can’t go into details right now, but we do something very similar in rstanarm

that you can copy the structure of. If you used rstantools to create the package, then it should work.

karimn · April 13, 2020, 5:08pm

Thanks, Ben, this is helpful. I did use rstantools.

karimn · April 14, 2020, 3:32am

I managed to successfully build my package having added a new C++ function. I kind of mixed what csr_matrix_times_vector() and log_sum_exp() do. I was wondering if you see anything obviously wrong with this code? This is my first time dabbling with this. This is what I want to do: given a dense n-vector of log probabilities, I want to calculate the log_sum_exp() of m, of their subsets using a sparse m\times n boolean matrix, where each row identifies which log probabilities to calculate over. Also, is there any advantage to writing this in C++ compared to just doing the same loops in Stan function?

Thanks,
Karim

template<typename T>
inline
Eigen::Matrix<T, Eigen::Dynamic, 1>
csr_log_sum_exp(int m, int n,
                const std::vector<int>& v,
                const std::vector<int>& u,
                const Eigen::Matrix<T, Eigen::Dynamic, 1>& b,
                std::ostream* pstream__) {
  Eigen::Matrix<T, Eigen::Dynamic, 1> result(m);
  result.setZero();

  for (int row = 0; row < m; ++row) {
    int idx = csr_u_to_z(u, row); // u[row + 1] - u[row]
    int row_end_in_w = (u[row] - stan::error_index::value) + idx;

    Eigen::Matrix<T, Eigen::Dynamic, 1> b_sub(idx);
    b_sub.setZero();

    int i = 0;

    for (int nze = u[row] - stan::error_index::value; nze < row_end_in_w; ++nze, ++i) {
      b_sub.coeffRef(i) = b.coeffRef(v[nze] - stan::error_index::value);
    }

    const double max = b_sub.maxCoeff();

    result.coeffRef(row) = max + std::log((b_sub.array() - max).exp().sum());
  }

  return result;
}

bgoodri · April 14, 2020, 5:40am

I think it is actually worse. If you are going to the trouble of writing custom C++, then you need to be handling the derivatives analytically rather than letting it autodiff.

karimn · April 14, 2020, 1:34pm

Oh, no. I feared as much. Ok, I’ll need to learn how to handle this analytically. Thanks for your help.

BTW, I was going by what’s implemented in the below files. I noticed that it’s different from what is in https://mc-stan.org/math/.

github.com

stan-dev/math/blob/develop/stan/math/prim/fun/csr_matrix_times_vector.hpp

#ifndef STAN_MATH_PRIM_FUN_CSR_MATRIX_TIMES_VECTOR_HPP
#define STAN_MATH_PRIM_FUN_CSR_MATRIX_TIMES_VECTOR_HPP

#include <stan/math/prim/err.hpp>
#include <stan/math/prim/fun/csr_u_to_z.hpp>
#include <stan/math/prim/fun/Eigen.hpp>
#include <stan/math/prim/fun/dot_product.hpp>
#include <vector>

namespace stan {
namespace math {

/**
 * @defgroup csr_format Compressed Sparse Row matrix format.
 *  A compressed Sparse Row (CSR) sparse matrix is defined by four
 *  component vectors labeled w, v, and u.  They are defined as:
 *    - w: the non-zero values in the sparse matrix.
 *    - v: column index for each value in w,  as a result this
 *      is the same length as w.
 *    - u: index of where each row starts in w,  length

This file has been truncated. show original

github.com

stan-dev/math/blob/develop/stan/math/prim/fun/log_sum_exp.hpp

#ifndef STAN_MATH_PRIM_FUN_LOG_SUM_EXP_HPP
#define STAN_MATH_PRIM_FUN_LOG_SUM_EXP_HPP

#include <stan/math/prim/meta.hpp>
#include <stan/math/prim/fun/constants.hpp>
#include <stan/math/prim/fun/Eigen.hpp>
#include <stan/math/prim/fun/log1p_exp.hpp>
#include <cmath>
#include <vector>

namespace stan {
namespace math {

/**
 * Calculates the log sum of exponentials without overflow.
 *
 * \f$\log (\exp(a) + \exp(b)) = m + \log(\exp(a-m) + \exp(b-m))\f$,
 *
 * where \f$m = max(a, b)\f$.
 *

This file has been truncated. show original

karimn · April 14, 2020, 2:02pm

Also, now finding the wiki here helpful, for anyone who has some of the same questions I have.

Topic		Replies	Views
Building a Rstan package with custom c++ code RStan	4	457	September 26, 2019
R packages using rstan - extend via stanreg? Developers	3	843	February 1, 2017
Making arbitrary C++ functions available to Stan Modeling	14	2000	June 12, 2018
Struggling with user-defined custom c++ function for binning real data into sequence of integer counts Interfaces user-defined-functions	7	1008	May 16, 2022
Calling grad_log_prob directly from Rcpp/C++? General	16	76	October 16, 2024

Adding custom C++ function to R package using RStan

Related topics