Calling grad_log_prob directly from Rcpp/C++?

CerulloE · October 10, 2024, 4:58pm

Hi,

Is there a way to call rstan::grad_log_prob or cmdstanr::grad_log_prob functions directly from Rcpp / C++? - preferably without having to transfer data to and from R and C++ every single time it’s called from within Rcpp?

spinkney · October 10, 2024, 10:23pm

Bridgestan has this R Interface — BridgeStan documentation

CerulloE · October 10, 2024, 10:58pm

can you use this in C++ directly? (without R just all via Rcpp?)

I found this on the website:

“Namely, you can declare a function in your Stan model and then define it in a separate C++ file. This requires passing the --allow-undefined flag to the Stan compiler when building your model. The USER_HEADER variable must point to the C++ file containing the function definition. By default, this will be the file user_header.hpp in the same directory as the Stan model.”

However this only works for Stan functions, not any general Stan model. So users would have to be able to write their model as a Stan function.

It would be good if there were a way to do it more generally, not just for Stan functions?

I.e. expose the internal log_prob_grad function that the Stan shared object file contains to a C++ function that anybody can use?

spinkney · October 11, 2024, 2:01pm

You could define your entire model as a Stan function. Otherwise, @roualdes @Bob_Carpenter @WardBrian might have the answer.

WardBrian · October 11, 2024, 2:44pm

If you’re living in C++, you can either directly use the model class and Stan math library, or use the bridgestan C API documented here C API — BridgeStan documentation

CerulloE · October 11, 2024, 2:45pm

Yeah I could - but (ideally) I want this for my R package where people can just plug in any Stan model without having to do any “extra work” - coding some models as Stan functions isn’t easy as e.g. they might have to code the unconstraied_theta → constrained_theta transforms themselves (as Stan functions), etc. This is because I need the input to the function to just be the data and a vector of unconstrained parameters, regardless of the model.

So my attempt to do it more generally so far has been half successful - I can get the log_prob but not the gradient. Here’s what i’ve done:


Eigen::VectorXd run_stan_log_prob_grad(const std::string &model_so_file, 
                                       const std::string  &r_data_as_json, 
                                       const Eigen::VectorXd &params) {
 
 
 
 // load the Stan model from the .so file using BridgeStan
 char* error_msg = nullptr; 
 const char* data = r_data_as_json.c_str();
 unsigned int seed = 12345;
 
 // Use bs_model_construct 
 bs_model* model = bs_model_construct(data, seed, &error_msg);
 if (!model) {
   std::string error = "Error loading model: ";
   if (error_msg) {
     error += std::string(error_msg);
     bs_free_error_msg(error_msg);
   }
   throw std::runtime_error(error);
 }
  
 // convert Eigen::VectorXd parameters to raw array for BridgeStan
 std::vector<double> params_vec(params.data(), params.data() + params.size());
 
 // prepare space for the log probability and gradient
 double log_prob_val = 0.0;
 std::vector<double> grad_vec(params.size());
 
 // calc. log_prob_grad using the BridgeStan API 
 int result = bs_log_density_gradient(model,
                                      true,  ///  propto
                                      true, ///  jacobian
                                      params_vec.data(), 
                                      &log_prob_val, 
                                      grad_vec.data(), 
                                      &error_msg);
 if (result != 0) {
   std::string error = "Error computing log_prob_grad: ";
   if (error_msg) {
     error += std::string(error_msg);
     bs_free_error_msg(error_msg);
   }
   bs_model_destruct(model);
   throw std::runtime_error(error);
 }
 
 
 /// make Eigen output combining log_prob and gradient
 Eigen::VectorXd output(params.size() + 1); 
 output(0) = log_prob_val;
 for (int i = 0; i < grad_vec.size(); ++i) {
   output(i + 1) = grad_vec[i];
 }
 
 //  free the model after the computation
 bs_model_destruct(model);
 
 return output;
 
}

 
 
// [[Rcpp::export]]
Eigen::VectorXd run_model_from_R(std::string  model_so_file,
                                 const  std::string &r_data_as_json,  
                                 Eigen::VectorXd params) {

 // Call the actual C++ function
 return run_stan_log_prob_grad(model_so_file, r_data_as_json, params);

}

edit: the R wrapper function is there just so I can test it from R, but i’ll be primarily calling the “run_stan_log_prob_grad” function directly from C++ (however the model data will be inputted from R by the user)

WardBrian · October 11, 2024, 2:51pm

At first glance that looks mostly correct, except for the lack of loading the .so (which may be complicated if you want to support *nix and Windows in the same code). What issue are you having?

Also note that if you intend to perform multiple evaluations with the same data, it is very critical for performance that you move the construction and destruction outside the core loop

CerulloE · October 11, 2024, 2:53pm

Oh sorry I think I forgot to mention - so basically my output is just the log_prob (which is correct) as the first element, but the rest (i.e., the gradient) always comes out as 0’s… maybe there’s something wrong with the .so file, as getting that to work was a pain (I dont think cmdstanr automatically generates the files as shared objects?)

WardBrian · October 11, 2024, 3:12pm

I can’t immediately spot an error in your code, but note that you can make it simpler by just re-using the memory from the eigen vectors:

 Eigen::VectorXd output(params.size() + 1); 
 int result = bs_log_density_gradient(model,
                                      true,  ///  propto
                                      true, ///  jacobian
                                      params.data(),  // no need for params_vec
                                      output.data(), 
                                      output.data() + 1, 
                                      &error_msg);

This may fix the issue “by accident” by over-writing an issue we cant spot

CerulloE · October 11, 2024, 4:15pm

do you know if there’s another way / a “standard” way to do what i’m trying to do? (with or without BridgeStan, but I imagine it’d be easier with it)?

CerulloE · October 11, 2024, 4:22pm

Also is there some kind of option in cmdstan to automatically compile the model into so file/shared object or easily convert it to one? (if not, can you make bridgestan work without .so files?)

WardBrian · October 11, 2024, 4:31pm

I’m not sure what you’re asking in your first reply — the closest thing we have to a “standard” way of getting the log density and gradient of a generic stan model is bridgestan. You could do it yourself in C++, but most of what you’d end up doing is re-writing parts of bridgestan to do it

Bridgestan’s repo has an example of how to use it in a statically linked context, and you can also use the same object file from a model for both cmdstan and bridgestan if you run the linking step for both still

CerulloE · October 11, 2024, 5:07pm

Hey sorry for the confusion - I got it working now. I reinstalled R including bridgestan, and it works now. Not sure what I did different but yeah.

Just to clarify, when you say: " it is very critical for performance that you move the construction and destruction outside the core loop", does that mean - for example - that I can call the construction once at the beginning, call the model N times, then call the destruction once at the end of the sampling?

WardBrian · October 11, 2024, 5:18pm

Yes, that’s precisely what to do for the best performance

CerulloE · October 11, 2024, 5:35pm

Also fyi, to get it to compile at all, I had to make a “libbridgestan.so” file from the “bridgestan.o” file that comes with the R package, and then link it to a Stan model .so file (I dont think it matters which model I use - seems arbitrary) by adding the paths of both of them to PKG_LIBS - otherwise i’d get this error:

Error in dyn.load("/tmp/RtmptjdJAc/sourceCpp-x86_64-pc-linux-gnu-1.0.13/sourcecpp_817007d1b0744/sourceCpp_49.so") : 
  unable to load shared object '/tmp/RtmptjdJAc/sourceCpp-x86_64-pc-linux-gnu-1.0.13/sourcecpp_817007d1b0744/sourceCpp_49.so':
  /tmp/RtmptjdJAc/sourceCpp-x86_64-pc-linux-gnu-1.0.13/sourcecpp_817007d1b0744/sourceCpp_49.so: undefined symbol: _Z9new_modelRN4stan2io11var_contextEjPSo

Not sure if that’s a bug or not but probably something i’m doing wrong. Oh well at least it works though.

WardBrian · October 11, 2024, 5:41pm

Without knowing more about your build process it’s hard to say more, but that shouldn’t be necessary on BridgeStan’s end

WardBrian · October 16, 2024, 7:37pm

For people who are trying similar ideas in the future, I just extended the bridgestan C example folder to also include an example of loading the bridgestan library at runtime, rather than linking it into your program at build:

Topic		Replies	Views
Annoucing BridgeStan: a bridge to a Stan model's log_prob and gradient functions Publicity cmdstan	25	2282	March 21, 2023
Expose model hessian in Rstan RStan rstan	10	1688	February 11, 2017
Using R function in Rstan Modeling	6	2357	May 15, 2018
Adding custom C++ function to R package using RStan RStan	6	763	April 14, 2020
Interface to Stan's algorithms from C++ and beyond Developers	5	792	January 19, 2021

Calling grad_log_prob directly from Rcpp/C++?

Related topics