Boost function in rstan

I need to be able to estimate the value from a 2-parameter gamma distribution, similar to the qgamma() function in R. I found a previous discussion:

which led me to use the gamma_distribution function from the Boost math library. I now have the following in my model:

functions {
  real myqgamma(real p, real shape, real scale) {
   #include <boost/math/distributions/gamma.hpp> 
      boost::math::gamma_distribution<> dist(shape, scale); 
      return quantile(dist, p);
  }
}
data {
...

When I try to compile using rstan, I get the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "boost" does not exist.
 error in 'model1d543668683eda_e7e888a175f3b27d2c07fb33565d90c5' at line 5, column 11

Is what I am attempting possible? Does this function need to be in a separate file? Any help would be very much appreciated.

Should be possible, but not the way you’re currently doing it. See this: https://mc-stan.org/rstan/articles/external.html.

Thank you! I saw that, but was worried I was on the wrong track. I will give it a much deeper read.

Am I correct in thinking the answer is approx 80% of the way through, creating a separate .hpp file and loading it using the includes option in stan_model?

Yeah, I’m inclined to say you won’t have much trouble implementing it. But if you do, please post here and we’ll work out the kinks. This would be useful for many applications until inverse cdfs and related functions are added to the code base.

1 Like

I’m getting closer. I followed the example and have stripped everything back to its bare minimum. I am trying to run the following example:

require(rstan)

model_code <-
  '
  functions {
    real standard_normal_rng() {
      return normal_rng(0,1);
   }
  }
'

stan_model(model_code = model_code, allow_undefined = TRUE, includes = paste0('\n#include "',  file.path(getwd(), 'myqgamma.hpp'), '"\n'))

where myqgamma.hpp contains the following:

#include <boost/math/distributions/gamma.hpp> 

double myqgamma(double p, double shape, double scale) {
  boost::math::gamma_distribution<> dist(shape, scale); 
  return quantile(dist, p);
}

I get the following error:

Error in compileCode(f, code, language = language, verbose = verbose) : 
  Compilation ERROR, function(s)/method(s) not created! file1e4df46976a433.cpp:6:9: warning: ISO C++11 requires whitespace after the macro name
    6 | #define STAN__SERVICES__COMMAND_HPP#include <boost/integer/integer_log2.hpp>
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/jhstagge/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/Eigen/Core:383,
                 from /home/jhstagge/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/Eigen/Dense:1,
                 from /home/jhstagge/R/x86_64-pc-linux-gnu-library/4.0/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13,
                 from <command-line>:
/home/jhstagge/R/x86_64-pc-linux-gnu-library/4.0/RcppEigen/include/Eigen/src/Core/arch/SSE/PacketMath.h:60:39: warning: ignoring attributes on template argument β€˜__m128’ [-Wignored-attributes]
   60 | template<> struct is_arithmetic<__m128>  { enum { value = true }; };
      |                                       ^
/home/jhstagge/R/x86_6
In addition: Warning message:
In system(cmd, intern = !verbose) :
  running command '/usr/lib64/R/bin/R CMD SHLIB file1e4df46976a433.cpp 2> file1e4df46976a433.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection

I have tracked down this error message to previous posts, for example. I updated rstan, BH, and StanHeaders to make sure I am using the most current versions.

When I remove the includes command, the model compiles properly. So, I don’t think it is an issue with the compiler/rstan, I think it must be something in myqgamma.hpp file.

I feel like I’m almost there. Any thoughts?

I should provide a little more system info:

writeLines(readLines(file.path(Sys.getenv("HOME"), ".R/Makevars")))

CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -fPIC

CXX14FLAGS=-O3 -march=native -mtune=native
CXX14FLAGS += -fPIC
R> devtools::session_info("rstan")
─ Session info ───────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.1 (2020-06-06)
 os       Manjaro Linux               
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.utf8                  
 ctype    en_US.utf8                  
 tz       America/New_York            
 date     2020-06-30                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────
 package      * version   date       lib source        
 assertthat     0.2.1     2019-03-21 [1] CRAN (R 4.0.0)
 backports      1.1.8     2020-06-17 [1] CRAN (R 4.0.0)
 BH             1.72.0-3  2020-01-08 [1] CRAN (R 4.0.1)
 callr          3.4.3     2020-03-28 [1] CRAN (R 4.0.0)
 checkmate      2.0.0     2020-02-06 [1] CRAN (R 4.0.0)
 cli            2.0.2     2020-02-28 [1] CRAN (R 4.0.0)
 colorspace     1.4-1     2019-03-18 [1] CRAN (R 4.0.0)
 crayon         1.3.4     2017-09-16 [1] CRAN (R 4.0.0)
 desc           1.2.0     2018-05-01 [1] CRAN (R 4.0.0)
 digest         0.6.25    2020-02-23 [1] CRAN (R 4.0.0)
 ellipsis       0.3.1     2020-05-15 [1] CRAN (R 4.0.0)
 evaluate       0.14      2019-05-28 [1] CRAN (R 4.0.0)
 fansi          0.4.1     2020-01-08 [1] CRAN (R 4.0.0)
 farver         2.0.3     2020-01-16 [1] CRAN (R 4.0.0)
 ggplot2        3.3.2     2020-06-19 [1] CRAN (R 4.0.0)
 glue           1.4.1     2020-05-13 [1] CRAN (R 4.0.0)
 gridExtra      2.3       2017-09-09 [1] CRAN (R 4.0.0)
 gtable         0.3.0     2019-03-25 [1] CRAN (R 4.0.0)
 inline         0.3.15    2018-05-18 [1] CRAN (R 4.0.0)
 isoband        0.2.2     2020-06-20 [1] CRAN (R 4.0.0)
 labeling       0.3       2014-08-23 [1] CRAN (R 4.0.0)
 lattice        0.20-41   2020-04-02 [2] CRAN (R 4.0.1)
 lifecycle      0.2.0     2020-03-06 [1] CRAN (R 4.0.0)
 loo            2.2.0     2019-12-19 [1] CRAN (R 4.0.0)
 magrittr       1.5       2014-11-22 [1] CRAN (R 4.0.0)
 MASS           7.3-51.6  2020-04-26 [2] CRAN (R 4.0.1)
 Matrix         1.2-18    2019-11-27 [2] CRAN (R 4.0.1)
 matrixStats    0.56.0    2020-03-13 [1] CRAN (R 4.0.0)
 mgcv           1.8-31    2019-11-09 [1] CRAN (R 4.0.0)
 munsell        0.5.0     2018-06-12 [1] CRAN (R 4.0.0)
 nlme           3.1-148   2020-05-24 [2] CRAN (R 4.0.1)
 pillar         1.4.4     2020-05-05 [1] CRAN (R 4.0.0)
 pkgbuild       1.0.8     2020-05-07 [1] CRAN (R 4.0.0)
 pkgconfig      2.0.3     2019-09-22 [1] CRAN (R 4.0.0)
 pkgload        1.1.0     2020-05-29 [1] CRAN (R 4.0.0)
 praise         1.0.0     2015-08-11 [1] CRAN (R 4.0.0)
 prettyunits    1.1.1     2020-01-24 [1] CRAN (R 4.0.0)
 processx       3.4.2     2020-02-09 [1] CRAN (R 4.0.0)
 ps             1.3.3     2020-05-08 [1] CRAN (R 4.0.0)
 R6             2.4.1     2019-11-12 [1] CRAN (R 4.0.0)
 RColorBrewer   1.1-2     2014-12-07 [1] CRAN (R 4.0.0)
 Rcpp           1.0.4.6   2020-04-09 [1] CRAN (R 4.0.0)
 RcppEigen      0.3.3.7.0 2019-11-16 [1] CRAN (R 4.0.1)
 RcppParallel   5.0.2     2020-06-24 [1] CRAN (R 4.0.1)
 rlang          0.4.6     2020-05-02 [1] CRAN (R 4.0.0)
 rprojroot      1.3-2     2018-01-03 [1] CRAN (R 4.0.0)
 rstan          2.19.3    2020-02-11 [1] CRAN (R 4.0.1)
 rstudioapi     0.11      2020-02-07 [1] CRAN (R 4.0.0)
 scales         1.1.1     2020-05-11 [1] CRAN (R 4.0.0)
 StanHeaders    2.21.0-5  2020-06-09 [1] CRAN (R 4.0.1)
 testthat       2.3.2     2020-03-02 [1] CRAN (R 4.0.0)
 tibble         3.0.1     2020-04-20 [1] CRAN (R 4.0.0)
 utf8           1.1.4     2018-05-24 [1] CRAN (R 4.0.0)
 vctrs          0.3.1     2020-06-05 [1] CRAN (R 4.0.0)
 viridisLite    0.3.0     2018-02-01 [1] CRAN (R 4.0.0)
 withr          2.2.0     2020-04-20 [1] CRAN (R 4.0.0)

[1] /home/jhstagge/R/x86_64-pc-linux-gnu-library/4.0
[2] /usr/lib/R/library

@bgoodri halp.

Before Ben comes to rescue, there are few things missing from the implementation. For instance, myqgamma needs a β€œpstream” argument, like so

double myqgamma(double p, double shape, double scale, std::ostream* pstream__) {
...
}

Also, if shape and scale vary, i.e., are estimated parameters, you’ll need to implement them as var so they can be autodiffed with respect to.

Take a look at this thread for some details.

I also think the stan program needs to define myqgamma before it’s used, like so:

functions {
  real mygamma(real p, real shape, real scale);
  real standard_normal_rng() {
    return normal_rng(0,1);
  }
}
....

Not sure if this helps. I tried your suggestion and swapped over to a Windows instance (just in case). It produced slightly different error messages.


Error in compileCode(f, code, language = language, verbose = verbose) : 
  Compilation ERROR, function(s)/method(s) not created! file2cc451971e87.cpp:6:36: warning: ISO C99 requires whitespace after the macro name
 #define STAN__SERVICES__COMMAND_HPP#include <boost/integer/integer_log2.hpp>
                                    ^
In file included from C:/Users/Documents/R/R-3.5.3/library/BH/include/boost/math/distributions/gamma.hpp:13:0,
                 from G:/Documents/work_folder/projects_research/code/spi_paper/myqgamma.hpp:1,
                 from file2cc451971e87.cpp:39:
C:/Users/Documents/R/R-3.5.3/library/BH/include/boost/math/distributions/fwd.hpp:16:1: error: expected identifier before 'namespace'
 namespace boost{ namespace math{
 ^
C:/Users/Documents/R/R-3.5.3/library/BH/include/boost/math/distributions/fwd.hpp:16:1: error: expected '>' before 'namespace'
C:/Users/Documents/R/R-3.5.3/library/BH/include/boost/math/distributions/fwd.hpp:16:16: error: expected unqualified-id before '{' token
 namespac
In addition: Warning message:
In system(cmd, intern = !verbose) :
  running command 'C:/Users/Documents/R/R-3.5.3/bin/x64/R CMD SHLIB file2cc451971e87.cpp 2> file2cc451971e87.cpp.err.txt' had status 1
Error in sink(type = "output") : invalid connection
> 

I have tried several approaches, but am now thinking it might be easiest if I follow the suggestion here, avoid Boost altogether and use the algebraic solver to calculate:

I have coded an extremely silly toy example to test out this approach and it appears to work. I’m sure there are many more efficient ways to do this (vectorization, etc.), but this appears to get the job done for now. Any improvements would be much appreciated.

require(rstan)
n <- 500
rand_y <- rgamma(n, shape = 0.5, scale = 5)

fit_data <- list(N = n, y = rand_y)

mc <- 
'
functions {
vector qgamma(vector y,        // unknowns
              vector theta,    // parameters
              real[] x_r,      // data (real)
              int[] x_i) {     // data (integer)
  vector[1] z;
  z[1] = gamma_cdf(y, theta[1], 1/theta[2]) - x_r[1];
  return z;
}
}
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real<lower=0> shape;
real<lower=0> scale;
}
model {
shape ~ gamma(0.5,0.2);
scale ~ gamma(0.5,0.2);

for (n in 1:N)
  y[n] ~ gamma(shape, 1/scale);
}
generated quantities {
vector[1] qgamma_result;
vector[2] theta = [shape, scale]';
vector[1] y_guess = [0.5]';

int x_i[0];
real x_r[1]; // Return the 40th percentile

x_r[1] = 0.4;

qgamma_result = algebra_solver(qgamma, y_guess, theta, x_r, x_i);
}
'

mod <- stan_model(model_code = mc, data = fit_data,  iter = 2000, init =list(list(shape = 0.5, scale = 10)), chains = 1)

print(fit)
plot(fit)

require(dplyr)
est_df <- data.frame(shape =  rstan::extract(fit, "shape")$shape, scale = rstan::extract(fit, "scale")$scale, qgamma_draw = rstan::extract(fit, "qgamma_result")$qgamma_result)
est_df <- est_df %>% mutate(qgamma_correct = qgamma(0.4, shape = shape, scale = scale))
head(est_df)
1 Like