Problem implementing Zipf (power law) distribution -- external C++

Yeah, I might start with “let integrate_1d autodiff that”, but if you look at the code @bbbales2 wrote, what it does is first integrate the function in doubles and then integrate the function where one parameter is a var and everything else is a double, and then does that for each of the parameters. So, if you were to translate that R code to C++, you could do it somewhat more efficiently by integrating both functions in doubles with exp_sinh quadrature.

Right. Here’s the latest iteration of zeta.hpp, for inclusion using the Rstan mechanism:

#include <boost/math/special_functions/zeta.hpp>
#include <boost/math/special_functions/digamma.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <boost/math/quadrature/exp_sinh.hpp>
using boost::math::isfinite;

inline double zeta(const double& s, std::ostream* pstream__){
  return boost::math::zeta(s);
}

inline var zeta(const var& s_var, std::ostream* pstream__) {
  double s = s_var.val();
  double f = zeta(s, pstream__);
  boost::math::quadrature::exp_sinh<double> integrator;
  auto int_rep_zeta_prime = [&](double x) {
    double ans = std::pow(x, s-1) *( std::log(x) - boost::math::digamma(s))/(std::exp(x) -1);
    if(!isfinite(ans)) return 0.0; // weird patching to get quadrature working
    return ans;
  };
  double tolerance = std::sqrt(std::numeric_limits<double>::epsilon());
  double error = 0.0;
  double L1 = 0.0;
  size_t levels;
  double deriv = integrator.integrate(int_rep_zeta_prime, tolerance, &error, &L1, &levels)/boost::math::tgamma(s);
  return var(new precomp_v_vari(f, s_var.vi_, deriv));
}

I don’t actually use quadrature to evaluate zeta because there’s no need (unless the implementation of zeta in boost is jinxed). Gives me that (templating?) error:

Error in dyn.load(libLFile) : 
  unable to load shared object '/tmp/RtmpUaG4TC/file4c872949fc9c.so':
  /tmp/RtmpUaG4TC/file4c872949fc9c.so: undefined symbol: _ZN45model4c87906d8bc_discrete_power_law_namespace4zetaIN4stan4math3varEEEN5boost4math5tools12promote_argsIT_fffffE4typeERKS8_PSo
Calls: stan_model ... cxxfunctionplus -> <Anonymous> -> cxxfunction -> dyn.load
Execution halted

The actual output with verbose = TRUE does not contain any errors that I could find.
@bbbales2, @Bob_Carpenter @seantalts what am I missing?

EDIT: The derivative was missing a \frac{1}{\Gamma(s)} term and the integral representation needed patching to work with exp_sinh. Shouldn’t affect any of the problems discussed.

The people you tagged probably can’t help you with RStan. Try CmdStan as mentioned above - it’s an easier environment to debug these things in.

I’m really not sure. Since it’s failing with linkage, my two wild guesses would be:

  1. Get rid of the inline keywords on both functions? inline affects linkage, but I think we want inline here? So I’m not sure

  2. Do things with templating and see if the linking changes

template<typename T>
inline double zeta(const T& s, std::ostream* pstream__){
  return boost::math::zeta(s);
}

template<>
inline var zeta(const var& s_var, std::ostream* pstream__) {
  ...
}

I think it’s saying zeta isn’t defined.

1 Like

That is supposed to work. If you add LDFLAGS += --no-undefined to ~/.R/Makevars you get a slightly more informative error message (an a linker error rather than an error from dyn.load) like:

file5e8d2fcaf228.o: In function `boost::math::tools::promote_args<double, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::power_law<double>(int const&, double const&, std::ostream*)':
file5e8d2fcaf228.cpp:(.text._ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIdEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS5_PSo[_ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIdEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS5_PSo]+0x2f): undefined reference to `boost::math::tools::promote_args<double, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::zeta<double>(double const&, std::ostream*)'

I don’t understand why it thinks it needs a definition of
boost::math::tools::promote_args<double, float, float, float, float, float>::type blah_blah_namespace::zeta<double>(double const&, std::ostream*)
as opposed to what is defined which is simply

double zeta(const double& s, std::ostream* pstream__) {
  return boost::math::zeta(s);
}

And then it does it again for the var specialization

file5e8d2fcaf228.o: In function `boost::math::tools::promote_args<stan::math::var, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::power_law<stan::math::var>(int const&, stan::math::var const&, std::ostream*)':
file5e8d2fcaf228.cpp:(.text._ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIN4stan4math3varEEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS8_PSo[_ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIN4stan4math3varEEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS8_PSo]+0x109): undefined reference to `boost::math::tools::promote_args<stan::math::var, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::zeta<stan::math::var>(stan::math::var const&, std::ostream*)'

OK. I have the problem surrounded. This works, i.e. just declare real zeta(real alpha); but don’t define a power_law function that calls it:

functions{
    real zeta(real s);
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values
  int values[K];
  int<lower=0> frequencies[K];
}
parameters{
  real <lower=1> alpha;
}
model{
  real constant = log(zeta(alpha));
  for (k in 1:K) {
    target += frequencies[k] * (-alpha * log(values[k]) - constant);
  }
} 

That is actually better (iff alpha is a scalar) because it only evaluates the zeta function once. But it does not answer the question (for @Bob_Carpenter or @seantalts) of why you can declare a zeta function, define it in an external C++ file, but not call it from another Stan function without incurring a linker error. In other words, the original Stan program fails to link, even though it should inline to the same thing

functions{
    real zeta(real s);
    real power_law_lpmf(int x, real alpha){
     return (-alpha * log(x) - log(zeta(alpha)) );
    }
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  for (k in 1:K) target += frequencies[k] * power_law_lpmf(values[k] | alpha);
}

with errors like

3 Likes

My goal with tagging you and Bob and Ben wasn’t just to ask for help. For example, Bob was the one who initially suggested the Zipf stuff be added to Stan [even though the issue was posted by Daniel Lee] and I wanted him to see the - shy, admittedly - progress on the maths side of things.
I’m hoping that whatever I do here for my own stuff, which will be using Rstan, can then with a bit of work be added to the math library.

By the way, I tried CmdStan and it wasn’t much more intuitive or gave much clearer [to me] error messages. As it unfolded, I would never guessed what the root cause was.

Thanks so much for all the effort you put into this, Ben. This compiles at least. Still some stuff to iron out, but progress in sight.

Yup, nice catch.

1 Like

I appreciate your sentiment here - someone may indeed come along and find that your code helps them with the full implementation. Cross-posting a reference to this discussion or your code on the github issue would help with that. The core maintainers are a bit overworked so we’d be unlikely to just pick up the issue, but if you (or anyone else reading this) wanted to learn a bit more about contributing to the Math library we’d be happy to help you turn your prototype into production code.

I didn’t mean to imply that anything about C++ development would be intuitive, just that CmdStan has fewer layers of complexity to deconstruct when debugging. So even if you didn’t understand its output, posting that here would have helped the people you were tagging. I actually didn’t know you could even include external C++ files with RStan until reading this thread and this vignette.

You’re sure the external C++ file was included properly and had all the proper overloads defined? Putting up the full model file and include file as a gist on github with the compiler invocations might shed some light on it.

1 Like

I should have added that “with a bit of work” meant “with a bit of work from me”. I do intend to contribute this to math. But I needed to solve my problem first in order to get on with my research, and that was writing a .hpp that I could call from RStan.
I’ve read the materials you linked and they’re helpful. Due to my C++ illiteracy and inherent thickness, it’ll probably take a while until I can produce anything close to semi-production code for you guys to review.
My main doubt, which I guess applies to both yourself and @Bob_Carpenter is: would it be OK to compute the gradient using the integral representation (and hence an exp_sinh quadrature call)? My fear is that this might be rather too slow and/or inaccurate for the math library.

Let me know if this helps. I’ll edit the gist to add any info you might judge helpful.

1 Like

I think Ben or another RStan person would know more here - what you’ve posted doesn’t seem to show zeta.hpp getting included in the compiler invocations or the generated hpp. There must be more happening during the RStan invocation to link those up.

Well, there is a linker error if you try to call the C++ function from within another Stan function, so I must have overlooked something but I don’t know what that is.

Here is the C++ with var and a double specialization of the declared but not defined zeta function in a Stan program.

stan::math::var zeta(stan::math::var const& s, std::ostream* pstream__) {
  double s_ = value_of(s);
  double f = boost::math::zeta(s_);
  double d = 0;
  double k = 2;
  double term = 1;
  // http://functions.wolfram.com/ZetaFunctionsandPolylogarithms/Zeta/20/01/01/0001/MainEq1.L.gif
  while (term > 1e-16) {
    term = log(k) / pow(k, s_);
    d -= term;
    k++;
  }
  return var(new precomp_v_vari(f, s.vi_, d));
}

double zeta(const double& s, std::ostream* pstream__) {
  return boost::math::zeta(s);
}

This Stan program works

functions{
    real zeta(real s);
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  real constant = log(zeta(alpha));
  for (k in 1:K) {
    target += frequencies[k] * (-alpha * log(values[k]) - constant);
  }
}

with the usual

stan_model(file = "discrete_power_law.stan",  allow_undefined = TRUE,
           includes = paste0('\n#include "', file.path(getwd(), 'zeta.hpp'), '"\n')

but the original Stan model has a linker error

functions{
    real zeta(real s);
    real power_law_lpmf(int x, real alpha){
     return (-alpha * log(x) - log(zeta(alpha)) );
    }
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  for (k in 1:K) target += frequencies[k] * power_law_lpmf(values[k] | alpha);
}

So, the only difference is that in the working case, the zeta function is called in the model block directly, whereas in the non-working case, the model block calls power_law_lpmf which calls zeta. The compiler should inline the power_law_lpmf function anyway, in which case, I have no idea why one links correctly and the other has undefined symbols.

2 Likes

Here’s an update on this bug . Or at least I believe it’s a bug [and also believe it’s the same bug]. If try to compile the following program:

functions {
  real foo(real x, real xc, real[] theta, real[] x_r, int[] x_i) {
    return exp(normal_lpdf(x | theta[1], theta[2]));
  }
}
data {
  real x_r[2];
  int x_i[10];
}
parameters {
  real<lower=0> theta[2];
}
transformed parameters{
  real int_foo = integrate_1d(foo, 0.0, positive_infinity(), theta, x_r, x_i, 0.01);
}
model {
  theta ~ normal(0, 1);
}

I get

$ grep "error:" integrate1d_errors.txt 
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:139:22: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (double, double, const std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:145:21: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (double, double, const std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:36:15: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (const double&, const double&, std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double, double, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double, double, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double&, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double&, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’

Full log is here (1.4 MB) .

Interface is Rstan although I don’t think it’s an interface-specific problem.
Tagging @mitzimorris and @Bob_Carpenter as they’ve been quite helpful about these things in the past.

Thanks for taking a look.

Yeah, there was something weird about the print stream for integrate_1d. I think @bbbales2 knows about it.

Is this a showstopper bug? Or something with an annoying workaround? Either way I’ll take a look later today.

And by showstopper I mean something where an annoying workaround would be helpful

Depends. For the zipf thing the annoying workaround is given by @bgoodri in the post I mentioned. The integrate_1d problem is a showstopper unless you can/want to implement the stuff in external C++. Would that qualify as an annoying workaround?

Ooow that’s as showstoppy as anything— thanks tor the tldr. My bad for missing this

I believe these two issues are connected. But I’m illiterate enough on C++ to be completely off about this. Just a heads up ;-)

In https://cran.r-project.org/web/packages/StanHeaders/vignettes/stanmath.html I had to use auto to get the integrate_1d example to work. All of the other functions, passing a pointer to a std::ostream worked.