Problem implementing Zipf (power law) distribution -- external C++

bgoodri · February 6, 2019, 3:58am

That is supposed to work. If you add LDFLAGS += --no-undefined to ~/.R/Makevars you get a slightly more informative error message (an a linker error rather than an error from dyn.load) like:

file5e8d2fcaf228.o: In function `boost::math::tools::promote_args<double, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::power_law<double>(int const&, double const&, std::ostream*)':
file5e8d2fcaf228.cpp:(.text._ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIdEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS5_PSo[_ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIdEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS5_PSo]+0x2f): undefined reference to `boost::math::tools::promote_args<double, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::zeta<double>(double const&, std::ostream*)'

I don’t understand why it thinks it needs a definition of
boost::math::tools::promote_args<double, float, float, float, float, float>::type blah_blah_namespace::zeta<double>(double const&, std::ostream*)
as opposed to what is defined which is simply

double zeta(const double& s, std::ostream* pstream__) {
  return boost::math::zeta(s);
}

And then it does it again for the var specialization

file5e8d2fcaf228.o: In function `boost::math::tools::promote_args<stan::math::var, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::power_law<stan::math::var>(int const&, stan::math::var const&, std::ostream*)':
file5e8d2fcaf228.cpp:(.text._ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIN4stan4math3varEEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS8_PSo[_ZN46model5e8d11fab598_discrete_power_law_namespace9power_lawIN4stan4math3varEEEN5boost4math5tools12promote_argsIT_fffffE4typeERKiRKS8_PSo]+0x109): undefined reference to `boost::math::tools::promote_args<stan::math::var, float, float, float, float, float>::type model5e8d11fab598_discrete_power_law_namespace::zeta<stan::math::var>(stan::math::var const&, std::ostream*)'

bgoodri · February 6, 2019, 4:16am

OK. I have the problem surrounded. This works, i.e. just declare real zeta(real alpha); but don’t define a power_law function that calls it:

functions{
    real zeta(real s);
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values
  int values[K];
  int<lower=0> frequencies[K];
}
parameters{
  real <lower=1> alpha;
}
model{
  real constant = log(zeta(alpha));
  for (k in 1:K) {
    target += frequencies[k] * (-alpha * log(values[k]) - constant);
  }
}

That is actually better (iff alpha is a scalar) because it only evaluates the zeta function once. But it does not answer the question (for @Bob_Carpenter or @seantalts) of why you can declare a zeta function, define it in an external C++ file, but not call it from another Stan function without incurring a linker error. In other words, the original Stan program fails to link, even though it should inline to the same thing

functions{
    real zeta(real s);
    real power_law_lpmf(int x, real alpha){
     return (-alpha * log(x) - log(zeta(alpha)) );
    }
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  for (k in 1:K) target += frequencies[k] * power_law_lpmf(values[k] | alpha);
}

with errors like

maxbiostat · February 6, 2019, 9:11am

My goal with tagging you and Bob and Ben wasn’t just to ask for help. For example, Bob was the one who initially suggested the Zipf stuff be added to Stan [even though the issue was posted by Daniel Lee] and I wanted him to see the - shy, admittedly - progress on the maths side of things.
I’m hoping that whatever I do here for my own stuff, which will be using Rstan, can then with a bit of work be added to the math library.

By the way, I tried CmdStan and it wasn’t much more intuitive or gave much clearer [to me] error messages. As it unfolded, I would never guessed what the root cause was.

maxbiostat · February 6, 2019, 9:22am

Thanks so much for all the effort you put into this, Ben. This compiles at least. Still some stuff to iron out, but progress in sight.

Yup, nice catch.

seantalts · February 6, 2019, 12:43pm

I appreciate your sentiment here - someone may indeed come along and find that your code helps them with the full implementation. Cross-posting a reference to this discussion or your code on the github issue would help with that. The core maintainers are a bit overworked so we’d be unlikely to just pick up the issue, but if you (or anyone else reading this) wanted to learn a bit more about contributing to the Math library we’d be happy to help you turn your prototype into production code.

I didn’t mean to imply that anything about C++ development would be intuitive, just that CmdStan has fewer layers of complexity to deconstruct when debugging. So even if you didn’t understand its output, posting that here would have helped the people you were tagging. I actually didn’t know you could even include external C++ files with RStan until reading this thread and this vignette.

You’re sure the external C++ file was included properly and had all the proper overloads defined? Putting up the full model file and include file as a gist on github with the compiler invocations might shed some light on it.

maxbiostat · February 6, 2019, 12:58pm

I should have added that “with a bit of work” meant “with a bit of work from me”. I do intend to contribute this to math. But I needed to solve my problem first in order to get on with my research, and that was writing a .hpp that I could call from RStan.
I’ve read the materials you linked and they’re helpful. Due to my C++ illiteracy and inherent thickness, it’ll probably take a while until I can produce anything close to semi-production code for you guys to review.
My main doubt, which I guess applies to both yourself and @Bob_Carpenter is: would it be OK to compute the gradient using the integral representation (and hence an exp_sinh quadrature call)? My fear is that this might be rather too slow and/or inaccurate for the math library.

Let me know if this helps. I’ll edit the gist to add any info you might judge helpful.

seantalts · February 6, 2019, 2:15pm

I think Ben or another RStan person would know more here - what you’ve posted doesn’t seem to show zeta.hpp getting included in the compiler invocations or the generated hpp. There must be more happening during the RStan invocation to link those up.

bgoodri · February 6, 2019, 2:29pm

Well, there is a linker error if you try to call the C++ function from within another Stan function, so I must have overlooked something but I don’t know what that is.

Here is the C++ with var and a double specialization of the declared but not defined zeta function in a Stan program.

stan::math::var zeta(stan::math::var const& s, std::ostream* pstream__) {
  double s_ = value_of(s);
  double f = boost::math::zeta(s_);
  double d = 0;
  double k = 2;
  double term = 1;
  // http://functions.wolfram.com/ZetaFunctionsandPolylogarithms/Zeta/20/01/01/0001/MainEq1.L.gif
  while (term > 1e-16) {
    term = log(k) / pow(k, s_);
    d -= term;
    k++;
  }
  return var(new precomp_v_vari(f, s.vi_, d));
}

double zeta(const double& s, std::ostream* pstream__) {
  return boost::math::zeta(s);
}

This Stan program works

functions{
    real zeta(real s);
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  real constant = log(zeta(alpha));
  for (k in 1:K) {
    target += frequencies[k] * (-alpha * log(values[k]) - constant);
  }
}

with the usual

stan_model(file = "discrete_power_law.stan",  allow_undefined = TRUE,
           includes = paste0('\n#include "', file.path(getwd(), 'zeta.hpp'), '"\n')

but the original Stan model has a linker error

functions{
    real zeta(real s);
    real power_law_lpmf(int x, real alpha){
     return (-alpha * log(x) - log(zeta(alpha)) );
    }
    /*TODO: implement rng for prior and posterior predictive checks*/
}
data{
  int<lower=0> K; // number of unique values  
  int values[K];
  int<lower=0> frequencies[K]; 
}
parameters{
  real <lower=1> alpha;
}
model{
  for (k in 1:K) target += frequencies[k] * power_law_lpmf(values[k] | alpha);
}

So, the only difference is that in the working case, the zeta function is called in the model block directly, whereas in the non-working case, the model block calls power_law_lpmf which calls zeta. The compiler should inline the power_law_lpmf function anyway, in which case, I have no idea why one links correctly and the other has undefined symbols.

maxbiostat · June 22, 2019, 12:00pm

Here’s an update on this bug . Or at least I believe it’s a bug [and also believe it’s the same bug]. If try to compile the following program:

functions {
  real foo(real x, real xc, real[] theta, real[] x_r, int[] x_i) {
    return exp(normal_lpdf(x | theta[1], theta[2]));
  }
}
data {
  real x_r[2];
  int x_i[10];
}
parameters {
  real<lower=0> theta[2];
}
transformed parameters{
  real int_foo = integrate_1d(foo, 0.0, positive_infinity(), theta, x_r, x_i, 0.01);
}
model {
  theta ~ normal(0, 1);
}

I get

$ grep "error:" integrate1d_errors.txt 
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:139:22: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (double, double, const std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:145:21: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (double, double, const std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/stan/math/rev/arr/functor/integrate_1d.hpp:36:15: error: no match for call to ‘(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__) (const double&, const double&, std::vector<stan::math::var>&, const std::vector<double>&, const std::vector<int>&, std::ostream&)’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double, double, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double, double, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’
/usr/include/c++/7/functional:641:24: error: no matching function for call to ‘__invoke(const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double&, const std::vector<double>&, const std::vector<double>&, const std::vector<int>&, std::basic_ostream<char>&)’
/usr/include/c++/7/bits/invoke.h:89:5: error: no type named ‘type’ in ‘struct std::__invoke_result<const model5ef64a6ba6bc_test_integrate1d_namespace::foo_functor__&, double&, double&, const std::vector<double, std::allocator<double> >&, const std::vector<double, std::allocator<double> >&, const std::vector<int, std::allocator<int> >&, std::basic_ostream<char, std::char_traits<char> >&>’

Full log is here (1.4 MB) .

Interface is Rstan although I don’t think it’s an interface-specific problem.
Tagging @mitzimorris and @Bob_Carpenter as they’ve been quite helpful about these things in the past.

Thanks for taking a look.

bgoodri · June 22, 2019, 3:15pm

Yeah, there was something weird about the print stream for integrate_1d. I think @bbbales2 knows about it.

bbbales2 · June 22, 2019, 4:26pm

Is this a showstopper bug? Or something with an annoying workaround? Either way I’ll take a look later today.

bbbales2 · June 22, 2019, 4:27pm

And by showstopper I mean something where an annoying workaround would be helpful

maxbiostat · June 22, 2019, 4:33pm

Depends. For the zipf thing the annoying workaround is given by @bgoodri in the post I mentioned. The integrate_1d problem is a showstopper unless you can/want to implement the stuff in external C++. Would that qualify as an annoying workaround?

bbbales2 · June 22, 2019, 4:38pm

Ooow that’s as showstoppy as anything— thanks tor the tldr. My bad for missing this

maxbiostat · June 22, 2019, 4:50pm

I believe these two issues are connected. But I’m illiterate enough on C++ to be completely off about this. Just a heads up ;-)

bgoodri · June 22, 2019, 5:16pm

In https://cran.r-project.org/web/packages/StanHeaders/vignettes/stanmath.html I had to use auto to get the integrate_1d example to work. All of the other functions, passing a pointer to a std::ostream worked.

bbbales2 · June 22, 2019, 7:58pm

@maxbiostat, which version of Rstan are you using?

In Rstan 2.18.2 (the current Cran version), and I was not able to reproduce the error. Instead I got another weird one (error message posted at the end).

I upgraded to Rstan 2.19.1 via an install from Github and was able to compile the model (https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux#installing-rstan-from-source)

Pystan on pip (2.19.0) and cmdstan from develop also compiled the model with no errors.

If it doesn’t bork your setup, maybe try installing 2.19.1?

Here is the error message I got from 2.18.2:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "foo" does not exist.
  error in 'modelb0131aec23f9_model' at line 14, column 33
  -------------------------------------------------
    12: }
    13: transformed parameters{
    14:   real int_foo = integrate_1d(foo, 0.0, positive_infinity(), theta, x_r, x_i, 0.01);
                                        ^
    15: }
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'model' due to the above error.

maxbiostat · June 22, 2019, 11:11pm

@bbbales2, I have tried following the steps in the link you provided, and whilst the installation proceeded without issues, I still can’t compile the program. I’ll post some system specs so maybe you and @bgoodri can help’me figure out what’s going on.

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C               LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=pt_BR.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=pt_BR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.19.1       StanHeaders_2.18.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         rstudioapi_0.10    magrittr_1.5       tidyselect_0.2.5   munsell_0.5.0      colorspace_1.4-1   R6_2.4.0          
 [8] rlang_0.3.4        dplyr_0.8.1        plyr_1.8.4         tools_3.6.0        parallel_3.6.0     pkgbuild_1.0.3     grid_3.6.0        
[15] gtable_0.3.0       loo_2.1.0          cli_1.1.0          matrixStats_0.54.0 lazyeval_0.2.2     assertthat_0.2.1   tibble_2.1.1      
[22] crayon_1.3.4       processx_3.3.1     gridExtra_2.3      purrr_0.3.2        callr_3.2.0        ggplot2_3.1.1      ps_1.3.0          
[29] glue_1.3.1         inline_0.3.15      compiler_3.6.0     pillar_1.4.0       scales_1.0.0       prettyunits_1.0.2  stats4_3.6.0      
[36] pkgconfig_2.0.2

and

$ uname -a
Linux gauss 4.15.0-1039-oem #44-Ubuntu SMP Sun May 19 07:35:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.4.0-1ubuntu1~18.04.1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

Thanks again.

bbbales2 · June 23, 2019, 2:27pm

Do you have anything in your ~/.R/Makevars that might be interfering with the build? Was there anything custom that went along with the C++ code above?

Can you clone a copy of cmdstan and see if that works? Should be able to do:

git clone --recursive https://github.com/stan-dev/cmdstan.git
cd cmdstan
[move a copy of your model into the cmdstan folder and call it model.stan]
make -j4 model

If that builds (you can type ./model and it’s an actual executable that exists and gives you some help info), then that points to a difference in cmdstan/Rstan.

The computer I installed 2.19.1 on was an 18.04. I wasn’t able to remote into it just now, but I’ll stop by and check it later today and check the sessionInfo() stuff.

maxbiostat · June 23, 2019, 3:11pm

I can confirm that it works* with cmdstan. It seems I was wrong, this is indeed an interface-specific issue. Or at least machine-specific. When we nail it down I might change the tag to RStan so as to facilitate searching in the future.

*I had to modify the program slightly in order to instantiate x_r and x_i but that’s not related to any compilation issues.

Topic		Replies	Views
Adapt a probability mass function for increased speed Modeling	5	565	October 15, 2021
Discrete Power Law distributions in Stan? Modeling	3	710	September 12, 2019
Require real return type for probability functions error Modeling rstan , techniques , specification	4	909	May 29, 2020
Cannot use a (new) function for a rstantools based package RStan	8	638	June 2, 2020
Create (and debug) a new _lpmf function Modeling	9	1360	May 9, 2017

Problem implementing Zipf (power law) distribution -- external C++

Related topics