Problem implementing Zipf (power law) distribution -- external C++

Here’s my R configuration:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.19.1          StanHeaders_2.18.1-10

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         rstudioapi_0.10    magrittr_1.5       tidyselect_0.2.5   munsell_0.5.0      colorspace_1.4-1  
 [7] R6_2.4.0           rlang_0.3.4        dplyr_0.8.1        tools_3.6.0        parallel_3.6.0     pkgbuild_1.0.3    
[13] grid_3.6.0         gtable_0.3.0       loo_2.1.0          cli_1.1.0          matrixStats_0.54.0 lazyeval_0.2.2    
[19] assertthat_0.2.1   tibble_2.1.3       crayon_1.3.4       processx_3.3.1     gridExtra_2.3      purrr_0.3.2       
[25] callr_3.2.0        ggplot2_3.2.0      ps_1.3.0           glue_1.3.1         inline_0.3.15      compiler_3.6.0    
[31] pillar_1.4.1       scales_1.0.0       prettyunits_1.0.2  stats4_3.6.0       pkgconfig_2.0.2

and it looks like I have gcc 7.4 too:

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.4.0-1ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04)

I don’t think this is in code generation or anything. I compared the code from your error log to the code I’m coming with 2.19.1 and the diff is:

bbales2@frog:~/Downloads$ diff orig.cpp new.cpp 
5c5
<    6 : #define STAN__SERVICES__COMMAND_HPP// Code generated by Stan version 2.18.1
---
>    6 : #define STAN__SERVICES__COMMAND_HPP// Code generated by Stan version 2.19.1
9c9
<   10 : namespace model5ef64a6ba6bc_test_integrate1d_namespace {
---
>   10 : namespace model6efb25fd15c9_model_namespace {
24,25c24,25
<   25 :     reader.add_event(0, 0, "start", "model5ef64a6ba6bc_test_integrate1d");
<   26 :     reader.add_event(20, 18, "end", "model5ef64a6ba6bc_test_integrate1d");
---
>   25 :     reader.add_event(0, 0, "start", "model6efb25fd15c9_model");
>   26 :     reader.add_event(20, 18, "end", "model6efb25fd15c9_model");
68c68
<   69 : class model5ef64a6ba6bc_test_integrate1d : public prob_grad {
---
>   69 : class model6efb25fd15c9_model : public prob_grad {
73c73
<   74 :     model5ef64a6ba6bc_test_integrate1d(stan::io::var_context& context__,
---
>   74 :     model6efb25fd15c9_model(stan::io::var_context& context__,
79c79
<   80 :     model5ef64a6ba6bc_test_integrate1d(stan::io::var_context& context__,
---
>   80 :     model6efb25fd15c9_model(stan::io::var_context& context__,
97c97
<   98 :         static const char* function__ = "model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d";
---
>   98 :         static const char* function__ = "model6efb25fd15c9_model_namespace::model6efb25fd15c9_model";
149c149
<  150 :     ~model5ef64a6ba6bc_test_integrate1d() { }
---
>  150 :     ~model6efb25fd15c9_model() { }
303c303
<  304 :         static const char* function__ = "model5ef64a6ba6bc_test_integrate1d_namespace::write_array";
---
>  304 :         static const char* function__ = "model6efb25fd15c9_model_namespace::write_array";
372c372
<  373 :         return "model5ef64a6ba6bc_test_integrate1d";
---
>  373 :         return "model6efb25fd15c9_model";
425c425
<  426 : typedef model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d stan_model;
---
>  426 : typedef model6efb25fd15c9_model_namespace::model6efb25fd15c9_model stan_model;
431,433c431,433
<  432 : RCPP_MODULE(stan_fit4model5ef64a6ba6bc_test_integrate1d_mod){
<  433 :   Rcpp::class_<rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d,
<  434 :                boost::random::ecuyer1988> >("stan_fit4model5ef64a6ba6bc_test_integrate1d")
---
>  432 : RCPP_MODULE(stan_fit4model6efb25fd15c9_model_mod){
>  433 :   Rcpp::class_<rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model,
>  434 :                boost::random::ecuyer1988> >("stan_fit4model6efb25fd15c9_model")
438c438
<  439 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::call_sampler)
---
>  439 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::call_sampler)
440c440
<  441 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_names)
---
>  441 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_names)
442c442
<  443 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_names_oi)
---
>  443 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_names_oi)
444c444
<  445 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_fnames_oi)
---
>  445 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_fnames_oi)
446c446
<  447 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_dims)
---
>  447 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_dims)
448c448
<  449 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_dims_oi)
---
>  449 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_dims_oi)
450c450
<  451 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::update_param_oi)
---
>  451 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::update_param_oi)
452c452
<  453 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::param_oi_tidx)
---
>  453 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::param_oi_tidx)
454c454
<  455 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::grad_log_prob)
---
>  455 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::grad_log_prob)
456c456
<  457 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::log_prob)
---
>  457 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::log_prob)
458c458
<  459 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::unconstrain_pars)
---
>  459 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::unconstrain_pars)
460c460
<  461 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::constrain_pars)
---
>  461 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::constrain_pars)
462c462
<  463 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::num_pars_unconstrained)
---
>  463 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::num_pars_unconstrained)
464c464
<  465 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::unconstrained_param_names)
---
>  465 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::unconstrained_param_names)
466c466
<  467 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::constrained_param_names)
---
>  467 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::constrained_param_names)
469c469
<  470 :             &rstan::stan_fit<model5ef64a6ba6bc_test_integrate1d_namespace::model5ef64a6ba6bc_test_integrate1d, boost::random::ecuyer1988>::standalone_gqs)
---
>  470 :             &rstan::stan_fit<model6efb25fd15c9_model_namespace::model6efb25fd15c9_model, boost::random::ecuyer1988>::standalone_gqs)
476c476
<  477 : SEXP file5ef64c40b4f4( ) ;
---
>  477 : SEXP file6efb6d0aa33f( ) ;
481,482c481,482
<  482 : SEXP file5ef64c40b4f4(  ){
<  483 :  return Rcpp::wrap("test_integrate1d");
---
>  482 : SEXP file6efb6d0aa33f(  ){
>  483 :  return Rcpp::wrap("model");

I think the differences are all to do with you naming your model “test_integrate1d” and me naming my model “model”.

What is in your ~/.R/Makevars? Mine is empty.

With the verbose=TRUE argument, there should be some output that looks like:

Compilation argument:
 /usr/lib/R/bin/R CMD SHLIB file6efb6d0aa33f.cpp 2> file6efb6d0aa33f.cpp.err.txt 
g++ -std=gnu++14 -I"/usr/share/R/include" -DNDEBUG   -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include/"  -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/"  -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/unsupported"  -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/BH/include" -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/src/"  -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/"  -I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-VAQCff/r-base-3.6.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c file6efb6d0aa33f.cpp -o file6efb6d0aa33f.o

And breaking that g++ command out into multiple lines so it’s easier to read:

g++
-std=gnu++14
-I"/usr/share/R/include"
-DNDEBUG
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include/"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/unsupported"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/BH/include"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/src/"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/"
-I"/home/bbales2/R/x86_64-pc-linux-gnu-library/3.6/rstan/include"
-DEIGEN_NO_DEBUG
-DBOOST_DISABLE_ASSERTS
-fpic
-g
-O2
-fdebug-prefix-map=/build/r-base-VAQCff/r-base-3.6.0=.
-fstack-protector-strong
-Wformat
-Werror=format-security
-Wdate-time
-D_FORTIFY_SOURCE=2
-g
-c file6efb6d0aa33f.cpp
-o file6efb6d0aa33f.o

Does your 2.19.1 show the compilation argument? The 2.18.1 output is missing this I think. Given the code is the same an our OSes are so close, it seems like there must be a different argument being passed to something somewhere.

I was slightly disobedient. When using cmdstan I refused to name my file model.stan as you instructed, instead giving it the name test_integrate1d.stan. It compiled fine. I just tried compiling from RStan using the renamed file and I get errors all the same.
Here’s my (RStan) compilation argument, which I will try to break up as you did:

g++
 -std=gnu++14
 -I"/usr/local/lib/R/include" -DNDEBUG
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/include/"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/RcppEigen/include/unsupported"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/BH/include"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/src/"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/StanHeaders/include/"
 -I"/home/luiz/R/x86_64-pc-linux-gnu-library/3.6/rstan/include"
 -DEIGEN_NO_DEBUG
 -DBOOST_DISABLE_ASSERTS
 -I/usr/local/include
 -fpic
 -g
 -O2
 -c file683b69d5d548.cpp
 -o file683b69d5d548.o

I see I’m probably missing

-fdebug-prefix-map=/build/r-base-VAQCff/r-base-3.6.0=.
-fstack-protector-strong
-Wformat
-Werror=format-security
-Wdate-time
-D_FORTIFY_SOURCE=2

Mine doesn’t even exist.

$ cat  ~/.R/Makevars
cat: /home/luiz/.R/Makevars: No such file or directory

Many thanks for looking into this.

Weird. I wouldn’t expect any of those things to matter.

Can you check your stdc++ version? The errors you pulled out seem pretty specific and there’s some weird c++ going on there. Could be a mistake.

Also can you post the error log with the 2.19.1 build?

Like this:

bbales2@frog:~$ dpkg --search /usr/include/c++
libstdc++-7-dev:amd64: /usr/include/c++
bbales2@frog:~$ dpkg -s libstdc++-7-dev
Package: libstdc++-7-dev
Status: install ok installed
Priority: optional
Section: libdevel
Installed-Size: 15342
Maintainer: Ubuntu Core developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: same
Source: gcc-7
Version: 7.4.0-1ubuntu1~18.04
...

Here’s it:

luiz@gauss:~$ dpkg --search /usr/include/c++
libstdc++-7-dev:amd64: /usr/include/c++

luiz@gauss:~$ dpkg -s libstdc++-7-dev
Package: libstdc++-7-dev
Status: install ok installed
Priority: optional
Section: libdevel
Installed-Size: 15342
Maintainer: Ubuntu Core developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: same
Source: gcc-7
Version: 7.4.0-1ubuntu1~18.04.1
Provides: libstdc++-dev
Depends: gcc-7-base (= 7.4.0-1ubuntu1~18.04.1), libgcc-7-dev (= 7.4.0-1ubuntu1~18.04.1), libstdc++6 (>= 7.4.0-1ubuntu1~18.04.1), libc6-dev (>= 2.13-0ubuntu6)
Suggests: libstdc++-7-doc
Conflicts: libg++2.8-dev, libg++27-dev, libg++272-dev (<< 2.7.2.8-1), libstdc++2.10-dev (<< 1:2.95.3-2), libstdc++2.8-dev, libstdc++2.9-dev, libstdc++2.9-glibc2.1-dev, libstdc++3.0-dev
Description: GNU Standard C++ Library v3 (development files)
 This package contains the headers and static library files necessary for
 building C++ programs which use libstdc++.
 .
 libstdc++-v3 is a complete rewrite from the previous libstdc++-v2, which
 was included up to g++-2.95. The first version of libstdc++-v3 appeared
 in g++-3.0.
Homepage: http://gcc.gnu.org/
Original-Maintainer: Debian GCC Maintainers <debian-gcc@lists.debian.org>

error log (1.4 MB)

Can you update your StanHeaders? Mine says StanHeaders_2.18.1-10 and yours says StanHeaders_2.18.1. I just assumed the -10 was something local, but those are actually different package versions.

I most definitely can. But don’t know how. update.packages("StanHeaders") does not produce the desired outcome, presumably because StanHeaders_2.18.1-10 is on github not on CRAN.

Oh really, maybe remove and reinstall then. It’s listed here: https://cran.r-project.org/web/packages/StanHeaders/index.html

remove.packages("StanHeaders")
install.packages("StanHeaders")
1 Like

Du-Oh. I suppose I should remove “expert-level R” from my CV. Doing this gets StanHeaders up to date and solves the compilation problem.

Thanks a bunch, @bbbales2!

I dunno why update.packages didn’t work either. Glad this worked though!

Hi everyone,

I found this thread most helpful in some of my current work, which requires inference on a discrete power law distribution. However, I do need to be able to vary the ‘q’ or ‘y_min’ parameter of the discrete power law, which means passing a different q value to the zeta function. (If it matters, I care about being able to arbitrarily fix y_min, but don’t necessarily need to do inference on it as a model parameter.)

Since, as far as I can tell, boost only supports the Riemann zeta function (where q is fixed at 1) rather than the Hurwitz zeta (where q is an argument), I don’t think the above solution will work. Is that right? Is the best approach to follow the example in this thread, but also include a non-boost

I see that there are rigorous c++ implementations of Hurwitz zeta outside of boost, e.g. https://link.springer.com/article/10.1007%2Fs11075-014-9893-1

Thanks for any guidance!

Hi, I think you’re right that the solution given above is not going to work for the Hurwitz zeta function.

For a start, you can try modifying the C++ program to include the full gradient of the Hurwitz zeta, using the integral representation given here.

I’m busy right now but willing to offer help, should you need any. I might be able to prepare an MWE in Rstan later.

Hi maxbiostat,

Thanks so much for your reply - your offer to help is much appreciated! I could definitely use some assistance to do the C++ implementation.

Hi all,

With many thanks to maxbiostat, we have a working implementation of the discrete Pareto in Stan with variable (fixed at arbitrary values) y_min. Some code, tests, and examples in both python and R are here:

Please note, as explained in the readme there, that the numerical approximation we are using for the Hurwitz zeta (based on quadrature integration) gives very poor results at alpha << 2. It seems straightforward to replace this with a proper implementation of the Hurwitz zeta using the Euler-Maclaurin summation formula which would relieve this problem, but after a few failed attempts I realized I would need some C++ help to get that working! There is at least one simple pre-existing C implementation (via cephes, as is used in scipy) that should work.

2 Likes