RStan: contributing to a bulletproof Makevars initialiser

While (i) trying to build R packages with rstantools, (ii) trying to get other users to install them, and (iii) reading this forum, I see that setting up the right installation environment (e.g., Makevars) for Linux, Windows and macOS is a big barrier.

I see that the settings are quite OS- and hardware-specific. Would it be possible to collect detailed, case-specific documentation of all the settings that work for each user (machine and OS), so that we can programmatically set the right environment based on a list of “if” system conditions?
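As a rough illustration of what such a list of “if” conditions could look like, here is a minimal shell sketch. The helper names `makevars_file_for` and `makevars_line_for` are hypothetical and not part of rstan or rstantools, and the flags are placeholders standing in for whatever the collected user reports would recommend. The only facts assumed are that R reads a per-user `.R/Makevars` file, and `.R/Makevars.win` on Windows.

```shell
#!/bin/sh
# Sketch only: map an OS name (as printed by `uname -s`) to the per-user
# Makevars file, relative to the home directory that R uses.
makevars_file_for() {
    case $1 in
        Linux|Darwin)
            printf '%s\n' '.R/Makevars' ;;
        MINGW*|MSYS*|CYGWIN*)
            # Assumes a Unix-like shell on Windows (e.g. the Rtools or Git shell).
            printf '%s\n' '.R/Makevars.win' ;;
        *)
            # Unknown system: report failure instead of guessing.
            return 1 ;;
    esac
}

# Placeholder settings per OS. These values are illustrative assumptions,
# not tested recommendations; real entries would come from user reports.
makevars_line_for() {
    case $1 in
        Linux)
            printf '%s\n' 'CXX14FLAGS = -O3 -fPIC' ;;
        Darwin|MINGW*|MSYS*|CYGWIN*)
            printf '%s\n' 'CXX14FLAGS = -O3' ;;
        *)
            return 1 ;;
    esac
}

# Dry run: show what would be written, without touching any file.
os=$(uname -s)
if target=$(makevars_file_for "$os") && line=$(makevars_line_for "$os"); then
    printf 'would append to ~/%s: %s\n' "$target" "$line"
else
    printf 'no known configuration for %s\n' "$os" >&2
fi
```

Keeping the lookup in small functions means each reported machine and OS combination becomes one more `case` branch, and an unrecognised system fails loudly rather than getting a guessed configuration.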

I think smoothing the installation process is fundamental for expanding the use of RStan to downstream packages.

@bgoodri, @jonah, if all this is possible, I would like to contribute as much as I can by collecting use cases and building a script.

I think there is a lot of confusion around this issue. To build a package to upload to CRAN:

  1. You don’t need a Makevars file locally, except on some Linux servers, and if you are using a Linux server you had best know how to configure your environment.
  2. Most of the reasons why you should nevertheless have a Makevars file are to specify some non-default options that improve performance but are contrary to CRAN’s policies for CRAN packages.
  3. AFAIK, the sections of the wiki on the Makevars file for Linux and Windows are correct, and the one that gets generated automatically on a Mac when you use the Mac toolchain installer is fine as well.

It is somewhat different if you are telling people to install a package with Stan programs from source that is just on GitHub or something rather than CRAN, but even then only on Windows and hopefully not for much longer. The reason is that Windows will try to install multiarch (even though that is useless) and that R for Windows by default does not know about the existence of the C++14 standard. But the wiki page for Windows takes care of this.

Thanks for the clarification.

I don’t know how representative my working community is (biology and bioinformatics), but I will share what is often our use case.

I see two (I believe) misconceptions here:

  1. the use of servers is limited just to “IT knowledgeable people” (see below)
  2. the use of github is just for internal development (see below)

Services such as RStudio Server have given (very many) biologists, statisticians and bioinformaticians access to great hardware. Most of them are familiar with high-level R libraries but do not know what C++ is (I barely do), and would expect devtools::install_* to install the package without having to contact the IT department to troubleshoot possible problems. To put it simply, it doesn’t matter how much better Stan-based tools are than others: if it takes too much effort or time to install them, people will walk away. This is sad but true.

In our community GitHub is an extremely widely used repository for many packages (which may eventually go to Bioconductor/CRAN after months or years of heavy use by the community).

My point is that devtools::install_github should go smoothly for all users if a package is to have enough success to make it to more demanding repositories. If I understand correctly the function of .travis.yml,

  - sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y
  - sudo apt-get update -q
  - mkdir -p ~/.R/
  - echo "CXX = `R CMD config CXX`" >> ~/.R/Makevars
  - echo "CXXFLAGS = `R CMD config CXXFLAGS` -pedantic -g0 -flto -stdlib=libc++" >> ~/.R/Makevars
  - echo "LDFLAGS += -flto -stdlib=libc++" >> ~/.R/Makevars
  - export CLANG_EXTRA_ARG=""
  - if [[ $CXX = "clang++" ]] ;  then export CLANG_EXTRA_ARG=" -Qunused-arguments -fcolor-diagnostics " ; fi
  - sed -i.bak "s/ g++/ ${CXX}${CLANG_EXTRA_ARG}/" ~/.R/Makevars
  - sed -i.bak "s/O[0-3]/O$CXX_OLEVEL/" ~/.R/Makevars

I see an attempt to cope with the initialization issue. If this is the case, why not transform that script into a comprehensive algorithm that will satisfy 90% of users? For me this is extremely important, and it is currently an incredible bottleneck.

I am thankful for that effort; however, most of the time I have to navigate forums to understand why the default scripts in that wiki do not work. An example is

But online I find endless combinations of flags depending on the system configuration, many of which could perhaps be checked within those wiki scripts (which might be merged to make a great .travis.yml script).

GitHub is establishing itself as a major R repository for permanent code; it is growing incredibly fast and should be considered a main tool for data scientists. The issue is that code has to work flawlessly from GitHub before it makes it to CRAN/Bioconductor. At the moment this puts a very tall barrier in front of Stan-based research in biology. To put it (too) simply, “nobody” except me can install or use my packages, unfortunately. This is frustrating after two or three years of work. I don’t want to sound too harsh; I am still grateful for rstan. That’s why I would like to personally help improve this.

The .travis.yml is pretty specific to the Travis servers. I wouldn’t generalize too much from it, especially to Windows.

It is true that if Windows users are installing such a package from source, then at the moment (there is a rumor it could change next week) they will have to cluebat R into using the only compiler it knows about. I think the configuration on the wiki is correct, even though it does not include the ${BINPREF} thing. In the past, we have run into problems where some people had a file with ${BINPREF} and then some other process assumed it needed to be added and then compilation failed because ${BINPREF}${BINPREF}g++ did not exist.
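The doubled-prefix failure described above is the kind of thing an automated initialiser would have to guard against. A minimal sketch of an idempotent check follows; `add_binpref` is a hypothetical helper, and the compiler strings are only examples.

```shell
#!/bin/sh
# Hypothetical helper: prefix a compiler command with the literal text
# ${BINPREF} only if it does not already start with it, so that running
# the helper twice never produces ${BINPREF}${BINPREF}g++.
add_binpref() {
    case $1 in
        '${BINPREF}'*)
            # Already prefixed: leave untouched.
            printf '%s\n' "$1" ;;
        *)
            printf '%s%s\n' '${BINPREF}' "$1" ;;
    esac
}

# Single quotes keep ${BINPREF} literal, so the shell does not expand it.
add_binpref 'g++'
add_binpref '${BINPREF}g++'
```

Both calls print the same singly prefixed command, which is the property a second tool touching the same file would need to preserve.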

It is true that without the ${BINPREF} thing, installation from source will fail on Windows when trying to install the 32-bit version, which no one in their right mind should be doing anyway for a random GitHub beta package. To avoid multiarch, you need to pass an argument, rather than tweaking the file. For example,

remotes::install_github("fate-ewi/bayesdfa", INSTALL_opts = "--no-multiarch")