Stan R package repository

Is there any interest among the stan-dev R package developers in making an independent package repository for the official stan-dev R packages?

I am not proposing this with eliminating CRAN in mind, but as an easier way of publishing new R package versions before they reach CRAN or for R packages that have no interest in getting on CRAN.

This is a much simpler alternative to using devtools::install_github(), allows distributing binary packages, and also allows CRAN-hosted packages to use these packages even if they are not on CRAN.

The users would add our repository to their list of repositories:

options(repos = c("https://packages.mc-stan.org", getOption("repos")))

and then install.packages("cmdstanr") or install.packages("rstan") would install the newest version available on our repository. The alternative would be to specify our repository only for specific packages:

install.packages("cmdstanr", repos = c("https://packages.mc-stan.org"))
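If users run the options() call more than once (for example from both .Rprofile and a script), the repository ends up listed several times. A minimal sketch of a helper that avoids this is below. The function name add_stan_repo is made up for illustration, and the URL is the one proposed above, which is an assumption until the repository actually exists.

```r
# Hypothetical helper: put a repository first in the list without duplicating it.
add_stan_repo <- function(url = "https://packages.mc-stan.org",
                          repos = getOption("repos")) {
  # Drop any existing entry for the same URL, then prepend it.
  c(stan = url, repos[repos != url])
}

# Usage: compute the new list, then set it for the current session.
new_repos <- add_stan_repo(repos = c(CRAN = "https://cloud.r-project.org"))
options(repos = new_repos)
```

Putting the repository first means install.packages() sees it alongside CRAN and picks the newest version available across both.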

There are three ways we could go about doing this:

  • use a Github repository as a package repository

Any Github repository can be a static website. Hence any Github repository can also be used for hosting R packages. Example: https://github.com/RcppCore/drat
In this case

install.packages("Rcpp", repos="https://RcppCore.github.io/drat")

would install from this repository. The downside of this is that GitHub was never really meant for hosting huge tarball files, so this is probably only useful for a small number of packages without a lot of turnover. Which might be our case, I don't know.

  • self hosted static web page

In this case we need to have a static web page (for example http://packages.mc-stan.org) hosted somewhere. Maybe we could add that to the same machine that hosts https://jenkins.mc-stan.org/ and add the packages sub-domain to it but I am not sure how busy the machine is right now and if it could handle the additional load. @serban-nicusor how busy is the machine running https://jenkins.mc-stan.org/?

  • static web page hosted on AWS S3 or some other hosting provider

This obviously brings some additional costs. I am not exactly sure how much, though, as it's hard to estimate all the different traffic numbers you need to input in the hosting pricing calculator to get a ballpark number for monthly pricing. I think @ahartikainen mentioned once that Azure and NumFOCUS have some kind of agreement for open-source projects and free services. I am not sure if that is still relevant.

Huge thanks to @dpastoor for the initial information on this in the cmdstanr thread.

@dpastoor would you maybe know a ballpark figure of the rOpenSci AWS S3 traffic numbers or charges? Ours would obviously be smaller than rOpenSci's.

cc: @jonah @bgoodri @paul.buerkner

Hey @rok_cesnovar I think we have enough resources on the Jenkins machine to host a static web page if that’s just for getting packages from time to time and not a continuous usage of resources. Link me if you find a good web-ui package manager or if we need ours and I can put it up!

So for example, rOpenSci has http://packages.ropensci.org/ which is just a Jenkins landing page. But the UI does not really matter as it would be used inside R not from the browser. The main thing is that src/contrib returns a list of packages, that src/contrib/PACKAGES returns a file and that we have tar.gz files in src/contrib.
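To make that layout concrete, here is a minimal sketch that builds a throwaway repository on disk using tools::write_PACKAGES(). The package name "demo" and all paths are placeholders made up for illustration; a real setup would put tarballs produced by R CMD build into src/contrib instead of the hand-packed one used here.

```r
# Create the src/contrib directory of a CRAN-like repository in a temp location.
repo_dir <- file.path(tempdir(), "demo-repo")
contrib  <- file.path(repo_dir, "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# Create a placeholder package (hypothetical name "demo") that contains only
# a DESCRIPTION file, then pack it as demo_0.1.0.tar.gz inside src/contrib.
build_dir <- file.path(tempdir(), "demo-build")
dir.create(file.path(build_dir, "demo"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("Package: demo",
    "Version: 0.1.0",
    "Title: Placeholder Package",
    "Description: Exists only to illustrate the repository layout.",
    "License: GPL-3"),
  file.path(build_dir, "demo", "DESCRIPTION")
)
old_wd <- setwd(build_dir)
utils::tar(file.path(contrib, "demo_0.1.0.tar.gz"), files = "demo",
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Generate the PACKAGES index files that install.packages() reads.
tools::write_PACKAGES(contrib, type = "source")

# Inspect the index: one entry per package tarball found in src/contrib.
index <- read.dcf(file.path(contrib, "PACKAGES"))
index[, c("Package", "Version")]
```

Serving repo_dir as static files from any web host is then enough for install.packages() to use it, which is why the UI does not matter.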

Example:

The RStan source package is 1 MB, its macOS binary is ~20 MB, and its Windows binary is, I think, ~5 MB. That package is probably among the largest we would host (rstanarm is a bit bigger, but this is the general ballpark for the largest packages).

RStan has 85K downloads per month on CRAN; this would definitely be much less than that. Would it handle, let's say, 1k or 10k per month?
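As a rough back-of-the-envelope check of those numbers, here is a small sketch in R. It assumes every download fetches the largest file mentioned above (the ~20 MB macOS binary) and uses decimal units (1 GB = 1000 MB). Both are simplifying assumptions, not measurements.

```r
# Approximate package sizes quoted above, in MB.
sizes_mb <- c(source = 1, macos = 20, windows = 5)

# The two download volumes asked about.
downloads_per_month <- c(low = 1000, high = 10000)

# Worst case: every download is the largest file.
worst_case_gb <- max(sizes_mb) * downloads_per_month / 1000
worst_case_gb
```

Under these assumptions the upper bound is roughly 20 GB per month at 1k downloads and 200 GB per month at 10k downloads. Actual traffic would be lower, since many downloads would be the smaller source or Windows files.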

Just wanted to comment that I am very much in favor of this proposal, especially to already use cmdstanr and posterior in our other CRAN hosted packages.

At least for Azure DevOps (similar to Github Actions), but I don’t know if they have Azure credits too, probably.

So the main question is probably which of the stan-dev R packages would want to use this. We will use some variation of this for cmdstanr and posterior. If those will be the only two, I think making a Github repository will suffice (with packages.mc-stan.org redirecting there). But if other R packages would want to use this as well it would make more sense to host this somewhere else.

Me too!

I’m not sure if there’s a need for this for packages like bayesplot, shinystan, loo, projpred (although I’m not opposed to it for those packages and maybe there are compelling reasons), but I definitely think this could be useful for rstanarm. We often have fixes/changes to rstanarm that are difficult for people to use until they make it into a CRAN release because install_github() is a pain with rstanarm (it has to compile all the models). But we don’t do too many CRAN releases of rstanarm, so it would be nice to be able to offer some binaries that can be easily installed when people are waiting for bugfixes or new features that aren’t on CRAN. @bgoodri what do you think (for rstanarm and rstan)?

We (really 99% Rok) have started this at https://github.com/stan-dev/r-packages. So far it has the latest cmdstanr and posterior. For example, this now works:

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

I really hope this repository now does not mean we start to dump CRAN.

We do have a policy in place that only things from CRAN can be installed into our production R easily. I could imagine that other companies have similar policies in place, which would not play nicely at all with such a development repository.

The long rstan version lockdown is about to be resolved… hopefully sustainably.

No, this is only meant so that we can publish binaries for new features/bugfixes ahead of time. And that CRAN packages like brms can use posterior/cmdstanr before those reach CRAN.

Or, for example, a lot of users want to use the feature/survival branch of rstanarm, which is not on CRAN. We could build those binaries for Mac/Win and host them here.

Careful there, don't jinx Ben :)

I would be careful here. As soon as brms uses posterior/cmdstanr, brms won’t be able to go on CRAN either. Or did I miss something here?

Sure… but it’s still a pity that this functionality is not on CRAN, which is the real solution here. My concern is that this repository encourages a move away from CRAN. If this is more of a dev-only thing… fine, but for that I would have thought that Docker is better (but I have not yet had time to seriously work that out, nor is it a given that devs are fluent with Docker… sure).

I did put a lot of work into getting this resolved (OK, Ben probably a lot more), but I am optimistic here.

So I hope the consensus is still that things are headed to CRAN with respect to R (but it’s not up to me to make these decisions, of course).

I was fearing this would turn into a CRAN vs. anti-CRAN debate. So let's not go there, please. No one is suggesting we ditch CRAN. I am aware of the safety-related procedures that prevent the use of non-CRAN stuff.

CRAN packages can have the Additional_repositories field to include non-CRAN packages if they are strictly optional. So if cmdstanr is an optional backend that would work. See Host cmdstanr on 'drat' for use in CRAN packages before release · Issue #168 · stan-dev/cmdstanr · GitHub and the links there.
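As an illustration of what that looks like, the sketch below writes a made-up DESCRIPTION file and reads it back with read.dcf(). The package name "mypackage" and its field values are hypothetical, and the repository URL is the one from the install.packages() example earlier in this thread.

```r
# Hypothetical DESCRIPTION for a CRAN package with an optional non-CRAN backend.
desc_text <- c(
  "Package: mypackage",
  "Version: 0.1.0",
  "Title: Example Package With an Optional Backend",
  "Imports: rstan",
  "Suggests: cmdstanr",
  "Additional_repositories: https://mc-stan.org/r-packages/"
)

# Write it to a temp file and parse it the same way R tooling does.
desc_path <- tempfile("DESCRIPTION")
writeLines(desc_text, desc_path)
desc <- read.dcf(desc_path)

# The optional package sits in Suggests, and Additional_repositories
# says where it can be found.
desc[, c("Suggests", "Additional_repositories")]
```

The key point is that cmdstanr appears under Suggests rather than Imports or Depends, so the package still installs and works from CRAN alone.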

Yes, but there are probably good reasons why it's not on CRAN right now? I am guessing, I don't know. I read something about RAM usage, but I don't know the details. And since rstanarm and install_github don't work well for non-power users, this is a really simple solution for:

  • publishing fixes before they reach CRAN

The structure of the Stan R packages makes publishing on CRAN a slow process, and users who want to try out new things start building from source, which leads to a bunch of support questions that could be avoided.

  • publishing packages that will never be able to be published on CRAN due to some CRAN-limitation
  • publishing packages in alpha/beta/pre-CRAN stage (see posterior, cmdstanr)

Not sure why you feel that way. The first line on the repo says: “A place for publishing new versions of (some) stan-dev R packages before they reach CRAN and for stan-dev R packages and versions where releasing on CRAN is not a (current) goal.”

I would actually argue that by adding this repository, we could get more things on CRAN faster, rather than fewer, as it could lead to fewer support questions.

I really hope it gets through! Getting stanc3 in Rstan would be huge.

There’s always Python :-)

It sounds to me like the plan is to have packages like BRMS depend on non-CRAN packages like CmdStanR. Does that work for @wds15?

That shouldn’t affect anything for @wds15. We won’t add a dependency on a non-CRAN package unless that dependency is a “soft” dependency (i.e., not actually required but can be used if present).
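A minimal sketch of what such a soft dependency typically looks like in package code is below. The function name choose_backend is made up for illustration and is not how brms or any other package necessarily implements it; the only real API used is requireNamespace().

```r
# Hypothetical helper: use the optional backend only if it is installed,
# otherwise fall back to the required one.
choose_backend <- function(preferred = "cmdstanr", fallback = "rstan") {
  if (requireNamespace(preferred, quietly = TRUE)) {
    preferred
  } else {
    fallback
  }
}

# Usage: on a machine without cmdstanr this returns "rstan".
backend <- choose_backend()
```

Because the check happens at run time, installing the main package never pulls in the optional one.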

Also, we should be able to get CmdStanR onto CRAN eventually. (But don’t worry: it won’t have the issues RStan has on CRAN since CmdStanR doesn’t use StanHeaders and we’re not distributing CmdStan with CmdStanR. The only thing that would go onto CRAN is the CmdStanR R code.)

+1 on that!

There are a ton of features in cmdstan now that, in my view, make it really urgent to get cmdstanr on CRAN. I am not able to judge its readiness, only the need for it.

Yeah I agree. Not quite ready yet, but almost!

But isn’t CmdStan that soft dependency? I have a hard time believing that a security-minded sysadmin would require R software to come from CRAN, then open the backdoor to download, compile, and execute arbitrary C++ code.

So I’m very surprised by the reply:

If I had a nickel for every time I guessed wrong about a large company’s behavior, I’d have enough money to buy dinner.

Maybe I’m misunderstanding, but I don’t see the problem. brms, for example, wouldn’t offer a function to install cmdstan. If cmdstanr and cmdstan happen to be installed it will just be capable of using them. So if Novartis doesn’t want to allow cmdstanr then they will have no risk of cmdstan being used even if they allow a package like brms that is capable of using it.

Won’t that make things challenging for brms users?

Will brms depend on cmdstanr? If so, does that mean as soon as brms is installed, the cmdstan installer that’s part of cmdstanr will also be included?

My guess is that users with security-conscious IT departments will need to get approval for both cmdstanr and cmdstan. Just approving CRAN shouldn’t be good enough to get a working cmdstanr because it depends on a foreign executable.

No, that’s what I’ve been trying to say ;) It will only depend on rstan. cmdstanr is not required. Perhaps it was my initial use of “soft dependency” that was the problem. brms will not depend on cmdstanr or cmdstan. It would be available as an optional add-on, but Novartis can just not whitelist cmdstanr and there’s no problem.