When I was a new R user, it felt very sketchy downloading packages from non-CRAN sources, such as dev versions from GitHub repos. I’m comfortable doing that now, but when I write material aimed at new users, I think it’s reasonable to expect some of those new users will have some of the apprehensions I used to have. Installing packages from CRAN feels official, and at a bare minimum it means there has been at least one more set of eyes on the software. Since I’m generally not qualified to evaluate software with any rigor, I appreciate it when developers pass through bottlenecks like CRAN.
I felt the same way when I started out. I think now with R-universe (which didn’t exist when I was new to R) there’s a middle ground: the package you’re installing comes from GitHub, but R-universe runs automated checks on it similar to CRAN’s and reports errors and warnings. For example, if you look at the Stan page you can see that most packages are green (OK), one is yellow (warning), and one is red (error), and you can open the details of the checks to see what each warning or error was about. So even if you wouldn’t expect a student to check all that, you can check it before recommending a package to them. I’m not saying this is ideal, just that it’s a decent option that didn’t use to exist.
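For anyone who hasn’t tried it, here is a minimal sketch of what installing from R-universe looks like. It assumes the Stan organization’s R-universe repository lives at the URL shown (worth confirming against the cmdstanr installation instructions) and uses only base R’s `install.packages()`:

```r
# Install cmdstanr from the Stan R-universe repository.
# The R-universe URL is listed first; the session's existing repos
# (typically CRAN) stay in the list so dependencies can still be found.
install.packages(
  "cmdstanr",
  repos = c("https://stan-dev.r-universe.dev", getOption("repos"))
)
```

From the user’s side this is the same `install.packages()` call as a CRAN install, just with an extra repository, which may make it less intimidating for newcomers than installing straight from a GitHub repo.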
CRAN does have some additional checks and stricter policies, and sometimes a human will look at the code (but almost never to check that the code does what the author says it does). Probably the most important difference is that CRAN runs reverse dependency checks. That is mostly a good thing, but it can hold up progress on packages like RStan: the checks don’t strictly test whether you break important parts of other packages, they just fail if any test fails, and package authors are free to write any tests they want (e.g. tests that fail if the wording of a warning message changes even slightly). When you have hundreds of reverse dependencies this can become a big burden, but of course we’re happy so many packages use Stan! (Edit: this isn’t the only thing holding up new RStan releases, just one of many.)
Certainly reasonable.
Yeah, this is understandable and those bottlenecks are often good. But there are cases when those bottlenecks can lead to a worse situation. Of course those situations are hard for a new user to identify, so it’s definitely tricky! Like @avehtari was saying, in the case of Stan you’re getting a CRAN package in RStan but you’re getting an out-of-date version of Stan compared to cmdstanr. In addition to new features, the newer Stan versions have some bug fixes. So it’s questionable whether you’re getting better software overall by sticking to CRAN (in this case my personal opinion is that you’re getting worse software by sticking to CRAN, but that’s just in this specific case).
But I certainly understand the motivation to keep things simple for new users and just stick to CRAN. It’s hard enough learning new programming languages, and having to figure out the pros and cons of installing packages from different sources on top of that can be overwhelming. So I get where you’re coming from. Of course we’re biased in that we want people using the latest Stan versions, but we’re also thrilled that people like you are encouraging others to use Stan, regardless of the particular interface or Stan version. Thanks for everything you’ve already done to help new and existing Stan users, and I’m curious to hear more about the book you’re working on.
Thanks for the thoughtful reply, @jonah. I didn’t know about R-universe, and it looks like a nice resource.
I’m not a big fan of CRAN, but I think that if cybersecurity professionals setting corporate policies suppose that CRAN affords some level of reliability or protection compared to other sources of R packages, then novice users can be forgiven for imagining the same.
It’s a big scary world out here, @jsocolar. There’s no end to the learning of what’s what.
Taking into account that CRAN does not include any cybersecurity checks that would protect from the usual attack vectors, I would keep assuming that many of those setting corporate policies are basing those policies on assumed trust and not on actual facts. If you know a cybersecurity professional who could explain why CRAN is safe, please ask for more details. As far as I know, downloading and running something from CRAN is not safe unless you run it in a sandbox. I guess that at this point Stan is still too small to attract black hat hackers.
The companies I worked for created their own CRAN mirror (CRAN: Mirror HOWTO) and then would scan or only allow packages they deemed safe. This mirror would only be updated once a year or so because of the effort involved. So you were stuck with the package version at that time point until the next refresh.
It seems like this approach could then also include non-CRAN packages like CmdStanR.
This week I learned about R-multiverse
R-multiverse provides:
- A home for packages that fall outside the scope of other repositories such as CRAN and Bioconductor.
- Direct and timely distribution of package releases.
- Assurance of package quality for production scenarios.
It runs the same checks as CRAN, and the production snapshots (taken every 3 months) add even stricter dependency and security checks.
The R-multiverse production snapshot includes cmdstanr, so it seems there is a more stable and safer way to install cmdstanr than CRAN would be. Of course, you still need to install CmdStan, too, but that would also be needed if cmdstanr were on CRAN.
We really need to rename this thread.