General question: for all CmdStanX wrappers, should “minimal library dependencies” be part of the contract? If so, how do we decide when using a library is OK?
Context: discussion from https://github.com/stan-dev/cmdstanr/issues/243.
CmdStanR started using the R package vroom to speed up CSV reading;
as the name suggests, vroom is faster than read.csv().
(Ideally, Stan I/O will improve and we won’t need to deal with CSV files at all. That is still a long way off, but it is definitely something we should work on.)
Unfortunately adding a single dependency on vroom (which seemed like a good idea since it’s much faster than read.csv()) is really like adding a lot of dependencies with how intertwined the tidyverse/r-lib/etc packages are with each other. I really wanted to avoid depending on any of those packages but we don’t have the person hours to commit to writing our own fast csv reader!
CmdStanPy’s original goals, as listed in the README.md (https://github.com/stan-dev/cmdstanpy#goals), include these:
- Clean interface to Stan services so that CmdStanPy can keep up with Stan releases.
- Easy to install
- Minimal Python library dependencies: numpy, pandas
- Python code doesn’t interface directly with C++, only calls compiled executables
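To make the last goal concrete, here is a minimal sketch of what "only calls compiled executables" can look like using nothing beyond the Python standard library. This is an illustration, not CmdStanPy's actual code: the helper names `build_sample_command` and `run_command` are hypothetical, and the executable and file names in the usage comment are placeholders. The argument layout (`sample`, `data file=...`, `output file=...`) follows the CmdStan command-line convention.

```python
import subprocess
from typing import List, Optional


def build_sample_command(
    exe: str, data_file: Optional[str], output_file: str
) -> List[str]:
    """Assemble a CmdStan-style argument list for the `sample` method.

    Hypothetical helper for illustration; not part of CmdStanPy.
    """
    cmd = [exe, "sample"]
    if data_file is not None:
        # CmdStan expects the input data as: data file=<path>
        cmd += ["data", f"file={data_file}"]
    # CmdStan writes the draws to the CSV file given as: output file=<path>
    cmd += ["output", f"file={output_file}"]
    return cmd


def run_command(cmd: List[str]) -> str:
    """Run an executable as a child process and return its stdout.

    No C++ bindings are involved; the wrapper only needs `subprocess`.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Usage (assumes a compiled model executable exists at ./bernoulli):
# cmd = build_sample_command("./bernoulli", "bernoulli.data.json", "output.csv")
# console_output = run_command(cmd)
```

The point of the sketch is that process invocation itself needs no third-party packages; the dependency question only arises once the wrapper has to parse and manipulate the output CSV files.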
CmdStanPy needs NumPy and Pandas, and now that it’s 2020, Python 3 seemed like an OK requirement as well - this has not met with complaints so far.
What about R? What are the minimal, stable, useful libraries that we need? Which things have proven to be problematic?