As we were packaging PyStan 2.19, we hit an error because stan and stan_math together have more than 2^{16} files. The problem is solved, but it does (re)raise the question: can we prune Boost a bit? Is there some way to avoid distributing all of it?
I’m leaning towards the conclusion that it’s not worth the developer time to do anything about this. I’m posting this just in case other people have concerns or ideas.
It probably would speed up build times a bit if we could avoid downloading and/or copying all those files.
That might be the case. We could do what @yizhang suggests, or even add a dependencies repo outside of Math. But… the real benefit of including all of Boost is that it makes development so much easier: I just clone Math and I'm up and running. If Boost were only partially included, or if there were an extra setup step, there would always be a chance of mismatched versions. That would be a continual burden on the developers, whereas the packaging problem is an infrequent burden on the maintainer. It's still a burden, so maybe there's a clever way to fix it all at once. But first, it'd help to know what the actual error is and what process is generating it.
The problem arose because zip was (and still is) used to package the PyStan distribution, and older versions of the zip format (without the Zip64 extensions) do not allow more than 65,535 (2^{16} - 1) files in a single archive.
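To illustrate that limit, here is a minimal Python sketch, using only the standard library, that counts the files under a source tree and compares the count with the classic (non-Zip64) ZIP entry limit. The helper names (`count_files`, `fits_in_classic_zip`, `write_zip`) are hypothetical and are not part of PyStan's actual build tooling. Passing `allowZip64=False` to `zipfile.ZipFile` makes `zipfile` raise `zipfile.LargeZipFile` instead of quietly writing Zip64 records, which mimics the behaviour of an old-format writer.

```python
import os
import zipfile

# Classic (non-Zip64) ZIP stores the entry count in a 16-bit field,
# so one archive can hold at most 2**16 - 1 = 65,535 entries.
CLASSIC_ZIP_MAX_ENTRIES = 2**16 - 1


def count_files(root: str) -> int:
    """Count non-directory entries under `root`, recursively (hypothetical helper)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def fits_in_classic_zip(root: str) -> bool:
    """True if all files under `root` fit in one non-Zip64 archive."""
    return count_files(root) <= CLASSIC_ZIP_MAX_ENTRIES


def write_zip(root: str, archive: str, allow_zip64: bool = True) -> None:
    """Zip every file under `root` (hypothetical helper).

    With allow_zip64=False, zipfile raises zipfile.LargeZipFile when the
    archive is closed if it would need Zip64 (e.g. > 65,535 entries).
    """
    with zipfile.ZipFile(archive, "w", allowZip64=allow_zip64) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Store paths relative to `root` inside the archive.
                zf.write(path, os.path.relpath(path, root))
```

Running `fits_in_classic_zip` on a source tree before packaging would flag the problem early; modern Python packaging tools write Zip64 archives by default, which is why the limit stops mattering once they move off the old format.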
We solved it with a little hack (removing some files from Boost). Even this hack will become unnecessary as the build software widely used in the Python packaging world moves away from this old version of zip.
If an easy way (for developers) to avoid shipping all of Boost ever emerges, I do think we should use it. The bandwidth and storage savings are non-trivial: Boost is huge.
Just a heads up for those who are not following Stan Math PRs closely: with this PR now merged, the Boost folder went from 66,828 items totalling 608.4 MB to 22,222 items totalling 193.9 MB. So the next version should be more lightweight.
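For scale, the relative reductions work out as follows (plain arithmetic on the item counts and sizes reported for the Boost folder):

$$
1 - \frac{22\,222}{66\,828} \approx 0.667, \qquad 1 - \frac{193.9\ \text{MB}}{608.4\ \text{MB}} \approx 0.681
$$

That is roughly two thirds fewer items and about 68% less disk space. The 22,222 remaining items are also well below the 65,535-entry limit of classic ZIP, although that limit applies to the whole archive, not just the Boost folder.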