Trimming Boost: removing unused files

As we were packaging PyStan 2.19, we hit an error because stan and stan_math together have more than 2^{16} files. The problem is solved, but it does (re)raise the question: can we prune Boost a bit? Is there some way to avoid distributing all of it?

I’m leaning towards the conclusion that it’s not worth the developer time to do anything about this. I’m posting this just in case other people have concerns or ideas.

It would probably speed up build times a bit if we could avoid downloading and/or copying all those files.

We used to do that in rstan, before there was a BH package, with something like:

find ${STAN_HOME}/src/ -name \*\.\[ch]pp -exec bcp --scan --boost=${STAN_HOME}/lib/boost_1.XX.0 '{}' /tmp/boost_1.XX.0/ \; &> /tmp/boost_1.XX.0/bcp.log 

One option is to have the installer download third-party libraries on the fly.
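A minimal sketch of that idea in shell, assuming `curl` and `tar` are available. The function name `fetch_dependency`, its arguments, and the assumed tarball layout (a single top-level directory named like the destination) are hypothetical; this is not how any Stan installer actually works.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: download and unpack a third-party library at install
# time instead of shipping it in the repository. The function name, its
# arguments, and the expected tarball layout are assumptions for illustration.

# Usage: fetch_dependency <tarball-url> <expected-directory>
# The tarball is assumed to unpack into one top-level directory whose name
# matches the basename of <expected-directory>.
fetch_dependency() {
  local url="$1"
  local dest="$2"
  local parent tmp

  # Nothing to do if an earlier run (or a developer checkout) already has it.
  if [ -d "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi

  parent="$(dirname "$dest")"
  mkdir -p "$parent"
  tmp="$(mktemp)"

  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  if ! curl -fsSL "$url" -o "$tmp"; then
    rm -f "$tmp"
    echo "download failed: $url" >&2
    return 1
  fi

  tar -xzf "$tmp" -C "$parent"
  rm -f "$tmp"

  # Succeed only if unpacking produced the directory we expected.
  [ -d "$dest" ]
}
```

An installer would call it as `fetch_dependency "$TARBALL_URL" lib/some_lib_1.0`, where both arguments are placeholders. Pinning that URL to one exact release is what would keep every checkout on the same library version.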

@ariddell, mind filling in details on the error?

  • where is it coming from?
  • is it the number of files?
  • is it the size of the download?
  • how long does it take to build?
  • is the build time a problem?
  • how often are you packaging?
  • how did you work around this?

That might be the case. We could do what @yizhang suggests, or even add a dependencies repo outside of Math. But… the real benefit of including all of Boost is that it makes development so much easier: I just clone Math and I’m up and running. If the dependencies are only partially included, or there’s an extra setup step, there’s always a chance of mismatched versions. That would be a continual burden on developers, whereas the packaging problem is an infrequent burden on the maintainer. It’s still a burden, so maybe there’s a clever way to fix it all at once. But first, it’d help to know what the actual error is and what process is generating it.

The problem arose because zip was/is used to package the PyStan
distribution and older versions of the zip format do not allow more
than 2^{16} files in a single zip file.
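A quick pre-packaging check can catch this early. A minimal sketch, assuming a POSIX shell; `count_files` and `check_zip_limit` are hypothetical helper names. The classic (non-ZIP64) zip format stores the entry count in a 16-bit field, so 65,535 is the largest count it can record.

```shell
# Sketch: warn when a directory tree would exceed the classic zip entry limit.
# The classic (non-ZIP64) zip format stores the number of entries in a 16-bit
# field, so 65535 (0xFFFF) is the most it can record.
ZIP_MAX_ENTRIES=65535

# Count regular files below a directory. Archivers that also store directory
# entries will need more entries than this, so treat it as a lower bound.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Exit status 0 if the tree fits, 1 otherwise.
check_zip_limit() {
  local n
  n="$(count_files "$1")"
  if [ "$n" -gt "$ZIP_MAX_ENTRIES" ]; then
    echo "too many files for a classic zip: $n > $ZIP_MAX_ENTRIES" >&2
    return 1
  fi
  echo "ok: $n files"
}
```

Run `check_zip_limit` on the directory that is about to be zipped; a non-zero exit status means the archive would need ZIP64 or fewer files.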

We solved it with a little hack (removing some files from Boost). And
even this hack will become unnecessary as the build software widely
used in the Python packaging world moves away from this old version of zip.

If an easy way (for developers) to avoid using all of Boost ever emerges, I
do think we should use it. The bandwidth and storage savings are
non-trivial. Boost is huge.

You’re absolutely right. It looks like Boost takes up 699 MB out of Math’s total of 732 MB.
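A figure like this can be reproduced with `du`. A minimal sketch, assuming a POSIX shell; `share_percent` is a hypothetical helper, not part of Stan Math.

```shell
# Sketch: what percentage of a tree's disk usage does one subdirectory take?
# du -sk reports kilobytes, which keeps the arithmetic in shell integers.
share_percent() {
  local part total
  part="$(du -sk "$1" | cut -f1)"
  total="$(du -sk "$2" | cut -f1)"
  if [ "$total" -eq 0 ]; then
    echo 0          # avoid dividing by zero on an empty tree
    return 0
  fi
  echo $(( 100 * part / total ))   # integer division rounds down
}
```

From the root of a Math checkout, `share_percent lib/boost_1.69.0 .` prints the share. With the sizes quoted above, 100 × 699 / 732 rounds down to 95, so Boost is roughly 95% of the repository.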

There’s a lot of stuff in lib/boost_1.69.0/libs… I wonder if we can just drop the whole subdir. I don’t see it actually used for building anything.

Turns out we can’t do that; we need the libs folder for building MPI.

We could easily get down to ~200 MB without sacrificing anything of note:

Starting size of the boost_1.69.0 folder on Ubuntu 18.04:
66,805 items, totalling 608.5 MB

Removing the test subfolders in libs:
rm -R libs/*/test/
53,507 items, totalling 526.7 MB

Removing the example subfolders in libs:
rm -R libs/*/example/
49,782 items, totalling 510.6 MB

Removing the doc subfolders in libs:
rm -R libs/*/doc/
30,438 items, totalling 278.2 MB

Also removing the top-level boost_1.69.0/doc folder:
22,183 items, totalling 193.9 MB

If removing the docs from Boost is fine, I can open an issue and a PR to trim this.
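The measurement steps above can be collected into one function. A minimal sketch, assuming it runs against an unpacked boost_1.69.0 tree; `prune_boost` and `report` are hypothetical names, and this is not the script that was eventually merged into Stan Math. Its "items" count is files plus directories, which may differ slightly from a file manager's count.

```shell
# Hypothetical sketch of the pruning steps above; run it on a copy first.
# Only test/, example/ and doc/ folders are deleted, so sources such as
# libs/mpi/src (needed to build MPI) are left untouched.

# Print "<N> items, <size>" where items counts files and directories below $1.
report() {
  local items
  items="$(find "$1" -mindepth 1 | wc -l | tr -d ' ')"
  echo "$items items, $(du -sh "$1" | cut -f1)"
}

prune_boost() {
  local root="$1"
  local sub
  # Refuse to run on something that does not look like a Boost tree.
  if [ ! -d "$root/libs" ]; then
    echo "not a Boost tree: $root" >&2
    return 1
  fi
  echo "start:                $(report "$root")"
  for sub in test example doc; do
    # Like `rm -R libs/*/$sub/`; -f keeps rm quiet if the glob matches nothing.
    rm -rf "$root"/libs/*/"$sub"
    echo "without libs/*/$sub: $(report "$root")"
  done
  rm -rf "$root/doc"
  echo "without doc/:         $(report "$root")"
}
```

For example, `prune_boost lib/boost_1.69.0` prints the item count and size after each step, in the same order as the measurements above.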

That sounds great! Maybe you can use the script for updating sundials as a template (see the lib/ folder).

That cut down the size dramatically and never led to any problems.

Just a heads-up for those not following Stan Math PRs closely: with this PR now merged, the Boost folder went from 66,828 items totalling 608.4 MB to 22,222 items totalling 193.9 MB. So the next version should be a lot lighter.
