Tracking Stan ecosystem downloads

I looked into tracking downloads from GitHub, and it turns out that it is possible. GitHub support was very helpful, and I am confident this reflects current practices and capabilities.

Running the following on the command line (I think all OSs have curl):

curl https://api.github.com/repos/stan-dev/cmdstan/releases/tags/v2.25.0

yields the following for the latest CmdStan release:

<snip>
      "size": 50006786,
      "download_count": 6329,
      "created_at": "2020-10-26T14:23:12Z",
      "updated_at": "2020-10-26T14:23:24Z",
      "browser_download_url": "https://github.com/stan-dev/cmdstan/releases/download/v2.25.0/cmdstan-2.25.0.tar.gz"
    }
<snip>

The "download_count": 6329 applies only to the explicit tarball that @serban-nicusor presumably uploaded there. The other tar/zip files that you see in the releases interface are automatically generated and NOT counted. You can tell the two apart by checking whether a file size is reported.
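
For anyone who wants to pull these numbers programmatically, here is a minimal Python sketch. It assumes the release JSON contains an "assets" list whose entries carry "name" and "download_count" fields, which is how the explicitly uploaded files show up in the response. The function names fetch_release and asset_download_counts are my own and not part of any GitHub client.

```python
import json
import urllib.request


def fetch_release(owner, repo, tag):
    """Fetch one release by tag from the GitHub REST API (needs network access)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/tags/{tag}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def asset_download_counts(release):
    """Map each explicitly uploaded asset name to its download_count.

    Auto-generated source archives are not listed under "assets",
    so they never appear in the result.
    """
    return {
        asset["name"]: asset["download_count"]
        for asset in release.get("assets", [])
    }
```

Calling asset_download_counts(fetch_release("stan-dev", "cmdstan", "v2.25.0")) should give one entry per uploaded file. Unauthenticated API requests are rate limited, so this is only meant for occasional checks.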

So questions:

  • Do we want to add more tracking as a matter of policy? We don’t have solid metrics on Stan ecosystem usage for non-R packages. We have excellent tracking from CRAN for RStan … with the exception of CmdStanR.
  • Repos that don’t create and add a tarball or installer for releases cannot be tracked. Do we want to change this? Looking around, we can currently track:
    • CmdStan
    • StanC3 (but lots of custom assets to track there)
  • If you want tracking for your work, then explicitly add an installer or tar.gz file for people to download. I know that I tend to click on explicit tarballs rather than ‘download source’, which is how the automatically generated links are labeled.
  • We can also use it for documentation .pdfs if we serve them from a release rather than a directory in the repo, e.g.
    https://github.com/stan-dev/docs/blob/master/docs/2_23/stan-users-guide-2_23.pdf
    would change to
    https://github.com/stan-dev/docs/releases/download/v2.23/stan-users-guide-2_23.pdf
    as linked from https://mc-stan.org/users/documentation/. So we would also have to make releases of the documentation; there is currently only one, for 2.23.
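
To check which releases of a repo can be tracked at all, something like the following rough sketch might help. It assumes you already have the parsed JSON list from the list-releases endpoint (https://api.github.com/repos/OWNER/REPO/releases) and that each entry has "tag_name" and "assets". The function name split_by_trackability and the sample tags are hypothetical, for illustration only.

```python
def split_by_trackability(releases):
    """Split release tags into those with uploaded assets and those without.

    Only releases with at least one uploaded asset report a download_count.
    """
    tracked, untracked = [], []
    for release in releases:
        target = tracked if release.get("assets") else untracked
        target.append(release["tag_name"])
    return tracked, untracked


# Hypothetical data shaped like the API response, not real numbers.
example_releases = [
    {"tag_name": "v1.1.0", "assets": [{"name": "pkg-1.1.0.tar.gz", "download_count": 10}]},
    {"tag_name": "v1.0.0", "assets": []},
]
tracked, untracked = split_by_trackability(example_releases)
print("tracked:", tracked)
print("untracked:", untracked)
```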

thanks

Breck


Counts by minor release version:


Still debugging this.

2.14 is correct. Release notes make sense:

"CmdStan’s NUTS sampler in v2.9 - v2.13 was broken. See: Stan 2.10 through Stan 2.13 produce biased samples and Michael found the bug in Stan’s new sampler for more info.

In short, please update to v2.14 immediately."

There were 9,579 downloads up to 2.14, which itself had 5,372 downloads, so that event could be a way to estimate the number of active users who download CmdStan. There are lots of ways to look at this.
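
For reference, one way to roll the per-asset counts up by minor version is sketched below. This is an illustration rather than the exact script I used; it assumes tags look like v2.14.0 and that the input is the parsed JSON list from the list-releases endpoint. The function name downloads_by_minor_version and the sample numbers are made up.

```python
import re
from collections import defaultdict


def downloads_by_minor_version(releases):
    """Sum asset download counts per "major.minor" version."""
    totals = defaultdict(int)
    for release in releases:
        match = re.match(r"v?(\d+)\.(\d+)", release["tag_name"])
        if match is None:
            continue  # skip tags that do not look like version numbers
        minor = f"{match.group(1)}.{match.group(2)}"
        totals[minor] += sum(
            asset["download_count"] for asset in release.get("assets", [])
        )
    return dict(totals)


# Hypothetical data, not real download counts.
example = [
    {"tag_name": "v2.13.0", "assets": [{"download_count": 100}]},
    {"tag_name": "v2.13.1", "assets": [{"download_count": 50}, {"download_count": 25}]},
    {"tag_name": "v2.14.0", "assets": [{"download_count": 300}]},
    {"tag_name": "nightly", "assets": [{"download_count": 7}]},
]
counts = downloads_by_minor_version(example)
print(counts)
```

Note that the list-releases endpoint is paginated, so a repo with many releases needs more than one request to cover everything.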

Breck

Thanks Breck.

Ah ok, that makes sense. That’s why this didn’t work previously when I tried to check how many times our R packages were installed via GitHub. We haven’t been uploading our own tarballs when tagging releases of the R packages, we’ve just been using the automatically generated files because that was sufficient for devtools::install_github().

Yeah we get really good stats on that, although it’s only for downloads via the RStudio CRAN mirror. There are many other CRAN downloads via other mirrors (there are lots) that we don’t have numbers for. But we definitely have a good lower bound by using the RStudio CRAN mirror downloads.