Helpful function repository

There are plenty of great UDFs that the community has written that exist on this forum and the old Google forum and may never make it into stan-math. This post stems from a discussion with @bbbales2 about having a repo that users could upload their code to. The proposal is to have an informal repo with some directory structure that would only require syntactically valid Stan code (with a separate directory for valid C++ code), a copyright disclosure, and comments explaining the function, or example usage in the form of a fully fledged Stan program that is just fully commented out. This is just a proposal to open a discussion with the community, @SGB, and the @Stan_Development_Team.

My tentative proposal (i.e., open to change) is to have a GitHub repo with the structure listed below. There could be separate directories for C++ code and Stan code. The biggest issue is how to organize this stuff. I don’t think there will be one best way to do it. The difficulty is that some functions link together, some are for constraints, some are for matrix algebra, some are pdfs, etc., and the proposed structure may not fit easily with what a given UDF does. Other issues:

  • What to do if UDFs should be included together, especially if they are distributed among different directories?
  • Someone would have to do some maintenance, such as PRs and double checking that the PR is in the right folder, etc.

Even with those issues, I think it would be nice to have a place where these reside other than the forums.

  • Stan

    • array
    • real
    • matrix/vector
    • ode
    • algebra
    • probability
      • lpdf
      • lcdf
      • rng
  • cpp (same sub structure as Stan)
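As a rough sketch of what that layout could look like on disk, the following creates the proposed tree under a chosen root. The function name `scaffold` and the directory name `matrix_vector` are assumptions made for illustration (a `/` cannot appear in a directory name); nothing here is an agreed-upon structure.

```python
from pathlib import Path
from typing import List

# Sub-structure taken from the list above. "matrix_vector" stands in for
# "matrix/vector" (hypothetical name, since "/" is a path separator).
SUBDIRS = [
    "array", "real", "matrix_vector", "ode", "algebra",
    "probability/lpdf", "probability/lcdf", "probability/rng",
]


def scaffold(repo_root: str) -> List[str]:
    """Create the proposed directories and return their relative paths."""
    root = Path(repo_root)
    created = []
    for top in ("stan", "cpp"):  # cpp mirrors the stan sub-structure
        for sub in SUBDIRS:
            directory = root / top / sub
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory.relative_to(root).as_posix())
    return created
```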

The idea has been around for a long time. The standing issue with it is testing. How is that done and can that even be integrated with our Jenkins CI?

If something with tests can be worked out that would be great… if not, then I think this is still better than nothing.

Some type of automatic testing would be great. Though I was thinking it would just be a storage area, a use-at-your-own-risk type of thing. Put the onus on the person adding the function to make sure that it works and that there’s sufficient documentation to understand what it does.

I think it could be useful. Wouldn’t a simple approach be to include only full Stan models including the functions? (and documentation)

This would mean:

  • Somewhat clear minimal requirements for documentation (included in a sensible Stan model + commented)
  • Amenable to CI (can check that all .stan files compile).

If multiple functions are used together, those should be in the same model. This will introduce some duplicates, but will greatly reduce risks of breaking somebody else’s function by updating yours.
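A minimal sketch of the "check that all .stan files compile" idea, assuming the repo is a plain directory tree and that a `stanc` executable is on the PATH. The helper names (`find_stan_files`, `stanc_compiles`, `check_all_compile`) are hypothetical, and the compile step is injectable so the directory walk can be tried without a Stan toolchain installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def find_stan_files(repo_root: str) -> List[Path]:
    """Return every .stan file under repo_root, sorted for stable output."""
    return sorted(Path(repo_root).rglob("*.stan"))


def stanc_compiles(stan_file: Path) -> bool:
    """Assumption: a `stanc` binary is on PATH and exits 0 on valid input.

    Raises FileNotFoundError if no such binary is installed.
    """
    result = subprocess.run(
        ["stanc", str(stan_file)], capture_output=True, text=True
    )
    return result.returncode == 0


def check_all_compile(
    repo_root: str, compiles: Callable[[Path], bool] = stanc_compiles
) -> Dict[str, bool]:
    """Map each .stan file (path relative to repo_root) to pass/fail."""
    root = Path(repo_root)
    return {
        path.relative_to(root).as_posix(): compiles(path)
        for path in find_stan_files(repo_root)
    }
```

A CI job could then fail the build whenever any value in the returned dictionary is `False`. Because each model file is self-contained, a failure points at exactly one contribution.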

This could probably then extend to .cpp functions as you would accept FUNCTION_NAME.cpp and FUNCTION_NAME.stan and then try to compile the model with the .cpp file included.
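One way the FUNCTION_NAME.cpp / FUNCTION_NAME.stan convention could be checked is to pair files by their stem before handing them to a compile step. This is only a sketch; `pair_sources` and `orphan_cpp` are hypothetical helper names, and the actual compile-with-C++ step is left out because it depends on the interface used.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def pair_sources(directory: str) -> Dict[str, Tuple[Path, Optional[Path]]]:
    """Pair each NAME.stan with NAME.cpp from the same directory, if present."""
    pairs: Dict[str, Tuple[Path, Optional[Path]]] = {}
    for stan_file in sorted(Path(directory).glob("*.stan")):
        cpp_file = stan_file.with_suffix(".cpp")
        pairs[stan_file.stem] = (
            stan_file,
            cpp_file if cpp_file.exists() else None,
        )
    return pairs


def orphan_cpp(directory: str) -> List[str]:
    """List .cpp files that have no matching .stan model to compile against."""
    root = Path(directory)
    return sorted(
        p.name for p in root.glob("*.cpp")
        if not p.with_suffix(".stan").exists()
    )
```

A pull-request check could reject any contribution for which `orphan_cpp` is non-empty, since a lone .cpp file cannot be exercised by compiling a model.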

But I don’t want to force this, it’s just an idea.

As for categorization, I don’t think categorization by type of input/output is sensible. Rather I would categorize by use case (e.g. ode, linear_algebra, geometry, …). Similarly, I don’t think separating lpdf, lcdf and rng is sensible. In most cases you would probably want a single file with all those functions…

Best of luck in moving this forward… (and really, feel free to ignore my suggestions if you/others don’t find them convincing)

Great suggestions. I’m ok with having more domain-specific categorizations. My initial reason for keeping it as input/output types was that this was the simplest. If a function is both linear_algebra and geometry, it can be a judgement call which one it should go in. But if this is more useful for people then let’s do it!

Maybe another option in that direction is to have a package/module system inside Stan, so that one can do package_name::function_name(...). Tho I know absolutely nothing about how such a thing works or should be designed.

I like this. At a bare minimum, just packages/namespaces/tests for functions. I like an extra-mile solution here – if we’re trying to make a way for people to share code, that’s a pretty big deal, so we should go all out.

Bob mentioned that there’s an outstanding debate on whether functions in such packages would be allowed to do things like define parameters. But this I’d like to step around by just doing packages of functions that behave like our UDFs now (but maybe with overloading and namespaces).

The simplest thing I know of is Python packages. Like, you make a folder and put __init__.py in it and you have a package.
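To make the Python comparison concrete, here is a small sketch that builds such a folder in a temporary location and imports it. The package name `demo_stan_helpers` and the `inv_logit` function are made up for illustration; the point is only that a directory containing `__init__.py` becomes importable once its parent is on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_and_import_package(name: str, source: str):
    """Create <tmp>/<name>/__init__.py containing `source`, then import it."""
    parent = Path(tempfile.mkdtemp())
    pkg_dir = parent / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(source)
    sys.path.insert(0, str(parent))  # make the parent folder importable
    importlib.invalidate_caches()    # let the import system see the new folder
    return importlib.import_module(name)


pkg = make_and_import_package(
    "demo_stan_helpers",  # hypothetical package name
    "import math\n\ndef inv_logit(x):\n    return 1.0 / (1.0 + math.exp(-x))\n",
)
print(pkg.inv_logit(0.0))  # 0.5
```

Callers then write `demo_stan_helpers.inv_logit(...)`, which is the same namespacing a `package_name::function_name(...)` syntax would give Stan.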

I think a package system would make Stan code easier to read, but it’d also be an opportunity to add tests for Stan code. I guess that would involve having compile-able Stan files that are not models.

It would be cool to make it easy to write and test Stan packages. Like bin/test mypackage and it runs all the tests in the package. Or bin/test mypackage/tests/test1.stan mypackage/tests/test2.stan and it runs the tests in those files.
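A sketch of how such a `bin/test` command might turn its arguments into a list of test files, assuming a package is a directory with a `tests/` subfolder of `.stan` files. The function name `resolve_test_files` is hypothetical and no such tool exists in Stan today; actually running each test file is left out.

```python
from pathlib import Path
from typing import List


def resolve_test_files(args: List[str]) -> List[Path]:
    """Expand each argument into the .stan test files it refers to.

    A directory argument is treated as a package and expands to
    <package>/tests/*.stan; a .stan file argument is kept as-is.
    """
    files: List[Path] = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            files.extend(sorted((path / "tests").glob("*.stan")))
        elif path.suffix == ".stan" and path.is_file():
            files.append(path)
        else:
            raise FileNotFoundError(f"not a package or .stan test file: {arg}")
    # Drop duplicates while preserving order, so a file named twice runs once.
    seen = set()
    return [f for f in files if not (f in seen or seen.add(f))]
```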

Packages installing themselves with the interfaces would be cool too – like not having to put package folders in the right place manually or anything, or being able to install from interpreter of choice.

I really like how devtools makes it easy to install packages from GitHub.

I’m down to spend some time discussing this. It’s too big an issue to really hammer out quickly. There are a ton of practical critiques of this too (namely, it would be a huge effort), but I don’t know how to make progress on this one unless we start iterating.

Something I don’t like is C++. That stuff is just rough all around. A formal foreign function interface is a different kind of beast, I think.

Oh yeah, there’s a design-docs repo that takes pull requests, stan-dev/design-docs on GitHub, which is our current mechanism for this sorta thing.

While having a package system would certainly be desirable, I wouldn’t make the perfect the enemy of the good. A repo of functions is an improvement over the current state and could potentially pave the way for packages while being substantially lower-effort. I am also not sure we actually have a good idea of what kind of functions people would like to reuse and share…

Stan has been able to compile standalone functions files since 2.16 (unless the code broke recently); see “Standalone compilation of functions block · Issue #2267 · stan-dev/stan”. However, I don’t think the feature has been used much, and I am not sure it ever got fully integrated into the interfaces (e.g. “Integrate standalone function compilation with RStan · Issue #422 · stan-dev/rstan” is still open).

For some even broader context: there was a discussion of “modules” (which is more than a functions library, really more of a reusable library of submodels) here: Stan++/Stan3 Preliminary Design. AFAIK the discussion didn’t really end with a conclusion, and I remember not being a particularly helpful element of the discussion myself.

Libraries/packages aren’t perfect. Also we’re getting at two related, but different things here. So I don’t want to line them up on a good/bad axis.

Packaging defines a unit of testable Stan code. A functions repository is acting like a central place to find tested, useful code.

The repository analogue for packages is probably something more like CRAN/pip/npm.

  1. Difficult to write functions

    GP basis functions are like this. Other basis function things as well.

  2. Useful code that hasn’t been added to Math yet.

    Matrix normal likelihood, interpolation, tridiagonal solve

  3. Useful code that probably won’t ever be put in Math

    As an example, fixed timestep ODE solvers

  4. Internal libraries

    Functions that are used across multiple models, for example, by one person or a team. Right now the solution is copy-paste or #include. #include is better than copy-paste, but the functions still aren’t tested (unless you add some separate expose_stan_functions tests), and it doesn’t make it easy to share code between different project folders or between different people.

Ooooo, good link.

Thanks for the reference. I need to go read that again.

Is there a way to precompile a function and then compile a model which can use the precompiled version? I’m just thinking of all of us lazy people who would just clone the repo and #include but don’t want it to take forever to compile just to use a few functions.

This would also solve my issue of having functions that relate to each other across different directories.

This would be something like a distributable library. Not that I know of.