Tracking functions for user export

Hi!

It appears to me that we do not have a good way at the moment to track that things put into math are also exposed and documented to users. Some examples:

  • integrate_ode_adams was exposed for years, but not documented (solved now)
  • GP kernels were exposed for a while, but not documented
  • kinsol_solver was around for a year and just very recently exposed and documented thanks to @charlesm93
  • algebra_solver_fp is not exposed nor documented and there isn’t even an issue for this
  • integrate_dae is also lacking being exposed and being documented

Putting this stuff into Stan-math was surely a ton of work for the person writing it and for the persons reviewing it. In addition this stuff creates maintenance work going forward. As such we should really make an effort to expose this to the Stan language.

I am bringing this up to improve the situation and call for suggestions. Here are some ideas:

  1. have a convenient way to track the status of stan language expose and documentation being available - for any new function which is put in math and should be exposed we should add this function to such a list. All old functions as above should be put there as well. This overview should also have a date indicating when it was added to math.

  2. Functions which never make it to the Stan language - and this is controversial, I know - will be removed ultimately from Stan-math in order to avoid the maintenance burden. Removal will only happen if in-activity on the matter has been long enough and no additional time has been requested. Things can always be moved along, but there should be a mechanism to automatically follow-up on this stuff.

@SGB, @syclik as FYI.

If too much stuff is put into Stan-math which is never exposed, then that is very obvious overhead for the project which we should try to get rid of to not harm the project in the long run.

Sebastian

3 Likes

Thanks for bringing this up @wds15. It seems like there are two related issues here:

(A) some things do get exposed to the language but aren’t documented

(B) some things never get exposed to the language (or get exposed but with substantial delay

Is that right?

For (A), how does this happen? Is it typically that reviewers just sometimes forget to insist on documentation when a new function from math is exposed to the language? Currently the submission checklist for a PR to stan-dev/stan is just:

  • [ ] Run unit tests: ./runTests.py src/test/unit
  • [ ] Run cpplint: make cpplint
  • [ ] Declare copyright holder and open-source license: see below

but maybe we need to add something about documentation to the checklist? That wouldn’t guarantee anything but maybe it would help?

For (B), I don’t have any brilliant idea off the top of my head but Sebastian’s suggestions seem reasonable. Before bringing something like this to the SGB I’d like to hear from other people who work on stan-dev/math and stan-dev/stan. Does anyone have different suggestions? And can anyone share more of the backstory behind why some things never make it from math to the language? Do people forget? Or is there no one who wants to do the work? What causes someone to go through the trouble of implementing something in math but not seeing it through to the stan language? As someone who doesn’t work on the math library I’m just curious about what the underlying issue is here. Thanks!

For (a) I think the main reason is that stanc3 PRs did not enforce “new function PRs” to have matching PRs in stan-dev/docs approved before they are merged.
We should think about that.

There is a growing list of docs issues for function docs flagged for 2.24: https://github.com/stan-dev/docs/issues?q=is%3Aopen+is%3Aissue+milestone%3A2.24

For (b) there are probably multiple reasons:

  • it used to be quite some work to add a function in stanc but that reason is gone now. There is some Math functions from that era though.

  • Math devs are not familiar with Ocaml/stanc3 codebase - need stanc3 dev help or some easy to follow instructions (I am guilty of promising @mcol I would add some of his functions to stanc3, but that got delayed for a release cycle because I simply forgot).

  • Math devs not interested in exposing (Math is its own entity - so there is that chance)

I honestly think proper issue creating with correct milestones would go a long way here.

I am however against removing something just because its not exposed to the Stan language. If there are other reasons (broken, obsolete because other stuff works better, etc.) then yes.

2 Likes

I think the best solution will be to merge stan-dev/math, stan-dev/stan, stan-dev/docs, and stan-dev/stanc3 and then require all changes to come with matching doc. That’s what we used to do before the great fragmentation. Short of that, the only solution’s to require either code or an issue in the related repos.

I’m in the not-familiar-with-OCaml camp myself, as are most of our devs. It used to be a very easy C++ tweak to add new functions. Maybe it still is in OCaml and I just don’t know it.

That’d be great. But most of our repos do not have milestones, or the milestones they have are years in the past. We’ve always just come up to releases and swept all the milestones under the rug. That’s even more true now with quarterly scheduled releases rather than feature push based releases.

1 Like

Hi!

Yes, A and B happen.

Reasons outlined from @rok_cesnovar are accurate.

How about the following idea: Whenever a function is added to Stan-math which is intended for users, then the PR will only get merged into Stan-math if a matching issue for the stanc3 parser is created. That ticket should be filed under a specific tag so that we can track these easily and the stanc3 issue should refer to the Stan-math PR which implements this function. This way we would be able to track things and not get buried in our code-base. This stanc3 issue can then serve as track record for the feature. Ideally it is structured a bit so that we can automatically parse these issue to get an overview of floating around issues easily.

Solving point A (no doc, but in the language) can be addressed by enforcing doc for a user exposed function upon merging things into stanc3.

For things which live a long time in math, but do not get exposed I think removal after a long period (to be defined what that is) of inactivity seems reasonable to me - at least we should have an agreement here.

It did look to me that some functions end up in math for the purpose of someones research project. That is one hand fine if that is done openly as that given that Stan Math is funded through research projects, but currently this lacks transparency which should be there given that any code in itself is maintenance cost for the project going forward. Another thing are ongoing things like higher order autodiff. Everyone agreed that it should stay there in math as it will eventually get to production grade. Similar things happened in the Stan repository to my knowledge (not every sampler is exposed to users). As long as things are transparent and there is agreement on it - great. It’s just that sometimes doing your laundry helps to keep things in order.

It does seem like the separate repos are causing some headaches. Is merging them together the same thing as the “monorepo” project that @alashworth was working on?

That seems reasonable to me, at least while they don’t share the same issue tracker (which I guess would change as a consequence of what Bob is suggesting).

What do other people think?

Yes, and also what I proposed in the refactor design doc. Once we finish the rest of the CI refactor (stanc3 test in stan, stanc3 as a submodule, etc) we can do that if people are on board. I would personally start with math to stan merge and then see how that goes and reevaluate then.

My personal opinion is that math/stan/cmdstan & docs should be a monorepo. But its definitely not a small change.

But I also dobt want to seem like I am pushing people to make this change. I know I am still relatively new to the project.

1 Like

Not sure. Your dev time spend on the project is competitive would be my guess. Don’t be shy is what I am saying.

Certainly in terms of amount of work done!

Yes to this. Let name the tag “new function” or something. I would also add the docs issue as a requirement. But the docs issue should be required for stanc3 PR, so maybe not. This doesnt hurt anyone so I think we should just start doing this.

Tagging the stanc3 issue for the next release would be advised also. As long as we dont have a few pages worth of issues tagging for releases feels like a nice way of prioritizing.

2 Likes

Is there a stanc3 equivalent of Contributing New Functions to Stan · stan-dev/stan Wiki · GitHub ? There doesn’t seem to be much doc around stanc3.

If you scroll far enough down https://github.com/stan-dev/stanc3/blob/master/src/middle/Stan_math_signatures.ml it seems pretty easy to add ordinary functions and then fairly opaque for higher order functions.

Sadly, no. This is on my TODO (both simple and HOF - instructions for novices that are not necessarily interested in digging deep in stanc3/ocaml) but my list is pretty crowded atm, so cant promise it will be done soon.

Its really easy to add new functions.

Example PR for some basic functions:

Expose Lambert W functions by Christopher22 · Pull Request #572 · stan-dev/stanc3 · GitHub or Feature/452 Add container construction utilities by rok-cesnovar · Pull Request #550 · stan-dev/stanc3 · GitHub

And for HOF: Feature/issue 311 newton solver by charlesm93 · Pull Request #575 · stan-dev/stanc3 · GitHub

Non-HOF is basically oneliners and a bunch of tests. Personally, I found 90% of the effort was making test Stan models.

Me and Sean once did a Twitch session on how to do some basic stuff with stanc3. But I was too useless with stanc3 at that time so that is probably not worth anyones time.

I don’t know what the scope of that project was in terms of which repos to merge, but it’s the same idea.

The problems include history, issues, and PRs.

I just wanted to bring up the recent consensus at the Stan meeting from last Thursday, July 9th 2020.

So in short we all agreed that the intent to limit dead code is a good one. To move forward on the matter we will

  • Require that PRs to Stan-math can only be approved with an already created stanc3 issue whenever the function is intended to be user-facing. This is a very simple thing to do and will allow us to track things.

  • The stanc3 repo already requires user-facing doc before things are being merged.

  • It’s yet an open question as to what we do with things in Stan-math which never get exposed even though the function is deemed “user” facing. This will be discussed at one of the coming Stan-math meetings (tagging @syclik) to make a decision on this. The intent is to avoid dead code in Stan-math which obviously is a burden to maintain as we go forward.

  • In any case there shall be transparency in terms of what is going on. As Stan is also used as a research platform it can be sometimes definitely an intended thing to have things float around in the sources just for the purpose of doing research (which hopefully funds Stan). It’s just that this is ideally done in a transparent way and with community awareness at least, ideally agreement.

I hope that is a good summary of the discussions at the last Stan meeting and makes sense to those who were not present.

2 Likes

This is a bit topic-tangential, but: what are the best practices for “should I add this function to Stan?”. What I am saying is, I would like to add a function and expose it through Stan for my use cases (it is a Bayesian Holiday Model function), but I don’t know how “useful” it would be to the community.

Typically I would fork a project and add my bespoke function there and just use my fork (a bit of a pain to keep the base current, but doable). This doesn’t seem to be the practice in Stan though. Is the idea “the more the merrier, everyone put your functions in?”.

If your function is likely useful for a bigger set of users, then a contribution is always welcome… maybe you can generalize things a bit to make it even broader useful?

The thing with integrating it with Stan is that it’s going to be reviewed, you need tests, C++ doc, convince some OCaml Guru to help you with the parser and finally also write doc for it (user manual & reference).

It takes time, effort and good teamplay to get it done…but it’s worth the effort.

You should start with an issue for your function and call for feedback. Maybe run this by the Stan gathering meeting to gauge the popularity of the function? That’s a hard question in all honesty, but just try the channels we have.

3 Likes

Okay great. Thanks for this!