A Deprecation Schedule for Stan

This was discussed at today’s Stan: The Gathering meeting and is the first of what may end up being many discussions.

The motivating question here is “When can we remove a deprecated feature from Stan?”. Thus far in the project, the answer has been “in the next major version,” but with Stan 3.0 seemingly always a bit in the distance, and the project’s minor version number well into the twenties, I think it is worth considering other options.

As prior art, consider the NumPy Python package. They are also on a very high minor version (latest release: 1.21) and are a similar category of tool (broadly numeric/scientific computing), but they allow minor deprecations to be removed in minor versions following a schedule. Their policy is that a feature must have been deprecated for 2 minor versions or 12 calendar months, whichever is longer, before it is a candidate for removal.
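To make the "whichever is longer" rule concrete, here is a minimal Python sketch of the check as described above. The function names and the idea of tracking a deprecation date next to a minor version number are my own assumptions for illustration, not anything taken from NumPy's tooling.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def is_removal_candidate(
    deprecated_minor: int,
    current_minor: int,
    deprecated_on: date,
    today: date,
    min_minor_versions: int = 2,  # thresholds from the policy as summarized above
    min_months: int = 12,
) -> bool:
    """'Whichever is longer' means both thresholds must be met."""
    enough_versions = current_minor - deprecated_minor >= min_minor_versions
    enough_time = months_between(deprecated_on, today) >= min_months
    return enough_versions and enough_time
```

The point of requiring both conditions is that a burst of quick releases cannot shorten the window below a year, and a slow release cadence cannot shorten it below two versions.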

So, what would this entail for Stan? This is a lot of what the discussion today covered, but to summarize some thoughts (I didn’t write down who all said what, my apologies):

  • A ‘minor’ deprecation is one that can be fixed in an obvious and fast way, for example replacing <- with =. We even extend this to things like the new array syntax: while this would affect most existing Stan programs, the fix can be applied for the user automatically using stanc3.
  • All deprecation warnings in the language would need to contain 1) information on what should be done instead and 2) a version number or calendar date after which the feature will be removed.
  • The schedule for deprecations would need to be decided. Because this policy did not exist before, it would be good to have a decently long lead time before the first backwards-incompatible version. I propose 3 minor versions/1 year, so something deprecated in 2.28 could be removed in 2.31 at the earliest.
  • This should be clearly documented, and any program which generates a deprecation warning should output a link to this documentation.
  • There needs to be a unified decision process on what gets deprecated, as every deprecation should be tied directly to an improvement to the language.
  • The documentation and example models will be updated after a feature is marked as deprecated, but before it is removed, to reflect the newest and best practices.
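As a rough illustration of the points above about what a warning should contain, here is a small Python sketch. Everything in it is hypothetical: the `Deprecation` class, the message wording, and the documentation URL are placeholders, and stanc3 itself is not written in Python. It only shows one way a warning could carry the replacement, the earliest removal version, and a link to the policy.

```python
from dataclasses import dataclass
from typing import Tuple

DEPRECATION_WINDOW = 3  # minor versions, as proposed in the list above
DOCS_URL = "https://example.org/stan-deprecations"  # placeholder, not a real page


@dataclass(frozen=True)
class Deprecation:
    feature: str                   # what is being deprecated
    replacement: str               # what users should write instead
    deprecated_in: Tuple[int, int]  # (major, minor) version of the deprecation

    def earliest_removal(self) -> Tuple[int, int]:
        """Earliest version in which the feature may be removed."""
        major, minor = self.deprecated_in
        return (major, minor + DEPRECATION_WINDOW)

    def warning(self) -> str:
        """Message with the replacement, the removal version, and a docs link."""
        major, minor = self.earliest_removal()
        return (
            f"Warning: {self.feature} is deprecated and may be removed in "
            f"Stan {major}.{minor} or later. Use {self.replacement} instead. "
            f"See {DOCS_URL}"
        )
```

With this sketch, `Deprecation("assignment operator '<-'", "'='", (2, 28)).earliest_removal()` gives `(2, 31)`, matching the example in the list.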

Why do this? A few reasons.

  • The Stan language has evolved a lot since the last major version number change (things like <- have been deprecated for 5+ years).
  • We would like the freedom to continue evolving the language without accruing unmanageable code or increasingly large specifications. In particular, this policy would allow the phased-in introduction of new reserved keywords to the language: a word like ‘offset’ would first emit warnings, then later be reserved. This dramatically improves our ability to output warning messages and keep the language specification and implementation tidy, while still adding new features.
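The phased-in keyword idea can be sketched as a lookup keyed on compiler version. This is a toy Python illustration only: the schedule table, the version numbers attached to ‘offset’, and the function name are all assumptions made up for the example, not an actual plan or stanc3 behavior.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor)

# Hypothetical schedule: word -> (version that starts warning, version that reserves it)
KEYWORD_SCHEDULE: Dict[str, Tuple[Version, Version]] = {
    "offset": ((2, 28), (2, 31)),
}


def check_identifier(name: str, compiler_version: Version) -> str:
    """Return 'ok', 'warning', or 'error' for using name as a variable name."""
    if name not in KEYWORD_SCHEDULE:
        return "ok"
    warn_from, reserved_from = KEYWORD_SCHEDULE[name]
    if compiler_version >= reserved_from:
        return "error"    # the word is now a reserved keyword
    if compiler_version >= warn_from:
        return "warning"  # still accepted, but the user is told to rename
    return "ok"
```

Under these made-up numbers, a model using `offset` as a variable name compiles silently before 2.28, warns from 2.28 through 2.30, and is rejected from 2.31 onward.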

This is obviously all in the very early phases of being thought out, and more thoughts and feedback would be great! There are a few things that would need to happen on the development side before anyone could even think about implementing this policy, so don’t worry that things are going to start breaking tomorrow; the goal is to make Stan better, not worse.

We need something like this. Stan 3 was always just around the corner, yet it’s not landed… so a more principled approach with small steps makes a lot of sense!

I am fully on board with this! I did not realize this was even an option. The stuff we had to do for offset, array, etc. is really tough to maintain going forward.

Is there any appetite for a long-term support (LTS) release that gets backports of bug fixes? I’m looking at the Julia release schedule, which I’ve copied below.

  • The master branch: where all feature development happens, and where most bug fixes are made; and eventually, when we start working on 2.0 , where breaking changes will be made as well.
  • The unstable release branch (currently release-1.3 ): the release branch that is feature frozen but where active bug fixing and performance work is still happening prior to the next minor release (i.e. 1.3.0 ). Typically bug fixes are done on master and then backported to this branch. There isn’t always an unstable release branch: it only exists after a feature freeze but before the corresponding release; after that it becomes the stable release branch and there is no unstable release branch until the next feature freeze.
  • The stable release branch (currently release-1.2 ): the release branch of the most recently released minor (or major) version. This always exists and gets all applicable bug fixes backported to it from master . Future bug fix releases of the minor version will be made from this branch (e.g. 1.2.1 ).
  • A long term support (LTS) branch (currently release-1.0 ): an older release branch that will continue to get applicable bug fixes for as long as it continues to be the LTS branch. Extra effort is made to backport bug fixes to this branch—it may get its own versions of bug fixes as necessary when a later fix doesn’t apply cleanly.

That way we could introduce breaking changes and let users know that certain bugs will still get fixed in older versions for some time. This probably isn’t the exact model for Stan, but something like it could work: a bleeding edge (the daily version on GitHub, not yet released), stable (the current release of Stan), and LTS (doesn’t exist yet, but could if we deprecate some stuff).

It increases the maintenance burden but frees us up to make wilder changes that may be unstable.

I think that is one option, but it (in my opinion) only makes sense if you’re making larger breaking changes than the original post has in mind. I think a slower approach with fewer breaking changes means 1) you have a much stronger chance of spotting bugs before the next version that has a breaking change and 2) even if a bugfix is only available in a version which isn’t 100% compatible with your code, the update process should always be very smooth (using stanc3 or just a find-and-replace).

The idea of an LTS version is interesting, but I would want to do some historical research into the bugs Stan has had, to figure out how quickly they’re spotted and where they tend to crop up. If I had to purely guess, my first instinct would be that slightly more bugs crop up on the C++ side of things than in the compiler itself, so a bugfix version of CmdStan could use an older version of stanc3 (unmodified) with a patched library, for example.

I think it would also be important to be honest about the amount of dev time we can dedicate to LTS maintenance, and whether we have devs willing to do that. In particular, splitting the language into 4 different branches like Julia would likely mean that all the interfaces to Stan would have to split up their work accordingly.

Related to this, https://github.com/stan-dev/stanc3/pull/894 was just merged (thanks @nhuurre!), meaning --auto-format and --print-canonical flags to stanc3 will preserve comments. This means it is much more viable to use as both an auto formatter and, in relation to this issue, a mechanical deprecation updater for things like the array syntax changes.
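For anyone wanting to script this, here is a hedged Python sketch of wrapping the compiler as a mechanical updater. The `--print-canonical` flag is the one mentioned above; the executable name `stanc`, the helper names, and the assumption that the rewritten program is written to standard output are mine, so check them against your own installation.

```python
import subprocess
from typing import List


def canonical_command(model_path: str, stanc: str = "stanc") -> List[str]:
    """Build the command line; 'stanc' is an assumed executable name."""
    return [stanc, "--print-canonical", model_path]


def print_canonical(model_path: str, stanc: str = "stanc") -> str:
    """Run stanc3 on a model and return whatever it prints (assumed: the updated program)."""
    result = subprocess.run(
        canonical_command(model_path, stanc),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the compiler reports an error
    )
    return result.stdout
```

Returning the text rather than overwriting the file in place leaves the original model untouched, so the result can be diffed before anything is replaced.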

To be concrete with a proposal, I think a reasonable schedule is 3 minor versions/1 year. Something deprecated in 2.30 could be removed at the earliest in 2.33, assuming the current release schedule of ~4 months.

I’m planning on writing up a design doc soon with a formalized version of this proposal. I’d love more comments, before I get to that stage, on things people think should or should not be included!

The formal proposal can be found here

The design doc was accepted as part of yesterday’s Language Meeting
