Monthly language dev meeting

Hey Bob, could you send me an invite?

Notes for Oct 2021 meeting

Design doc on deprecation

@WardBrian asked for volunteers to read his design doc on deprecation. If nobody expresses objections soon, we’ll go ahead and make this policy.

Type checking refactor

@WardBrian also asked for volunteers to review and think about a refactor of the type checker. The goal was to make it shorter and more readable. The major changes are:

  • the global symbol table is replaced with a functional map
  • the validation monad is removed
  • there are about 500 fewer lines of code
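
To make the first bullet concrete, here is a minimal sketch of what threading a functional map through a checker looks like, compared to mutating one global table. It is written in Python purely for illustration; the names `empty_env`, `bind`, and `check_block` are made up and are not taken from the stanc3 PR.

```python
from types import MappingProxyType


def empty_env():
    """Return an empty, read-only symbol table."""
    return MappingProxyType({})


def bind(env, name, type_name):
    """Return a NEW table with one extra binding; `env` itself is untouched."""
    return MappingProxyType({**env, name: type_name})


def check_block(env, declarations):
    """Thread the table through a list of (name, type) declarations.

    The duplicate check is an illustrative rule, not a statement of
    Stan's actual scoping rules.
    """
    for name, type_name in declarations:
        if name in env:
            raise ValueError(f"duplicate declaration of {name}")
        env = bind(env, name, type_name)
    return env


outer = bind(empty_env(), "N", "int")
inner = check_block(outer, [("y", "vector"), ("sigma", "real")])
```

Because `bind` never mutates its argument, `outer` still contains only `N` after the inner block has been checked, so leaving a scope needs no clean-up step.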

Hopefully, this will open the way for function overloading, tuple types, etc.

@WardBrian also volunteered to help update outstanding PRs for things like tuples and closures.

Function types in the language and the closure PR

@nhuurre has an outstanding PR for closures. I’d very much like to get this into the language. As it stands, it only implements closures, not lambdas. In order to have lambdas, we’ll need to introduce a syntax for functional types into the language. There’s a proposal in the closures design doc, but of course that’s not binding if there’s a better way to do it.

Apparently, functional types are already treated as first-class objects in the type checker and the hang-up is more on the parsing side. Closures are actually the more useful feature, but it’d be nice to also get lambdas in so we can use them where appropriate. I’m OK merging just the closures part of the closures design doc and not worrying about the lambdas yet.
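
For anyone unsure about the closure/lambda distinction, here is an analogy in Python (not Stan syntax, and not the syntax proposed in the design doc). The names `make_scaler` and `apply_twice` are made up for the example.

```python
def make_scaler(k):
    # Closure: a named function defined inside another function that
    # captures the local variable k from the enclosing scope.
    def scale(x):
        return k * x
    return scale


def apply_twice(f, x):
    """Higher-order function: f is a function-valued argument."""
    return f(f(x))


triple = make_scaler(3.0)
closure_result = apply_twice(triple, 2.0)

# Lambda: the same behavior written as an anonymous function expression,
# passed directly where a function value is expected.
k = 3.0
lambda_result = apply_twice(lambda x: k * x, 2.0)
```

Both calls compute 3 * (3 * 2). The difference is only in how the function value is written down; in a statically typed language, declaring an argument like `f` is where a syntax for function types comes in.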

TensorFlow back end

As things stand, there’s a lot of logic in stanc3 dedicated to generating TensorFlow back end code. It only supports a bare-bones set of a few handfuls of functions and is reportedly slower than the Stan math back end. The question came up as to whether we should continue to support it or move it out into its own module.

It’s not clear if anyone is using it. This was mostly developed by @seantalts and @Adam_Haber, so it’d be interesting to hear from them on whether it’s still viable.

Even if it is being used, it was agreed that it’d be nice to move it out of the stanc3 repo so it’s not an ongoing drag on development. The consensus seemed to be that we try to break it off so that it could import stanc3 as a submodule for most of its functionality. Nobody thought it’d be worth fully factoring Stan math out of the parser, and from the TensorFlow back end’s perspective it would be OK to carry some extra stuff it doesn’t need in the repo (there was a general objection to splitting stanc3 into two repos, one for code generation and one for parsing/type checking).

@Bob_Carpenter suggested all the cool kids were moving to JAX, so maybe we should target that instead of TensorFlow.

If we move things now, everything that works in 2.28 will continue to work going forward just by using the 2.28 release.

Transforms

We had a freewheeling discussion of variable transforms and how much users can be expected to know about underlying implementations. For example, should users be able to rely on <lower=0> constraints being implemented with log transforms or should that be considered an implementation detail? The conclusion seemed to be that it’s an implementation detail, but also critical for understanding how sampling was going to work. The Reference Manual does detail the transformations, but it’s not prescriptive in saying that they have to be of this form.

This is one of the motivations for user-defined constraints. For example, a user may prefer a soft-abs inverse transform to an exponential inverse transform for variables declared as <lower=0>.

We then realized that constraints and transforms are doing something different. It’s always been awkward for the doc and for explaining Stan that constraints trigger transforms in the parameters block but are just validation tests in other blocks. If we separate constraints and transforms, we could have constraints on parameter variables act like constraints on other block variables. For instance, we might have

real<transform = softplus, lower = 0> a;
real<transform = exp, lower = 0> b;

This would open the gateway for us to have multiple forms of each transform for different computational purposes.
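
As a rough illustration of two forms of the same lower=0 transform, here is a sketch in Python of an exp inverse transform and a softplus inverse transform, each returning the constrained value and the log Jacobian term. The function names are made up, and this is not how Stan implements its transforms; it only shows that both map the real line to the positive reals with different Jacobians.

```python
import math


def exp_inverse(x):
    """Map unconstrained x to (0, inf) with exp; return (value, log Jacobian)."""
    # d/dx exp(x) = exp(x), so log|Jacobian| = x.
    return math.exp(x), x


def softplus_inverse(x):
    """Map unconstrained x to (0, inf) with softplus(x) = log(1 + exp(x))."""
    # Numerically stable softplus: max(x, 0) + log1p(exp(-|x|)).
    tail = math.log1p(math.exp(-abs(x)))
    value = max(x, 0.0) + tail
    # d/dx softplus(x) = 1 / (1 + exp(-x)), so
    # log|Jacobian| = -log(1 + exp(-x)) = -softplus(-x).
    log_jacobian = -(max(-x, 0.0) + tail)
    return value, log_jacobian
```

For large positive x, softplus is close to the identity, whereas exp grows very quickly, which is the kind of computational difference that might make a user prefer one form over the other.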

But this’d be a major breaking change in the language if <lower=0> doesn’t trigger a transform all by itself in a parameter’s declaration.

We decided that if users define potentially nonsensical transforms, one place to warn them would be with pedantic mode.

See you next meeting!

The November meeting will be Thursday 11 Nov, 2021. Let me know here or via email if you’re not on the list so far and would like a meeting invite. Thanks.

I sent out a standing Google Calendar invite with Zoom details. If you didn’t get an invite and would like to attend, please let me know and I’ll add you to the invite list.

There’s not going to be a language meeting today. Sorry about this—it’s my fault as I spaced after getting a booster shot yesterday. We’ll resume in January.

The next language meeting is tomorrow, Thursday 12 January at 10 am NY time. It’s an open meeting and if you’d like to be on the Zoom invite list, let me know.

Here we are again. The next language meeting is this morning 10 am. Let me know if you’d like to be on the Google Calendar invite or if you’d like me to just send you a link to the Zoom for the meeting.

Summary of February 2022 Language Meeting

We spent this meeting talking with @mitzimorris and @spinkney about reporting project status. Sean’s updated the stan-dev org to use GitHub projects, which I understand work like the index cards on a wall that Agile teams use to manage work.

Sean thought we could do better in getting the word out about what’s going on in the project for a bunch of reasons: letting users know what’s coming, letting developers coordinate, and informing the @SGB about what’s going on. He cited how projects like Pyro use Twitter, Medium blog posts, etc., to get the community excited.

We tried to work through what that would look like for a project our size and how we’d get the word out. Sean said starting with one feature is better than nothing, so I offered to write one feature up at the level he was asking for. Here goes.

Complex Number Support

The current status of Stan’s complex number support is that it works for complex scalars and for arrays at the level of the language and math library. We’ve added support for covariant typing, meaning that we can now assign int to real to complex, and that also works for arrays. We now need to add support for vectors and matrices, which requires work in the parser and code generator and also in the math library for polymorphic arithmetic (e.g., multiplying a real and complex matrix) and covariant typing. There is work to do on extending Eigen’s low-level, BLAS-level functions to support complex types. We could also use an actual complex-number based application for our user’s guide. After we get vectors and matrices, we want to add support for complex linear algebra (e.g., Schur decomposition and asymmetric eigendecomposition), and for fast Fourier transforms. Pretty much every single scalar and arithmetic function could be productively specialized for complex numbers. It’d be nice to think about the equivalent of var_matrix on the real side for the complex case. There is remaining work to do to generate actual complex numbers in the Python, R, and other interfaces rather than just returning real and imaginary components. There is a design document and issues and branches in progress in the language lib, the math library, and the interfaces.
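
The covariant typing rule mentioned above (int can be assigned to real, real to complex, and the same element-wise for arrays) can be sketched as a small assignability check. This is a toy model in Python with made-up names (`SCALAR_RANK`, `assignable`) and a made-up type encoding; it is not the stanc3 type checker.

```python
# Scalar types ordered by promotion: int -> real -> complex.
SCALAR_RANK = {"int": 0, "real": 1, "complex": 2}


def assignable(target, source):
    """Return True if a value of type `source` may be assigned to `target`.

    A type is either a scalar name ("int", "real", "complex") or a
    tuple ("array", element_type). This encoding is hypothetical.
    """
    if isinstance(target, tuple) and isinstance(source, tuple):
        # Arrays are covariant in their element type.
        return assignable(target[1], source[1])
    if isinstance(target, str) and isinstance(source, str):
        return SCALAR_RANK[source] <= SCALAR_RANK[target]
    # Mixing a scalar with an array is never allowed in this sketch.
    return False
```

For example, `assignable("complex", "int")` holds, while `assignable("int", "real")` does not, and an array of int can go where an array of complex is expected.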

Other language features

I also mentioned there are about 20 different features we could list this way that require work across the language, math, and interface repos. In no particular order:

  1. tuples
  2. closures
  3. lambdas
  4. 64-bit integers
  5. provide sample back end implementation in JAX
  6. ragged arrays
  7. sparse matrices
  8. regression syntax a la BRMS
  9. comprehensions a la Python (for efficient GPs and static var_matrix optimization)
  10. parallel loop constructs and more general map/reduce functions
  11. new simple types (sum-to-zero, probability, etc.)
  12. orthonormal matrix type for circular/hyperspherical/rotation stats (e.g., Stiefel and Grassmannian manifold types)
  13. vectorized truncation (this one’s relatively easy compared to the others)
  14. differential algebraic equation syntax and solver
  15. stacked/composed transforms, user-defined transforms
  16. user plug-ins at the C++ level
  17. user-defined gradients, including integration with ODE solvers
  18. integer and complex output types in language/interfaces
  19. expose transforms to users; allow Jacobian adjustment in transformed parameters
  20. more code transform and gen optimizations in compiler

Thanks for putting the list together.

I’ve been working a lot on the complex container support in the language, so I’ve made a ‘project’ to track where things are at the moment. It would be worthwhile to create issues in stan-dev/math for the special functions (FFT, etc.) and function specialization we will need for complex matrices.

I’m curious how using a project will feel for this task, so it’s more of an experiment.

Here’s the link: https://github.com/orgs/stan-dev/projects/3

I really like the way the projects page looks. I don’t know how you’d do this for much bigger projects with more complex interdependencies. I was expecting something that looked more like a PERT chart (graph of issue dependencies) and less of a progress report in terms of started/ongoing/finished.

I think it would be nicer if there was more of a way to express direct dependence (this issue is a prerequisite for this other issue, etc). But, for a big picture view I think it’s pretty nice. It also does some things semi-intelligently, such as if you close an issue or merge a PR it automatically gets moved to the ‘Done’ column etc.

Thanks @Bob_Carpenter for the nice summary and everyone in the meeting taking the time to discuss project status updates.

The goal will be to give updates during release cycles on a few features/projects. They can be very high level and concise (or more detailed if you prefer). The goal is to get people excited about Stan and about what’s being worked on, and to show whom to tag if you want more info on a project. We’ll also experiment with GitHub projects as a way to track things more consistently and as a place anyone can go to see the status, see the current issues, ask questions, and contribute (hopefully!). Thanks @WardBrian for putting an example of projects together!

I spoke with @rok_cesnovar about adding a future/preview note in the 2.29 release notes. The complex number brief you gave will go in (thanks again!). I’m looking for one or two more, maybe tuples (lots of work has been done recently), varmat (it’s partially completed, what’s left?), pathfinder, closures, robust VI, and/or cmdstan I/O? I’m just calling out a few but any project can be added. I’ll tag some relevant developers @rybern @mitzimorris @WardBrian @wds15 @nhuurre @stevebronder @bbbales2 @andrjohns @Dashadower @hyunji.moon @yizhang (these names come to mind, if I left anyone off who contributed it was unintentionally).

I personally wouldn’t say much about tuples. I put in a lot of time, yes, but it was primarily just getting Ryan’s PR back up to date with the current version of the compiler. There’s still a lot of work left there, more than for a lot of the rest of that list, I think.

I think that still counts. It would be

  • Tuples PR is up to date with Stan 2.29 and work is progressing on adding this into Stan soon (or whatever release you’re targeting)

I hope this shows how brief updates can be.

I’ve got three that are candidates:

  • Quantile functions
  • Packaging cmdstan for linux
  • User-defined gradients

Let me know if any of those would be good to add and I’ll write up the brief notes.

How could I forget qf! Yes, that’s also a great project to track in GitHub projects. I think it would have issues for all the pFq stuff, and encompass some of what’s in @bgoodri’s quantile functions design doc.

As the release is soon, I’m putting together the preview notes. I made a fun name, I don’t care if that’s changed but propose something else if you don’t like it. I would like to add a point of contact to each of these as well if there is one. Tag yourself if you’d like to be that person. @Bob_Carpenter I broke out your paragraph into bullet points so people can more easily scan what’s of interest for them, let me know if it all looks ok.

I need help in filling out the rest from @andrjohns, @rok_cesnovar, @WardBrian, @Bob_Carpenter

Stan Saplings: A Preview of Projects in Development

Below is a partial list of the many exciting projects currently being worked on in Stan. Stan is maintained and developed by volunteers. If any of these projects are of interest to you, please come join us in building the next version of Stan (GitHub, Discourse forums, Twitter)!

Complex Number Support (contact Brian Ward)

Stan’s complex number support works for complex scalars and arrays at the level of the language and math library! We’ve also added support for covariant typing, meaning that we can now assign int to real to complex, and that also works for arrays.

To do

  • Add support for vectors and matrices, which requires work in the parser and code generator and also in the math library for polymorphic arithmetic (e.g., multiplying a real and complex matrix) and covariant typing.
  • There is work to do on extending the C++ Eigen library’s low-level, BLAS-level functions to support complex types.
  • We could also use an actual complex-number based application for our user’s guide.
  • After we get vectors and matrices, we want to add support for complex linear algebra (e.g., Schur decomposition and asymmetric eigendecomposition), and for fast Fourier transforms. Pretty much every single scalar and arithmetic function could be productively specialized for complex numbers. It’d be nice to think about the equivalent of var_matrix on the real side for the complex case.
  • There is remaining work to do to generate actual complex numbers in the Python, R, and other interfaces rather than just returning real and imaginary components.
  • There is a design document and issues and branches in progress in the language lib, the math library, and the interfaces.

Tuples in Stan (contact Brian Ward)

  • Tuples PR is up to date with Stan 2.29 and work is progressing on getting them into Stan

To do

  • Lots of IO work
  • C++ handling of tuples
  • Updating docs for tuples
  • Testing

Quantile functions (contact Andrew Johnson)

To do

  • Implement more foundational functions: inverses of the gamma_p and gamma_q functions

Packaging cmdstan for Linux (contact Andrew Johnson)

  • Introducing flags to link Math headers against user-specified dependencies (rather than those provided, PR)

To do

  • Implement same approach for Stan headers
  • Add compiler flag to default these paths to system locations

User-defined gradients (contact Andrew Johnson)

Very early stages. A prototype framework for using gradient functions is being tested in this Math library PR.

To do

  • Expand testing across multiple types of function inputs and outputs
  • Decide the user interface for the Stan language

I’m happy to be listed as the point of contact for tuples and the complex containers

If you want to describe more of what’s ahead for Tuples, there is a lot of work to be done on IO and the C++ handling of tuple values, followed by a whole lot of testing and doc writing.

I’ll update the post above with what people add. Are we targeting the 2.29 release tomorrow or Wed.?

The Github releases will happen today (Nic is starting with Math), but the blog post will most likely go out tomorrow after we double check that everything is in place.
