Barriers to New Developers

This thread is for collecting various hindrances that people have encountered when trying to contribute to Stan for the first time in order to motivate better introductory materials. These can include, but are not limited to, undocumented or poorly-documented patterns, confusing style choices, and the like.

As an example, here are some comments that @Lu.Zhang made after working through the code for the first time.

I think there are three important skills for one to be a good coder for Stan:

The first is to know where to find the function I need in the Stan math library,
how it is coded in Stan, and how to use it in one's own code. (Q1 & Q2)

The second is to understand functions specifically defined in Stan and learn how
to use them. (Q3 & Q4)

The third is to have a profound knowledge about programming in C++. (Q5 & Q6)

I made a small test out of the questions I had in the process of learning Stan code.
Hope this could help you to prepare for the interview.

  1. What's the difference and relationship between the following files?
    lib/eigen_3.2.9/Eigen/src/Cholesky/LLT.h
    lib/eigen_3.2.9/Eigen/src/Cholesky/LDLT.h
    lib/eigen_3.2.9/Eigen/src/Cholesky/LLT_MKL.h
    stan/math/prim/mat/fun/cholesky_decompose.hpp
    stan/math/prim/mat/fun/LDLT_factor.hpp

  2. a) What is the most efficient way to multiply a lower-triangular Cholesky factor by a vector? (No need to calculate the determinant.)
     b) Try to list the minimum set of functions needed for error checking.

  3. When should one use "VectorViewMvt" and "VectorView"?

  4. When should one use "typedefs.hpp"? What is "partials_type.hpp" used for?
    stan_math/fwd/mat/fun/typedefs.hpp
    stan_math/fwd/scal/meta/partials_type.hpp

  5. What is the function of "Dynamic" in the following definition of a Matrix?
    using Eigen::Dynamic;
    Matrix<double,Dynamic,2> coord2(1000, 2);

  6. Try to read two 1000 by 4 matrices A and B, and a 4 by 4 matrix C, from a text file,
     then use functions in the Stan library or any other available library to calculate A + B * C^-1.
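
For anyone puzzling over question 2, here is one possible reading of it, sketched in plain C++ with no Eigen or Stan dependency. The names `SquareMatrix` and `multiply_lower_tri` are made up for illustration and are not Stan functions. The idea is simply that a lower-triangular factor lets the inner loop stop at the diagonal, and that the only checks this sketch needs are on sizes. With Eigen, `L.triangularView<Eigen::Lower>() * x` expresses the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense row-major square matrix; only the lower triangle is read.
struct SquareMatrix {
  std::size_t n;          // number of rows and columns
  std::vector<double> a;  // n * n entries, row-major
  double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// Computes y = L * x, visiting only entries with j <= i.
// Entries above the diagonal are never read, so they may hold anything.
std::vector<double> multiply_lower_tri(const SquareMatrix& L,
                                       const std::vector<double>& x) {
  // The only argument checks this sketch needs: square storage, matching sizes.
  if (L.a.size() != L.n * L.n)
    throw std::invalid_argument("L must hold n * n entries");
  if (x.size() != L.n)
    throw std::invalid_argument("size of x must match the rows of L");

  std::vector<double> y(L.n, 0.0);
  for (std::size_t i = 0; i < L.n; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j <= i; ++j)  // stop at the diagonal
      sum += L(i, j) * x[j];
    y[i] = sum;
  }
  assert(y.size() == x.size());  // internal sanity check
  return y;
}
```

This does roughly half the multiplications of a full matrix-vector product, which is the point of the question as I read it.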

Were all the questions answered?

I think what we need is something like a high-level overview,
and maybe a case study that walks someone through all the
steps to contributing. Lu's questions are diving into specific
details, which would make most sense in a case study.

  • Bob

Are those supposed to be screening questions for somebody who wants to develop with Stan? They're confusing.

I like the idea of having this thread. It would be fun at some point to teach a workshop on this to get more people into working with the core of Stan.

Some issues I've noticed:

Lack of time (on our parts) to properly address design issues with potential contributions. It's hard to sort them out over email/GitHub, and new contributors often don't understand what our priorities are in terms of efficiency and maintainability. Sebastian and I both had the issue where we wrote a bunch of code while in communication with Daniel/Bob (not pointing fingers, you guys were just the ones willing to correspond over design issues), but in the end a lot of the code had to be rewritten anyway due to misunderstandings. It would be nice to flesh out some of our priorities in a doc for new developers. The things I can think of are:

  1. most of the time is spent in auto-diff, so maintainability/UI should often win over efficiency in non-auto-diff code.
  2. maintainability is critical, so don't try to future-proof by adding indirection
  3. ... I forget. I think I've gotten used to the project, so I've lost perspective on these...

There are also some basic tasks that could use examples, for example:

  1. example of adding a function to stan-dev/math, I think we have this in some form
  2. example of exposing a function to stan-dev/stan
  3. example of adding a function to stan-dev/math, with analytic derivatives (In the next few weeks I want to add boost's inverse gamma so I'll try to track the process and produce a simple example).
  4. pointers to tests that demonstrate how to call the stan-dev/stan API for algorithms; these don't necessarily need to be separate examples since the tests Dan and I wrote on the refactor branch are pretty clear. It would help to have some of the objects available.
  5. maybe a generated C++ file for a Stan model with commentary (possibly just in comments) to clear up the 'model concept' we're using. This stuff didn't really make sense to me until I tried to modify the code generator.

All these pieces make sense to me now but it used to look like a black art. I don't think much of it really requires a deep understanding of C++, although writing a function with analytic gradients requires more than the rest.


Krzysztof:

Feel free to write those docs!

I think we already have everything you're asking
for in bits and pieces, but not in the form of a coherent
document.

I thought all the code we currently have in Stan would
work as an example. Or do you mean something like a step-by-step
example? It's always hard to know when to stop. We might need
more than one of these docs for people at different levels of
understanding of Git, C++, matrix arithmetic, etc.

  • Bob

Also anyone can feel free to post something on the blog with this material. I think it will be of general interest, and not just for Stan people.
A

Sure. This is meant to be an inclusive thread aggregating confusions people have had when they started and they may very well lead to numerous docs to guide new developers into the project.

No, not screening questions. Just confusions/speed bumps new developers have had when they started looking at the code. The idea is to identify what should be communicated to new developers to make their transition as smooth as possible.

What I meant was a case study of how to write a particular
function. Like a video of all the steps involved.
It's always nice to see a concrete example in addition to
high-level pointers to Git doc and books on C++ template
programming.

  • Bob

That would be cool. I'd sign up for (attending) that.

I'm feeling the freedom! Just the time hasn't come up yet.

I'm thinking on the level of what it would take to get a committed undergrad with some coding experience and the required (minimum) math background to walk through tasks 1-4. For C++ concepts we can send people to basic books, for math we can ask that they know what a partial derivative is, but I'm not expecting people to grok why the calculation of gradients is set up the way it is by reading the code. Instead I'd like to point them to the right place to plug in the relevant partial derivative. You're right we have a lot of this doc (and the Stan autodiff paper explains it really well) but I'd like to take the relevant pieces and put them into a worked example.

We basically did this on the Wiki for #1 and #2. I'd like to polish those and find a way to integrate them into workshop-style material (maybe a GitHub repo with Markdown/C++ examples, etc.)
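
To make "the right place to plug in the relevant partial derivative" a bit more concrete, here is a toy sketch. It is not Stan's autodiff API: `Dual` is a hypothetical forward-mode scalar invented for this example, and the function and the finite-difference check are only meant to show the shape of the task. The point is that a contributor computes the value once, supplies the analytic partial, and lets the chain rule do the rest.

```cpp
#include <cassert>
#include <cmath>

// Toy forward-mode autodiff scalar: a value and its derivative with
// respect to one input. Hypothetical type, not part of Stan.
struct Dual {
  double val;  // function value
  double d;    // derivative (tangent)
};

// Plain double version of the inverse logit, 1 / (1 + exp(-x)).
inline double inv_logit(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Autodiff overload: compute the value once, then plug in the analytic
// partial derivative f'(x) = f(x) * (1 - f(x)) and apply the chain rule.
inline Dual inv_logit(const Dual& x) {
  const double f = inv_logit(x.val);
  const double partial = f * (1.0 - f);
  return Dual{f, partial * x.d};
}

// The kind of check a unit test might make: compare the analytic
// derivative with a central finite difference at a given point.
inline void check_against_finite_diff(double x) {
  const double h = 1e-6;
  const double fd = (inv_logit(x + h) - inv_logit(x - h)) / (2.0 * h);
  const Dual r = inv_logit(Dual{x, 1.0});
  assert(std::fabs(r.d - fd) < 1e-8);
}
```

Seeding the input with `d = 1.0` asks for the derivative with respect to that input, so `inv_logit(Dual{0.0, 1.0})` carries both the value and the slope at zero.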

That sounds great. I think you'll find the undergrads
will vary in background among all the relevant factors.

We'd like to enable people who haven't used GitHub or
C++. It's a daunting learning curve and an example would be super
great.

What I think makes sense is

  • clone the repo and get the build working

  • create an issue, create a branch, add a test that fails,
    fix the bug, create a pull request

  • add a new templated function to Stan math

  • push a function through Stan (submodules) with doc (LaTeX)

  • add a new function with analytic partial derivatives

  • add a new distribution with all the template metaprogramming
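
As a rough illustration of the "add a test that fails" and "add a new templated function" steps, here is a small sketch. `log_odds` is a hypothetical function chosen for the example, and plain `assert` stands in for whatever test framework the repo actually uses. Templating on the scalar type is what lets the same code be instantiated with `double`, `float`, or an autodiff scalar that provides `log` and the comparison operators.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical example function, templated on the scalar type T.
// Returns log(p / (1 - p)) for p strictly between 0 and 1.
template <typename T>
T log_odds(const T& p) {
  using std::log;  // use std::log for built-in types, ADL for others
  if (!(p > 0 && p < 1))
    throw std::domain_error("log_odds: p must be in (0, 1)");
  return log(p) - log(1 - p);
}

// The test one would write first. Before log_odds exists this does not
// compile (the "failing test"); once the function is added it passes.
inline void test_log_odds() {
  assert(log_odds(0.5) == 0.0);
  bool threw = false;
  try {
    log_odds(2.0);  // outside (0, 1), must throw
  } catch (const std::domain_error&) {
    threw = true;
  }
  assert(threw);
}
```

Writing the domain check as `!(p > 0 && p < 1)` also rejects NaN, since every comparison with NaN is false.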

There are other contributions to make at the language, interface,
and algorithm level, but I think it makes sense to tackle the
above at the math and Stan lib level first.

I think we could do the same for dealing with the R and Python
interfaces. Or working up at a higher level, like adding a new
language feature.

  • Bob

Hey,

I'm trying to get started myself and have had a few issues so far. My background: "industrial programmer," no prior professional C++ experience, but I have worked in 10 other languages.

Stuff I've struggled with:

  1. Where should I be asking questions for this kind of stuff? Some orgs have an IRC room or something similar that serves as triage. Should be something you don't need to get permission to post on. And some of the questions can be a bit embarrassing for a long-lived forum post like this.
  2. Getting my dev environment set up. Spent the morning so far trying to configure emacs to understand where the header files for stan-dev/math live. Generally - how should I check out the various repos? Can I just clone one that clones the others as submodules? Where do compiler or IDE settings live for each? Maybe someone casually posting their emacs config would be useful as a starter.
  3. Build process - how do I know I've entered a known-good build state, so that I can then make changes on top of that?

Currently spending a lot of time on #2, and given #1 I'm not sure if this is the right place to also post asking for help with that :P

Thanks,
Sean

This or the users list is the right place; happy to help out as much as possible... In the context of the large pile of questions that we've all asked, I don't think you could come up with anything all that embarrassing.

git clone the develop branch

Clone the one you are working on. If you're really good with git submodules you might get away with cloning just CmdStan and then correctly handling submodules if you're working with, e.g., stan-dev/math, but that's making it more complicated than it needs to be.

There's a make subdirectory for compiler-related stuff; any IDE-related stuff ends up in a separate repo (?)

Sorry, I use vim... :)

Develop is always in a good build state.

Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

re: building - I meant literally "what are the steps to build each of these repos?" I'm trying to add tests to the math repo and there are no make targets that seem to make sense for building, so I tried running runTests.py which I suspect will build as a side effect, but that command has been running for a long time and hasn't finished yet so I'm still not sure if it was the correct entry point.

w.r.t. compiler settings - I think I might understand now how the vendored libraries are included, but some brief description of why the project is laid out the way it is with respect to directory structure, which pieces are dependencies of which other pieces, how dependencies are included, that kind of thing would be very welcome. Even if a veteran C++ developer might understand some of these things by reading the source, we probably want to lower that barrier to entry.

sakrejda Developer
December 16
seantalts:
• Where should I be asking questions for this kind of stuff? Some orgs have an IRC room or something similar that serves as triage. Should be something you don't need to get permission to post on. And some of the questions can be a bit embarrassing for a long-lived forum post like this.

No IRC because nobody wants to deal with a constant flow of
interruptions. (Well maybe somebody does, but I don't.)

This or the users list is the right place; happy to help out as much as possible... In the context of the large pile of questions that we've all asked, I don't think you could come up with anything all that embarrassing.

seantalts:
Generally - how should I check out the various repos?

git clone the develop branch

The developer process wiki explains our whole process. It's
on stan-dev/stan. This is the stuff we're talking about
consolidating.

seantalts:
Can I just clone one that clones the others as submodules?

Clone the one you are working on. If you're really good with git submodules you might get away with cloning just CmdStan and then correctly handling submodules if you're working with, e.g., stan-dev/math, but that's making it more complicated than it needs to be.

I start with cmdstan, then from cmdstan you can run:

make stan-update

which will pull out the stan submodule. Then cd into
stan and run

make math-update

which will pull in the math lib.

Then don't change those and don't commit new submodule links
directly; we handle those updates through automatically generated
pull requests from the continuous integration system when an
upstream module changes.

seantalts:
Where do compiler or IDE settings live for each?

There's a make subdirectory for compiler-related stuff; any IDE-related stuff ends up in a separate repo (?)

We just have makefiles. There's no IDE stuff?
There's also no compiler-related stuff built into Stan. We
just use the system default, or you can create a make.local to
override it. We tend to use clang++ on the Mac from Xcode
with O=0 for optimization level 0 during builds and testing.

seantalts:
Maybe someone casually posting their emacs config would be useful as a starter.

Sorry, I use vim... :)

Mine's out of date because I hate installing software.
Everything other than how to configure stan-mode is in
our Developer Tricks wiki.

seantalts:
• Build process - how do I know I've entered a known-good build state, so that I can then make changes on top of that?

Develop is always in a good build state.

We try. You can run the unit tests to make sure.

  • Bob

I don't really have one... every once in a while Daniel tells me I need to remove whitespace at the end of lines of C++ and I have to break out sed, but... I think you just need to wait for one of the people who's more into IDEs to answer.

The math library is header-only so you never build it as a shared library.

That does build and run tests, and you can pass it specific test files to run, but it doesnā€™t build a library.

Good point, one of us should really do that...

seantalts Developer
December 16
Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

re: building - I meant literally "what are the steps to build each of these repos?" I'm trying to add tests to the math repo and there are no make targets that seem to make sense for building, so I tried running runTests.py which I suspect will build as a side effect, but that command has been running for a long time and hasn't finished yet so I'm still not sure if it was the correct entry point.

The math library is primarily header-only. The only thing
that gets built in the sense of generating object code is the
ODE solver library.

You can do something like this:

make -j4 O=0 CC=clang++ test/unit

which will kick off all the unit tests using 4 cores using optimization
level 0 and the clang compiler.
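
For anyone coming from languages with a separate build step, here is a minimal sketch of what "header-only" means in practice. The header name, namespace, and functions are hypothetical and only illustrate the pattern: definitions live in the header, so there is nothing to compile until a test or program includes it. That is why running the tests is the build.

```cpp
// Contents of a hypothetical header, e.g. toy_math/square.hpp.
#ifndef TOY_MATH_SQUARE_HPP
#define TOY_MATH_SQUARE_HPP

#include <cassert>

namespace toy_math {

// Templates are instantiated in whichever translation unit uses them,
// so the full definition lives in the header.
template <typename T>
inline T square(const T& x) {
  return x * x;
}

// Non-template functions defined in a header need `inline` so that
// including the header from several .cpp files does not break the
// one-definition rule.
inline double mean_of_two(double a, double b) { return 0.5 * (a + b); }

}  // namespace toy_math

#endif

// A test file only needs to include the header and call the functions;
// compiling the test is the only build step involved.
inline void smoke_test() {
  assert(toy_math::square(3) == 9);
  assert(toy_math::mean_of_two(1.0, 3.0) == 2.0);
}
```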

w.r.t. compiler settings - I think I might understand now how the vendored libraries are included, but some brief description of why the project is laid out the way it is with respect to directory structure, which pieces are dependencies of which other pieces, how dependencies are included, that kind of thing would be very welcome.

All there on the wiki. Just not well organized. You
want to start from the top level Wiki page, which is a directory:

You'll see the layout of the repos there. If you click through
the process, you'll get a more detailed layout of directories.

The math lib has its own wiki:

Somewhere, there's a list of dependency order among scalar,
array and matrix, and primitive, reverse-mode, forward-mode
and mixed autodiff.

Even if a veteran C++ developer might understand some of these things by reading the source, we probably want to lower that barrier to entry.

Absolutely. Very high priority for the project, I think.

  • Bob

;; Make stan-mode available (adjust the path to your own checkout).
(add-to-list 'load-path "~/stan-contrib/stan-mode/")

;; Convert tabs to spaces, starting at the first tab in the buffer.
;; Returns nil so that the normal save still runs afterwards.
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)

;; Run the tab killer on save in each of these modes.
(add-hook 'java-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'html-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'cpp-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'stan-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(setq indent-tabs-mode nil)
(put 'downcase-region 'disabled nil)

(add-to-list 'load-path "/Users/carp/stan/src/StanMode/")
(require 'stan-mode)

(setq x-select-enable-clipboard t)

;; Delete trailing whitespace whenever a C++ file is saved.
(add-hook 'c++-mode-hook
          (lambda () (add-to-list 'write-file-functions 'delete-trailing-whitespace)))

The last line gets rid of the trailing whitespace. There's
also code to use the StanMode, but mine's out of date, I'm sure.
There's also the tab killing code in here.

  • Bob

sakrejda Developer
December 16
seantalts:
Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

I don't really have one... every once in a while Daniel tells me I need to remove whitespace at the end of lines of C++ and I have to break out sed, but... I think you just need to wait for one of the people who's more into IDEs to answer.

Run:

make cpplint

which runs Google's code style checker. We're down to zero errors.
There's also a reference to that and some exceptions in the Stan repo
wiki.

  • Bob