Barriers to New Developers


#1

This thread is for collecting various hindrances that people have encountered when trying to contribute to Stan for the first time in order to motivate better introductory materials. These can include, but are not limited to, undocumented or poorly-documented patterns, confusing style choices, and the like.

As an example, here are some comments that @Lu.Zhang made after working through the code for the first time.

I think there are three important skills for one to be a good coder for Stan:

The first is to know where to find the function I need in the Stan math library,
how it is coded in Stan, and how to use it in one’s own code. (Q1 & Q2)

The second is to understand functions specifically defined in Stan and learn how
to use them. (Q3 & Q4)

The third is to have a profound knowledge about programming in C++. (Q5 & Q6)

I made a small test out of the questions I had in the process of learning Stan code.
Hope this could help you to prepare for the interview.

  1. What are the differences and relationships among the following files?
    lib/eigen_3.2.9/Eigen/src/Cholesky/LLT.h
    lib/eigen_3.2.9/Eigen/src/Cholesky/LDLT.h
    lib/eigen_3.2.9/Eigen/src/Cholesky/LLT_MKL.h
    stan/math/prim/mat/fun/cholesky_decompose.hpp
    stan/math/prim/mat/fun/LDLT_factor.hpp

  2. a) What is the most efficient way to multiply a lower-triangular Cholesky factor by a vector? (No need to calculate the determinant.)
     b) Try to list the minimum set of functions needed for error checking.

  3. When to use “VectorViewMvt” and “VectorView”?

  4. When to use “typedefs.hpp”? What is the usage of “partials_type.hpp”?
    stan_math/fwd/mat/fun/typedefs.hpp
    stan_math/fwd/scal/meta/partials_type.hpp

  5. What is the function of “Dynamic” in the following definition of a Matrix?
    using Eigen::Dynamic;
    Matrix<double,Dynamic,2> coord2(1000, 2);

  6. Try to input two 1000 by 4 matrices A and B, and a 4 by 4 matrix C from a text file, then use functions in the Stan library or any other available library to calculate A + B * C^-1.
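
One possible way to think about question 2a, as a minimal sketch rather than what Stan actually does: because the factor is lower triangular, the product only needs the entries on or below the diagonal, which roughly halves the work of a dense multiply. The `Matrix` alias and `lower_tri_multiply` are hypothetical names used only for illustration; they are not part of Stan or Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense square matrix stored as rows (hypothetical helper type for this sketch).
using Matrix = std::vector<std::vector<double>>;

// Computes y = L * v for a lower-triangular L.
// Only entries with column index j <= row index i are read, so whatever is
// stored in the upper triangle is ignored.
std::vector<double> lower_tri_multiply(const Matrix& L,
                                       const std::vector<double>& v) {
  const std::size_t n = L.size();
  assert(v.size() == n);  // sizes must match
  std::vector<double> y(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    assert(L[i].size() == n);  // L must be square
    for (std::size_t j = 0; j <= i; ++j) {
      y[i] += L[i][j] * v[j];
    }
  }
  return y;
}
```

With Eigen, an expression such as `L.triangularView<Eigen::Lower>() * v` should express the same idea without a hand-written loop.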


#2

Were all the questions answered?

I think what we need is something like a high-level overview,
and maybe a case study that walks someone through all the
steps to contributing. Lu’s questions are diving into specific
details, which would make most sense in a case study.

  • Bob

#3

Are those supposed to be screening questions for somebody who wants to develop with Stan? They’re confusing.


#4

I like the idea of having this thread. It would be fun at some point to teach a workshop on this to get more people into working with the core of Stan.

Some issues I’ve noticed:

Lack of time (on our parts) to properly address design issues with potential contributions. It’s hard to sort them out over email/GitHub, and new contributors often don’t understand what our priorities are in terms of efficiency and maintainability. Sebastian and I both had the issue where we wrote a bunch of code while in communication with Daniel/Bob (not pointing fingers, you guys were just the ones willing to correspond over design issues), but in the end a lot of code had to be rewritten anyway due to misunderstandings. It would be nice to flesh out some of our priorities in a doc for new developers. The things I can think of are:

  1. most of the time is spent in auto-diff so maintainability/UI should often win over efficiency in non auto-diff code.
  2. maintainability is critical so don’t try to future-proof by adding indirection
  3. … I forget. I think I’ve gotten used to the project, so I’ve lost perspective on these…

There are also some basic tasks that could use examples, for example:

  1. example of adding a function to stan-dev/math, I think we have this in some form
  2. example of exposing a function to stan-dev/stan
  3. example of adding a function to stan-dev/math, with analytic derivatives (In the next few weeks I want to add boost’s inverse gamma so I’ll try to track the process and produce a simple example).
  4. pointers to tests that demonstrate how to call the stan-dev/stan API for algorithms, these don’t necessarily need to be separate examples since the tests Dan and I wrote on the refactor branch are pretty clear. It would help to have some of the objects available.
  5. maybe a generated C++ file for a Stan model with commentary (possibly just in comments) to clear up the ‘model concept’ we’re using. This stuff didn’t really make sense to me until I tried to modify the code generator.

All these pieces make sense to me now but it used to look like a black art. I don’t think much of it really requires a deep understanding of C++, although writing a function with analytic gradients requires more than the rest.
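
As a rough illustration of tasks 1 and 3 above, the sketch below contrasts a function that is merely templated on its scalar type with one that supplies an analytic derivative by hand. Everything here is hypothetical: `Dual` is a toy forward-mode scalar, not a Stan type, and `cube_plus` is an invented function. The point is only that the templated version gets its derivative from operator overloading, while the hand-written version computes the same derivative in one step.

```cpp
#include <cassert>
#include <cmath>

// Toy forward-mode autodiff scalar (hypothetical; not part of Stan).
struct Dual {
  double val;  // function value
  double der;  // derivative with respect to the input
};

// Product rule.
Dual operator*(const Dual& a, const Dual& b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Sum rule.
Dual operator+(const Dual& a, const Dual& b) {
  return {a.val + b.val, a.der + b.der};
}

// Templated on the scalar type: works for double and, through the overloaded
// operators, for Dual. f(x) = x^3 + x.
template <typename T>
T cube_plus(const T& x) {
  return x * x * x + x;
}

// Same function with the derivative written out analytically:
// f'(x) = 3 x^2 + 1, scaled by the incoming derivative (chain rule).
Dual cube_plus_analytic(const Dual& x) {
  const double value = x.val * x.val * x.val + x.val;
  const double slope = 3.0 * x.val * x.val + 1.0;
  return {value, slope * x.der};
}
```

Seeding the input with `Dual{2.0, 1.0}` gives the same value and derivative from both versions; the analytic one simply avoids the intermediate operations.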


#5

Krzysztof:

Feel free to write those docs!

I think we already have everything you’re asking
for in bits and pieces, but not in the form of a coherent
document.

I thought all the code we currently have in Stan would
work as an example. Or do you mean something like a step-by-step
example? It’s always hard to know when to stop. We might need
more than one of these docs for people at different levels of
understanding of Git, C++, matrix arithmetic, etc.

  • Bob

#6

Also anyone can feel free to post something on the blog with this material. I think it will be of general interest, and not just for Stan people.
A


#7

Sure. This is meant to be an inclusive thread aggregating confusions people have had when they started and they may very well lead to numerous docs to guide new developers into the project.


#8

No, not screening questions. Just confusions/speed bumps new developers have had when they started looking at the code. The idea is to identify what should be communicated to new developers to make their transition as smooth as possible.


#9

What I meant was a case study of how to write a particular
function. Like a video of all the steps involved.
It’s always nice to see a concrete example in addition to
high-level pointers to Git doc and books on C++ template
programming.

  • Bob

#10

That would be cool. I’d sign up for (attending) that.


#11

I’m feeling the freedom! Just the time hasn’t come up yet.

I’m thinking on the level of what it would take to get a committed undergrad with some coding experience and the required (minimum) math background to walk through tasks 1-4. For C++ concepts we can send people to basic books, for math we can ask that they know what a partial derivative is, but I’m not expecting people to grok why the calculation of gradients is set up the way it is by reading the code. Instead I’d like to point them to the right place to plug in the relevant partial derivative. You’re right we have a lot of this doc (and the Stan autodiff paper explains it really well) but I’d like to take the relevant pieces and put them into a worked example.

We basically did this on the Wiki for #1 and #2, I’d like to polish those and find a way to integrate them into workshop-style material (maybe a github repo with Markdown/C++ examples, etc…)
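
To make "the right place to plug in the relevant partial derivative" a little more concrete, here is a toy reverse-mode sketch. It assumes a simple tape of nodes that store their partial derivatives when they are created; `Node`, `Tape`, `record`, and `x_exp_y` are all hypothetical names, and Stan's real classes are organized differently. The only part a new function has to supply is the call to `record` with its value and analytic partials.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One entry on the tape (hypothetical layout for this sketch).
struct Node {
  double val;                        // value of this expression
  double adj;                        // adjoint, filled in by grad()
  std::vector<std::size_t> parents;  // indices of the operand nodes
  std::vector<double> partials;      // d(this)/d(parent), one per parent
};

struct Tape {
  std::vector<Node> nodes;

  // Creates an independent variable and returns its index.
  std::size_t variable(double v) {
    nodes.push_back({v, 0.0, {}, {}});
    return nodes.size() - 1;
  }

  // The plug-in point: a function records its value and analytic partials.
  std::size_t record(double v, std::vector<std::size_t> parents,
                     std::vector<double> partials) {
    nodes.push_back({v, 0.0, std::move(parents), std::move(partials)});
    return nodes.size() - 1;
  }

  // Reverse sweep: propagates adjoints from the output back to the inputs.
  void grad(std::size_t out) {
    for (Node& n : nodes) n.adj = 0.0;
    nodes[out].adj = 1.0;
    for (std::size_t i = out + 1; i-- > 0;) {
      const Node& n = nodes[i];
      for (std::size_t k = 0; k < n.parents.size(); ++k) {
        nodes[n.parents[k]].adj += n.adj * n.partials[k];
      }
    }
  }
};

// f(x, y) = x * exp(y), with df/dx = exp(y) and df/dy = x * exp(y).
std::size_t x_exp_y(Tape& t, std::size_t x, std::size_t y) {
  const double xv = t.nodes[x].val;
  const double ey = std::exp(t.nodes[y].val);
  return t.record(xv * ey, {x, y}, {ey, xv * ey});
}
```

A contributor who knows what a partial derivative is can write `x_exp_y` without understanding how `grad` works, which is the level of understanding the worked example is aiming for.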


#12


That sounds great. I think you’ll find the undergrads
will vary in background among all the relevant factors.

We’d like to be able to enable people who haven’t used GitHub or
C++. It’s a daunting learning curve and an example would be super
great.

What I think makes sense is

  • clone the repo and get the build working

  • create an issue, create a branch, add a test that fails,
    fix the bug, create a pull request

  • add a new templated function to Stan math

  • push a function through Stan (submodules) with doc (LaTeX)

  • add a new function with analytic partial derivatives

  • add a new distribution with all the template metaprogramming

There are other contributions to make up at the language, interface
and algorithm level, but I think it makes sense to tackle the
above at the math and Stan lib level first.

I think we could do the same for dealing with the R and Python
interfaces. Or working up at a higher level, like adding a new
language feature.
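
The "add a test that fails, fix the bug" step in the list above could look something like the following sketch. It uses plain functions instead of a unit-testing framework to stay self-contained, and every name in it is hypothetical. The bug chosen for illustration is a naive log-sum-exp that overflows for large inputs; this is not a claim about how Stan implements the function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Buggy first version: exp() overflows to infinity for large inputs.
double log_sum_exp_naive(double a, double b) {
  return std::log(std::exp(a) + std::exp(b));
}

// Fixed version: factor out the larger argument before exponentiating.
// (Sketch only: infinite inputs are not handled.)
double log_sum_exp_stable(double a, double b) {
  const double m = std::max(a, b);
  return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// The test, written before the fix. It takes the implementation as an
// argument so it can be run against both versions.
bool test_large_inputs(double (*f)(double, double)) {
  const double expected = 1000.0 + std::log(2.0);
  return std::abs(f(1000.0, 1000.0) - expected) < 1e-12;
}
```

The test fails against the naive version and passes against the fixed one, which is the evidence a reviewer wants to see in the pull request.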

  • Bob

#13

Hey,

I’m trying to get started myself and have had a few issues so far. My background - “industrial programmer,” no prior professional C++ experience but have worked in 10 other languages.

Stuff I’ve struggled with:

  1. Where should I be asking questions for this kind of stuff? Some orgs have an IRC room or something similar that serves as triage. Should be something you don’t need to get permission to post on. And some of the questions can be a bit embarrassing for a long lived forum post like this.
  2. Getting my dev environment set up. Spent the morning so far trying to configure emacs to understand where the header files for stan-dev/math live. Generally - how should I check out the various repos? Can I just clone one that clones the others as submodules? Where do compiler or IDE settings live for each? Maybe someone casually posting their emacs config would be useful as a starter.
  3. Build process - how do I know I’ve entered a known-good build state, so that I can then make changes on top of that?

Currently spending a lot of time on #2, and given #1 I’m not sure if this is the right place to also post asking for help with that :P

Thanks,
Sean


#14

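seantalts:
Where should I be asking questions for this kind of stuff?

This or the users list is the right place; happy to help out as much as possible… In the context of the large pile of questions that we’ve all asked, I don’t think you could come up with anything all that embarrassing.

seantalts:
Generally - how should I check out the various repos?

git clone the develop branch

seantalts:
Can I just clone one that clones the others as submodules?

Clone the one you are working on. If you’re really good with git submodules you might get away with cloning just CmdStan and then correctly handling submodules if you’re working with, e.g., stan-dev/math, but that’s making it more complicated than it needs to be.

seantalts:
Where do compiler or IDE settings live for each?

There’s a make subdirectory for compiler-related stuff, any IDE-related stuff ends up in a separate repo (?)

seantalts:
Maybe someone casually posting their emacs config would be useful as a starter.

Sorry, I use vim… :)

seantalts:
Build process - how do I know I’ve entered a known-good build state, so that I can then make changes on top of that?

Develop is always in a good build state.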


#15

Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

re: building - I meant literally “what are the steps to build each of these repos?” I’m trying to add tests to the math repo and there are no make targets that seem to make sense for building, so I tried running runTests.py which I suspect will build as a side effect, but that command has been running for a long time and hasn’t finished yet so I’m still not sure if it was the correct entry point.

w.r.t. compiler settings - I think I might understand now how the vendored libraries are included, but some brief description of why the project is laid out the way it is with respect to directory structure, which pieces are dependencies of which other pieces, how dependencies are included, that kind of thing would be very welcome. Even if a veteran C++ developer might understand some of these things by reading the source, we probably want to lower that barrier to entry.


#16

sakrejda Developer
December 16
seantalts:
• Where should I be asking questions for this kind of stuff? Some orgs have an IRC room or something similar that serves as triage. Should be something you don’t need to get permission to post on. And some of the questions can be a bit embarrassing for a long lived forum post like this.

No IRC because nobody wants to deal with a constant flow of
interruptions. (Well maybe somebody does, but I don’t.)

sakrejda:
This or the users list is the right place; happy to help out as much as possible… In the context of the large pile of questions that we’ve all asked, I don’t think you could come up with anything all that embarrassing.

seantalts:
Generally - how should I check out the various repos?

sakrejda:
git clone the develop branch

The developer process wiki explains our whole process. It’s
on stan-dev/stan. This is the stuff we’re talking about
consolidating.

seantalts:
Can I just clone one that clones the others as submodules?

sakrejda:
Clone the one you are working on. If you’re really good with git submodules you might get away with cloning just CmdStan and then correctly handling submodules if you’re working with, e.g., stan-dev/math, but that’s making it more complicated than it needs to be.

I start with cmdstan, then from cmdstan you can run:

make stan-update

which will pull out the stan submodule. Then cd into
stan and run

make math-update

which will pull in the math lib.

Then don’t change those and don’t commit new submodule links
directly—we handle those updates through automatically generated
pull requests from the continuous integration system when an
upstream module changes.

seantalts:
Where do compiler or IDE settings live for each?

sakrejda:
There’s a make subdirectory for compiler-related stuff, any IDE-related stuff ends up in a separate repo (?)

We just have makefiles. There’s no IDE stuff?
There’s also no compiler-related stuff built into Stan. We
just use the system call or you can create a make.local to
override. We tend to use clang++ on the Mac from Xcode
with O=0 for optimization level 0 during builds and testing.

seantalts:
Maybe someone casually posting their emacs config would be useful as a starter.

sakrejda:
Sorry, I use vim… :)

Mine’s out of date because I hate installing software.
Everything other than how to configure stan-mode is in
our Developer Tricks wiki.

seantalts:
• Build process - how do I know I’ve entered a known-good build state, so that I can then make changes on top of that?

sakrejda:
Develop is always in a good build state.

We try. You can run the unit tests to make sure.

  • Bob

#17

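seantalts:
Vim config also useful!

I don’t really have one… Every once in a while Daniel tells me I need to remove whitespace at the end of lines of C++ and I have to break out sed, but… I think you just need to wait for one of the people who’s more into IDEs to answer.

seantalts:
I meant literally “what are the steps to build each of these repos?”

The math library is header-only so you never build it as a shared library.

seantalts:
so I tried running runTests.py which I suspect will build as a side effect

runTests.py does build and run tests, and you can pass it specific test files to run, but it doesn’t build a library.

seantalts:
some brief description of why the project is laid out the way it is with respect to directory structure, which pieces are dependencies of which other pieces, how dependencies are included, that kind of thing would be very welcome.

Good point, one of us should really do that…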


#18

seantalts Developer
December 16
Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

seantalts:
re: building - I meant literally “what are the steps to build each of these repos?” I’m trying to add tests to the math repo and there are no make targets that seem to make sense for building, so I tried running runTests.py which I suspect will build as a side effect, but that command has been running for a long time and hasn’t finished yet so I’m still not sure if it was the correct entry point.

The math library is primarily header-only. The only thing
that gets built in the sense of generating object code is the
ODE solver library.
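
As a rough illustration of what header-only means in practice, here is a sketch of a small header. The file name, namespace, and functions are hypothetical and are not taken from the math library. Templates and `inline` functions are defined entirely in the header, so client code only needs to `#include` it, and there is nothing to compile or link separately.

```cpp
// toy_math.hpp (hypothetical header, for illustration only)
#ifndef TOY_MATH_HPP
#define TOY_MATH_HPP

#include <cassert>
#include <cmath>

namespace toy_math {

// Templates are instantiated in whichever source file uses them, so the
// full definition has to be visible in the header.
template <typename T>
T square(const T& x) {
  return x * x;
}

// A non-template function defined in a header must be marked inline;
// otherwise including the header from two source files causes a
// multiple-definition error at link time.
inline double logit(double p) {
  assert(p > 0.0 && p < 1.0);  // argument must be a probability in (0, 1)
  return std::log(p / (1.0 - p));
}

}  // namespace toy_math

#endif  // TOY_MATH_HPP
```

This is also why running the tests is the closest thing to "building" such a library: the code is only compiled when a test or a model includes it.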

You can do something like this:

make -j4 O=0 CC=clang++ test/unit

which will kick off all the unit tests using 4 cores using optimization
level 0 and the clang compiler.

seantalts:
w.r.t. compiler settings - I think I might understand now how the vendored libraries are included, but some brief description of why the project is laid out the way it is with respect to directory structure, which pieces are dependencies of which other pieces, how dependencies are included, that kind of thing would be very welcome.

All there on the wiki. Just not well organized. You
want to start from the top level Wiki page, which is a directory:

You’ll see the layout of the repos there. If you click through
the process, you’ll get more detailed layout of directories.

The math lib has its own wiki:

Somewhere, there’s a list of dependency order among scalar,
array and matrix, and primitive, reverse-mode, forward-mode
and mixed autodiff.

seantalts:
Even if a veteran C++ developer might understand some of these things by reading the source, we probably want to lower that barrier to entry.

Absolutely. Very high priority for the project, I think.

  • Bob

#19

;; Make the (older) Stan mode checkout visible to Emacs.
(add-to-list 'load-path "~/stan-contrib/stan-mode/")

;; Convert tabs to spaces, from the first tab to the end of the buffer.
;; Returns nil so that the normal save still runs afterwards.
(defun java-mode-untabify ()
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\t" nil t)
        (untabify (1- (point)) (point-max))))
  nil)

;; Untabify on save in each of the following modes.
(add-hook 'java-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'html-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'cpp-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

(add-hook 'stan-mode-hook
          '(lambda ()
             (make-local-variable 'write-contents-hooks)
             (add-hook 'write-contents-hooks 'java-mode-untabify)))

;; Indent with spaces, never tabs.
(setq indent-tabs-mode nil)
(put 'downcase-region 'disabled nil)

(add-to-list 'load-path "/Users/carp/stan/src/StanMode/")
(require 'stan-mode)

(setq x-select-enable-clipboard t)

;; Strip trailing whitespace whenever a C++ file is saved.
(add-hook 'c++-mode-hook
          (lambda () (add-to-list 'write-file-functions 'delete-trailing-whitespace)))

The last line gets rid of the trailing whitespace. There’s
also code to use the StanMode, but mine’s out of date, I’m sure.
There’s also the tab killing code in here.

  • Bob

#20

sakrejda Developer
December 16
seantalts:
Vim config also useful! I will use whichever one starts working more quickly - in emacs I use evil-mode anyway.

sakrejda:
I don’t really have one… Every once in a while Daniel tells me I need to remove whitespace at the end of lines of C++ and I have to break out sed, but… I think you just need to wait for one of the people who’s more into IDEs to answer.

Run:

make cpplint

which runs Google’s code style checker. We’re down to zero errors.
There’s also reference to that and some exceptions in the Stan repo
wiki.

  • Bob