Google Summer of Code 2022 - Call for Proposals

It’s that time of year again.

@SGB would like to call attention to an opportunity to participate in the 2022 Google Summer of Code program.

At a high level, this would connect student coders with mentors in Stan’s developer community. The mentors would supervise students as they work on development projects.

There have been a couple of major changes to the program for 2022. The official blog post from Google Open Source covers these changes.

  1. Now it’s possible for organizations to propose “big” projects (350-hour time commitment) and “medium” projects (175-hour time commitment) for GSoC. The “medium” projects were introduced in GSoC 2021, and after a lot of feedback, GSoC brought back the pre-2021 “big” projects. So now, when organizations put together their ideas list, they also need to state whether each idea is a “big” or “medium” project.

  2. Pre-2021, there were strict deadlines: the start and end dates were synced up for every student. Now, these dates can fall anywhere within a 22-week period instead of a 12-week period.

  3. Anyone above the age of 18 can participate in GSoC; there is no longer any need to be an enrolled student.

Applications for GSoC mentoring organizations open on Feb 7th, and the deadline to apply is Feb 21st (full timeline).

If you have a project in mind, and would like to participate, please do the following:

  • Find another developer or two who are willing to serve as mentors.
  • Please submit a brief proposal with the following information:
    • Your names, project title, brief general project description (one or two lines).
    • Detailed project description.
    • Aims/expected outcomes of the project.
    • Technical skills required/preferred.
    • What the student can expect their contribution to be, and what skills they can hope to learn or improve.
    • A difficulty rating of easy/medium/hard.

SGB will meet to review proposals and forward approved proposals to NumFOCUS, which will submit a single application that includes these Stan projects alongside those from other NumFOCUS organizations.

If you are interested, please respond to this thread with a link to your proposal at your earliest convenience, to give SGB sufficient time to review.

NumFOCUS has a template that you can use as a guide here: gsoc/ideas-page.md at master · numfocus/gsoc · GitHub

You may find previous years’ projects here.

Thanks,
Yi Zhang on behalf of SGB

We got selected for last year’s GSoC; did we get a student?

Here’s something I think would be doable:

  • Develop a Python package for graphical posterior predictive checks, basically BayesPlotPy: put together a few good plots and a sane infrastructure, using the plotnine package so that we can produce plots that look like ggplot2 output.
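As a rough sketch of the “sane infrastructure” part, assuming the observed data `y` is a flat list and the replicated datasets `y_rep` are a list of draws of the same length (both names are illustrative, not an existing API), a package like this could first reshape everything into one long table that a grammar-of-graphics library such as plotnine can map to aesthetics. The helper name `ppc_long_format` is hypothetical.

```python
from typing import Dict, List, Sequence


def ppc_long_format(
    y: Sequence[float], y_rep: Sequence[Sequence[float]]
) -> Dict[str, List]:
    """Stack observed data and replicated draws into one long ("tidy") table.

    Hypothetical helper: returns equal-length columns 'value', 'group'
    (one label per dataset) and 'source' ('y' or 'y_rep').
    """
    n = len(y)
    table: Dict[str, List] = {"value": [], "group": [], "source": []}
    for i, draw in enumerate(y_rep):
        # Every replicated dataset must match the observed data in length.
        if len(draw) != n:
            raise ValueError(f"y_rep[{i}] has {len(draw)} values, expected {n}")
        table["value"].extend(draw)
        table["group"].extend([f"y_rep[{i}]"] * n)
        table["source"].extend(["y_rep"] * n)
    # Observed data is appended after the replicates; 'source' tells them apart.
    table["value"].extend(y)
    table["group"].extend(["y"] * n)
    table["source"].extend(["y"] * n)
    return table
```

Wrapped in a pandas DataFrame, a table like this could feed a density overlay along the lines of `ggplot(df, aes(x="value", group="group", color="source")) + geom_density()`; the exact layers and defaults would be part of the project itself.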

Yes. We had contributors on

I like it. It’d be great if someone is interested in mentoring it.

Yes, I would be willing to mentor.

Also, if there’s a hacker out there who likes to build package installers, we could revive the idea of a drag-and-drop installer for CmdStan and the C++ toolchain.

I’d like to get feedback from last summer’s mentors: how did it go?

We had a chat with @mans_magnusson. According to him, the experience was generally positive. He also pointed out that the two-mentor setup is helpful during the summer season. Mentors mentioned above are welcome to share more of their experience here.

That’s great. Looking forward to the proposal!

@mitzimorris Somehow I can no longer pin a thread. Can you help pin the current thread and this one globally? Thanks.

Hi!

I think it went quite well. One thing I learnt, though, is that it is important that the project doesn’t leave any loose ends. Some models and other parts weren’t finished during the project, and they were never completed afterwards. So I think it is really important to finish the project on time.

Otherwise, it is quite similar in scope to a master’s project (at least the scope of a master’s project at Aalto and Uppsala) =).

Edit: I’m happy to answer questions, if there are any.

/Måns

Done.

I’ve recently done a bunch of work on a cross-platform installer for a Python app, including using Homebrew on Unix-likes and pyenv across the board. I think the compile toolchain would then just need Windows addressed, with something like Chocolatey? It would also be cool to have a WSL2 option for Windows, as last I heard Stan was faster via WSL2 on Windows anyway.

I’m talking “drag-and-drop”: the kind of install experience where a link on a webpage downloads a .dmg package installer, the user clicks on the icon, and a little wizard guides them through the install.

The trick is in the packaging (see the “List of installation software” page on Wikipedia). I’ve contacted the folks at InstallAnywhere, and we can get an open-source license to use their product. Can we, and should we, do this, and can we find someone who likes to play around with this sort of thing?

Just noticed that we only have one proposal candidate! How about:

  • IO-rewrite (@mitzimorris, maybe you thought of this and discarded it as not a reasonably sized project?)

  • streamingStan (@s.maskell, are any of your streaming-inference projects suitable in terms of timeline and workload?)

  • An HTTP dashboard for during-sampling diagnostics (maybe written in Python, but ideally launched from either cmdstanr or cmdstanpy).

Some of Bob’s and my coworkers have built flatironinstitute/mcmc-monitor on GitHub (“Monitor MCMC runs in the browser”). It has some direct integration with Python, but you can also run it as a command-line script, point it at any folder where CmdStan output is being written, and then run your sampling from any of the wrappers.
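To illustrate what watching a folder of CmdStan output involves, here is a minimal sketch of reading a CmdStan CSV file that may still be growing. It assumes the usual layout: comment lines start with `#` (configuration, adaptation info, timing), the first non-comment line is the column header (e.g. `lp__`, `accept_stat__`, `treedepth__`, `energy__`, then the model parameters), and each later line is one draw. It is not taken from mcmc-monitor, and the function name `read_cmdstan_draws` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_cmdstan_draws(stream: TextIO) -> Dict[str, List[float]]:
    """Parse a (possibly still growing) CmdStan CSV stream into columns."""
    header: List[str] = []
    columns: Dict[str, List[float]] = {}
    for line in stream:
        if not line.endswith("\n"):
            break  # last row may be mid-write; pick it up on the next poll
        if line.startswith("#") or not line.strip():
            continue  # comments: config, adaptation info, timing
        fields = next(csv.reader([line]))
        if not header:
            header = fields
            columns = {name: [] for name in header}
            continue
        if len(fields) != len(header):
            continue  # malformed row; skip rather than crash a dashboard
        for name, value in zip(header, fields):
            columns[name].append(float(value))
    return columns


if __name__ == "__main__":
    demo = io.StringIO("# comment\nlp__,theta\n-7.1,0.25\n")
    print(read_cmdstan_draws(demo))
```

A dashboard would call something like this on each poll and only recompute diagnostics when the number of rows has changed.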

Amazing! I implemented something similar in aria using RStudio’s built-in “jobs” feature, but ended up finding it too unreliable (in ways I confirmed were not attributable to my own code). I’d been planning on implementing something with a simple HTTP output that both RStudio and VS Code could support, but I’m happy to see you’ve already gotten the ball rolling! Is there anything outstanding on that project that a GSoC intern could address?
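As a sketch of the “simple HTTP output” idea, the Python standard library alone can serve a JSON snapshot of diagnostics that a dashboard page, or an editor pane pointed at the URL, could poll. The `/diagnostics` path, the `get_diagnostics` callback, and the function names are all hypothetical and not taken from mcmc-monitor or aria.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Callable, Dict, Type


def make_handler(
    get_diagnostics: Callable[[], Dict[str, Any]]
) -> Type[BaseHTTPRequestHandler]:
    """Build a handler class that serves get_diagnostics() as JSON at /diagnostics."""

    class DiagnosticsHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/diagnostics":
                self.send_error(404, "unknown path")
                return
            # Recompute the snapshot on every request so pollers see fresh values.
            body = json.dumps(get_diagnostics()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args: Any) -> None:
            pass  # silence per-request logging while sampling runs

    return DiagnosticsHandler


def make_server(
    get_diagnostics: Callable[[], Dict[str, Any]],
    host: str = "127.0.0.1",
    port: int = 0,
) -> ThreadingHTTPServer:
    """Create (but do not start) a server; port=0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), make_handler(get_diagnostics))
```

Running `server.serve_forever()` in a background thread while sampling runs, then calling `server.shutdown()` afterwards, would keep the endpoint alive only for the duration of the run.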

I’m not 100% sure. I think the basic product is essentially complete as is, but I’m not too familiar with it. The authors showed Bob and me around an earlier version last year.

Possibly my look just now was too cursory, but I think I see some additional features that could be worthwhile:

  • histogram of tree depths by chain (useful to detect whether one or more chains are bunching up against the maximum tree depth)
  • BFMI by chain
  • bulk/tail ESS by chain by parameter
  • R-hats by parameter (this requires computation across chains)
  • Estimated time remaining (I know this really can’t be reliable until a few non-warmup iterations complete, but something is better than nothing)
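For the per-chain diagnostics listed above, here is a minimal stdlib-only sketch, assuming each chain’s `energy__` and `treedepth__` columns and each parameter’s draws are available as plain lists (for example from a parser of the CmdStan CSV). E-BFMI is estimated as the sum of squared successive energy differences over the sum of squared deviations from the mean energy. The R-hat here is the classic split-R-hat; Stan now recommends the rank-normalized version, so this is a simpler stand-in. The default maximum tree depth of 10 matches Stan’s default. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, variance
from typing import Dict, List, Sequence


def e_bfmi(energy: Sequence[float]) -> float:
    """Energy Bayesian fraction of missing information for one chain."""
    e_bar = mean(energy)
    numerator = sum((energy[i] - energy[i - 1]) ** 2 for i in range(1, len(energy)))
    denominator = sum((e - e_bar) ** 2 for e in energy)
    return numerator / denominator


def split_rhat(chains: Sequence[Sequence[float]]) -> float:
    """Classic split R-hat for one parameter (not the rank-normalized version).

    Each chain is cut into two halves of equal length n (needs n >= 2).
    """
    n = min(len(c) for c in chains) // 2
    halves: List[Sequence[float]] = []
    for c in chains:
        halves.append(c[:n])
        halves.append(c[n:2 * n])
    w = mean(variance(h) for h in halves)        # mean within-half variance
    b = n * variance([mean(h) for h in halves])  # between-half variance
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


def treedepth_histogram(
    treedepths: Sequence[int], max_depth: int = 10
) -> Dict[str, object]:
    """Counts of each tree depth for one chain, plus the fraction at max_depth."""
    counts = Counter(treedepths)
    return {
        "counts": dict(sorted(counts.items())),
        "frac_at_max": counts.get(max_depth, 0) / len(treedepths),
    }
```

Each function only needs the draws seen so far, so a dashboard could recompute them on every poll; R-hat is the one that needs all chains at once.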

I haven’t tried it enough to tell whether it already does this, but I think user uptake would be best if the served page can be viewed in both RStudio (I think one can use its API to show HTTP content in the Viewer pane?) and VS Code (I’m positive users there have worked out how to have an HTTP dashboard pane).

Oh, I missed this when I first caught up with the thread. I’d be happy to join @mitzimorris in mentoring this too, if it’s still of interest.

Hi Mike,

I just put together the following, which I would be happy to mentor (jointly, if anyone’s interested):

“Visual diagnostics for Bayesian Data Analysis in Python”
Under guidance from the mentors, the student will develop a library of plotting functions in Python
for use after fitting Bayesian models using MCMC methods. Key plots for a set of
posterior draws from an MCMC sample include visual MCMC diagnostics, and graphical posterior
(or prior) predictive checking. The student will use the plotting library plotnine
which provides a grammar of graphics for Python.
The resulting plots will be added to the mc-stan.org ecosystem of tools for Bayesian Data Analysis.
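Graphical posterior predictive checks often compare a test statistic of the observed data with its distribution over replicated datasets (bayesplot’s `ppc_stat` plots this as a histogram). Below is a minimal sketch of the numbers behind such a plot, assuming `y` is a list of observations and `y_rep` a list of replicated datasets; `ppc_test_statistic` is a hypothetical name, not a port of bayesplot. It also returns the posterior predictive p-value, i.e. the fraction of replicates whose statistic is at least the observed one.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def ppc_test_statistic(
    y: Sequence[float],
    y_rep: Sequence[Sequence[float]],
    stat: Callable[[Sequence[float]], float] = mean,
) -> Tuple[float, List[float], float]:
    """Return T(y), T(y_rep) for each replicate, and the fraction with T(y_rep) >= T(y)."""
    t_obs = stat(y)
    t_rep = [stat(draw) for draw in y_rep]
    # Booleans sum as 0/1, so this counts replicates at or above the observed value.
    p_value = sum(t >= t_obs for t in t_rep) / len(t_rep)
    return t_obs, t_rep, p_value
```

The plot itself would then be a histogram of `t_rep` with a vertical line at `t_obs`; swapping `stat` (e.g. for `max` or a standard deviation) gives the other common variants.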

Excellent, happy to co-mentor.

Oh, one quick thought: how much overlap would there be between this project and ArviZ? Is the idea that we want more publication-quality visuals, and we think ggplot/plotnine achieves that better than ArviZ?