Google Summer of Code 2021 - Call for Proposals

Hi All,

@SGB would like to call attention to an opportunity to participate in the 2021 Google Summer of Code program.

At a high level, this would connect student coders with mentors in Stan’s developer community. The mentors would supervise students as they work on development projects.

Some details:

If you are a developer, have a project in mind, and would like to participate, please do the following:

  • Find another developer or two who are willing to serve as mentors.
  • Please submit a brief proposal with the following information:
    • Your names, project title, brief general project description (one or two lines).
    • Detailed project description.
    • Aims/expected outcomes of the project.
    • Technical skills required/preferred.
    • What the student can expect their contribution to be and what skills they can hope to learn or improve.
    • A difficulty rating of easy/medium/hard.

Please ensure that the project can be completed during the 10-week period. Previously, NumFOCUS recommended as a rule of thumb that the proposed project be something the mentors themselves could finish within a week or so.

Here is a nice example of such a proposal from our participation a few years ago: gsoc/ideas-list-stan.md at master · numfocus/gsoc · GitHub

SGB will meet to review proposals and forward approved proposals to NumFOCUS, which will submit a single application that includes these Stan projects alongside those from other NumFOCUS organizations.

NumFOCUS has asked that we submit project ideas to them in the next 2 weeks, so please respond to this thread with a link to your proposal by 2021-01-24T05:00:00Z to give SGB sufficient time to review.

It seems that a .md file hosted on GitHub is the ideal format. NumFOCUS has a template that you can use as a guide here: gsoc/ideas-page.md at master · numfocus/gsoc · GitHub

Thanks,
Arman Oganisian


I posted in another thread that it would be helpful to get some students to survey users from various fields to collect in-use models, clean/optimize them and add them to posteriorDB for more robust benchmarking.

I’m not a dev, but would be happy to help supervise students.


Hi Mike,

Just looked through that thread - it seems like an impactful project for the community! If you do find one or two others (maybe including a developer?) to supervise this project, please do submit a proposal. The program recommends 2-3 mentors per student.

We would be happy to review the proposal in our next SGB meeting. Let me know if you have any questions!


Let me know if any of these are of interest and I can get some people together to put the proposals through.

  1. There’s a thread about LambertW transforms (“Adding Lambert W Transforms to Stan?”) that I’d say is medium difficulty, and the student could work with me and @stevebronder.

  2. Tangentially related to posteriorDB is the helpful functions repository. @bbbales2 mentions a packages/namespaces/tests structure. The difficulty level depends on how we scope this. It could be easy if it is just combing through Discourse, Google, Stack Overflow, GitHub, etc. and gathering a bunch of UDFs to put in a repo, or hard if it involves doing the namespaces and overloading. I think the impact of the hard route is much higher, though, and they’d just have to put in a few examples with tests to show how to add to the repo. I’ll nominate @bbbales2 to oversee the project (I can help too) and have him add anyone else who may be interested and able to help.


Great idea! @mans_magnusson would you be interested? I can help make the plan, but I’m not able to commit to supervising during the summer.

EDIT: somehow the first post was scrambled


+1 to this.

SGB reviewed during our meeting today and we like these ideas. It’s great to see enthusiasm for participating. We feel it will be an effective way to engage students and grow the community.

So here is a tentative list of projects with proposed mentors.

It would be great if each proposed mentor replies here and confirms availability to mentor should we successfully recruit students for these projects through GSoC. Just want to be upfront that this seems to be a non-trivial time commitment - keep the following in mind:

  • Students will be with us for 10 weeks, 175 hrs. Most of this time will be spent coding and not with the mentor, but mentors must be available throughout to meet and guide students as needed to complete the project.
  • It seems mentors will also be writing evaluations for students as per GSoC’s requirements.
  • Mentors should put significant time into defining a proper scope for each project, with clear milestones for the student, and into managing progress accordingly.
  • Please do look over the GSoC’s mentor guide: What is Google Summer of Code? | Google Summer of Code Guides.

Once we have confirmed mentors, we can begin work on our “ideas page” - which will describe each of the projects as per the template in my initial post. Our ideas page will be linked here along with other NumFOCUS orgs (see PyMC3’s page as an example): gsoc/ideas-list.md at master · numfocus/gsoc · GitHub.

Similar to other orgs, we can host our ideas page as a Wiki in the Stan GitHub Repo. I can start a template there with sections for each of these projects and ask the mentors to fill in their respective sections. I wasn’t sure if there was a process for starting such a Wiki page so I started an issue here to check before messing with the repo.


@stablemarkets do we have the historical accepted proposals?


This specific thing is likely too big for a GSOC.

Math library things are probably the most appropriate for GSOC. Something that comes to mind is implementing custom autodiff for a bunch of our forward mode stuff.

An advantage there is that we already test forward/higher order autodiff for most of our functions, so there’s a lot of work to be done there that already has a big support framework in place.

Also there’d be work implementing/cleaning up custom reverse mode autodiff stuff that is there.

I can’t promise I’ll have time in the Summer though. If I’m around, happy to help, but I get anxious about my name being on lists where there are expectations lolol.

@andrewgelman I recall you’ve mentioned a couple times in the weekly meetings that folks approach you with models that are slow; would you have availability/interest in joining the proposal to get some GSoC students to collect and optimize such models for inclusion in posteriorDB and benchmarking?

I know this question is addressed to Andrew, but responding anyhow - I don’t see the connection between helping folks rewrite and/or reparameterize slow models and posteriorDB. I’m also thinking of the saying “give a person a fish / teach a person to fish”.

I would love to see a series of case studies showing how to reformulate a complex model that is fundamentally correct but slow, so that it takes advantage of the way that the Stan math library and current architectures work.

However, if the model is just another very complicated multi-level regression, it probably doesn’t need to be added to posteriorDB.

Ah, yes, it would be important to select candidate model/data combos that span an assortment of domains.

Yeah sure!


I don’t have that data on hand and couldn’t find it with a cursory Google search. NumFOCUS seems to put in an application every year and I confirmed they did participate in 2019 and 2018. Also, it seems Stan did participate in 2017. So I think NumFOCUS (and by extension, us) has a great shot at getting accepted as a member organization.

However, my understanding is that getting accepted as a member org doesn’t necessarily guarantee that a student will pick one of our projects. I think there’s some period where students look through options, reach out to member orgs to discuss the potential projects, and then make a selection. So getting students will in part depend on us writing interesting project descriptions.

I can say that I’d be interested in @mike-lawrence’s idea regarding the survey of models and updating the examples repo. I’m not too familiar with the details of the process, but I know you’d have at least one student applying (me) if that proposal makes the final cut.


Sure, that would be fine. I don’t really know how summer of code works. But I’ve been meaning to draft this Bayesian Benchmarks paper so this could be a good motivator.


I’m happy to help out. I can be a mentor, although there are some weeks during the summer when I will not be available (a total of 4 weeks, spread out). But I guess that would be the reason for having multiple mentors?

I think this is a great idea. I think @avehtari and I even have a document on a similar project already lying around. We have also had discussions with the Facebook and Google people on a set of benchmark models, so we have additional models from different domains listed there as well. I guess the most important step would be to keep the scope from getting too large.


Great! So with you and @andrewgelman we’ve met the 2-Dev-minimum (I’m happy to help supervise but I’m not an official Dev and certainly haven’t made any Dev-level contributions yet).

I’ll try to finish up my job-work quickly today so I can start writing the proposal (deadline is in two days). I’ll share the doc link here when I start, but feel free to do so yourselves if you don’t want to wait on me.


Thank you! This is great. Being away shouldn’t be an issue I feel. The other mentor can be more active in those weeks as you suggest. Also you could meet ahead of time to be sure the student knows what needs to be done while you are away.

FYI @stevebronder suggested that we host the proposal on stan-dev/design-docs. I have a pull request open to merge a proposal template I made. If anyone has review/merge authorization, it would be great if you could take a look.

Maybe we all fork this, work on our individual portions, and merge.
