GSoC 2021 - Q/A thread

Hi everyone,

Starting this topic under the “google summer of code” category. Here, interested students (and, later, accepted students) can communicate with each other and project mentors.

Descriptions of Stan projects and respective mentors can be found here.

Information about the GSoC 2021 timeline can be found here.


Hi @anirudht. Thanks for the interest. Please post your questions here and we will be happy to answer. The mentors for the Bayesian Benchmarking project are @mans_magnusson and @mike-lawrence - maybe they can help.

Original question here @anirudht : Google Summer of Code 2021 - Call for Proposals - #26 by anirudht


Hi Stan developers and @mans_magnusson & @mike-lawrence,

I am Huy, a Master’s student in Data Science in Sydney, Australia. I’m interested in submitting a proposal for the Benchmarking Bayesian models project. Could you recommend any background reading related to the benchmarking methodology (and also the “why” of the project)?

My motivation is that I want to further develop my Bayesian statistics knowledge. Learning by doing whilst contributing to an open source project is very attractive to me. Plus, I’ve learnt that Stan is among the most popular Bayesian packages.

I followed the progress of Stan’s GSoC project list submission. While waiting for Google to confirm the organizations, I finished the Fundamentals of Bayesian Data Analysis course on DataCamp as preparation. And I’m excited for this opportunity!


Hi!

Great that you are interested in the project!

So I will try to flesh this out a little. First, the main place to add data and models can be found here:

Milestone 1: Identify models that would be of relevance to include as benchmark models.

This will include finding models that would be of relevance for benchmarking purposes in Stan. Below are a couple of examples. Ideally, we want posteriors that have been published somewhere so it is possible for users to read up on these models.

Posteriors where both centered and non-centered parameterizations have funnels

This could be done by making some of the errors in the 8-school example arbitrarily small so that both parameterizations would have problems. Alternatively, we could simulate data to get the same problem.
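A minimal sketch of the first idea, assuming the usual eight-schools data (Rubin, 1981). The helper names `shrink_errors` and `noncentered_to_centered`, the choice of schools, and the factor 0.01 are made up for illustration and are not part of posteriordb. The intuition, as far as I understand it, is that schools with tiny errors favor the centered parameterization while schools with large errors favor the non-centered one, so a mix of both could make either parameterization struggle.

```python
# Eight-schools data: estimated treatment effects and their standard errors.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]


def shrink_errors(sigma, indices, factor):
    """Return a copy of sigma with the entries at `indices` multiplied by `factor`."""
    return [s * factor if i in indices else s for i, s in enumerate(sigma)]


def noncentered_to_centered(mu, tau, theta_raw):
    """Map non-centered draws to school effects: theta_j = mu + tau * theta_raw_j."""
    return [mu + tau * t for t in theta_raw]


# Hypothetical choice: make the first four schools very informative and
# leave the remaining four as they are.
sigma_mixed = shrink_errors(sigma, indices={0, 1, 2, 3}, factor=0.01)
print(sigma_mixed)
```

The modified `sigma_mixed` could then be passed as data to both the centered and the non-centered Stan programs to check whether each one shows divergences.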

The Birthday problem

See Section 5.2 in [2004.11408] Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming.
Data and model can be found here: basis_functions_approach_to_GP/uni_dimensional/birthday-dataset at master · gabriuma/basis_functions_approach_to_GP · GitHub

GARCH and ARMA posteriors

These need to be included.

Gaussian Mixture models on real data

One simple example would be a two-component mixture model with ordered mu. @andrewgelman might have some ideas here.
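To make the ordered-mu idea concrete, here is a small sketch of the log-likelihood of a two-component Gaussian mixture in plain Python. The function names are hypothetical, and the ordering check only imitates what an ordered vector of means would enforce in a Stan program; it is not taken from any existing posteriordb model.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mixture_loglik(ys, theta, mu, sigma):
    """Log-likelihood of a two-component mixture with mixing weight theta.

    Requires mu[0] < mu[1]; the ordering removes the label-switching symmetry.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if not mu[0] < mu[1]:
        raise ValueError("component means must be ordered: mu[0] < mu[1]")
    total = 0.0
    for y in ys:
        total += log_sum_exp(
            math.log(theta) + normal_lpdf(y, mu[0], sigma[0]),
            math.log1p(-theta) + normal_lpdf(y, mu[1], sigma[1]),
        )
    return total


print(mixture_loglik([-1.2, 0.3, 2.5], 0.4, (-1.0, 2.0), (1.0, 0.5)))
```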

I also know that @andrewgelman and @Bob_Carpenter have a couple of models they think should be included for benchmarking purposes. Of course, we could also ask others on the Stan Discourse what types of models would be of interest for benchmarking (and testing) purposes.

Here is another proposal from astrophysics that might be included:

Milestone 2: Implement the models in Stan (and posteriordb).

This is essentially extracting all relevant information and including the models in posteriordb so it is easy for everyone to use. If it is possible, it would also be great to compute a reference posterior for each model.

Here is some documentation for this:

Milestone 3: Optimize for performance

Finally, it might be worth optimizing the included models (and others) for performance. This is mainly to give fair comparisons when the posteriors are used for computational speed benchmarking.

Here is a discussion and some examples @stevebronder has found so far:

I think this gives some more information. Maybe @avehtari or @mike-lawrence have additional thoughts and comments?

/Måns


Great! If at any point there is a need for blogging this, just let me know.

Also, we have some updates on the birthday problem. Aki and I should be able to provide an up-to-date file.

And I’d prefer not to be analyzing the Old Faithful data if we can avoid it. I feel like we have better examples than that!

I’m ccing Aki because he often has good benchmark-style examples. This is also relevant for posteriordb.


Great! I just removed the Old Faithful example. Just let me know if you have some other mixture model posterior.

New birthdays case study
https://avehtari.github.io/casestudies/Birthdays/birthdays.html
and the model codes

(which are based on Gabriel’s code, but improved in many ways)


Hey, I’m Debojyoti, a pre-final-year CS student. I found this project quite interesting to implement: design-docs/bayesian_benchmarking.md at master · stan-dev/design-docs · GitHub

How can I get started, @mans_magnusson @mike-lawrence @avehtari?

Have you worked through the resources above?

No, I just came across this project today, and I have a basic understanding of probabilistic models.

Debojyoti chakraborty, Student, https://myexpindark.me

Ok, then best to begin working through those resources and post back when you have specific questions :)


Okay, thanks. I will start working through the resources.


I’m interested in participating, although I’m not sure how much time I’ll have this summer. I was looking at putting in an application for the BRMS topic because that was how I got into Stan. I have been reading some of the literature on GARCH models, have tested a few versions of the basic GARCH(1,1) model on simulated data, and have some questions.

A lot of the literature notes that normally distributed noise is insufficient to fully capture the tails of real data, and suggests a Student’s t model or, in newer literature, various mixture-type models. In my tests on simulated data and on the S&P 500 data most textbooks use, the fit was generally much better with the t-noise model, except when I generated the data with normal noise.

The GitHub page only mentions Gaussian noise models, but I assume we could be flexible on this in our implementation?
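For anyone who wants to reproduce this kind of comparison, here is a rough sketch of simulating a GARCH(1,1) series with either Gaussian or unit-variance Student's t innovations, using only the Python standard library. The function names and parameter values are illustrative assumptions, not the setup from the project description.

```python
import math
import random


def student_t_unit_variance(rng, nu):
    """Draw a Student's t variate with nu > 2 degrees of freedom, rescaled to variance 1."""
    if nu <= 2.0:
        raise ValueError("nu must exceed 2 for the variance to exist")
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)


def simulate_garch11(n, omega, alpha, beta, nu=None, seed=0):
    """Simulate r_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2.

    nu=None gives Gaussian innovations; otherwise Student's t with nu degrees of freedom.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0) if nu is None else student_t_unit_variance(rng, nu)
        r = math.sqrt(sigma2) * eps
        returns.append(r)
        variances.append(sigma2)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances


r_norm, _ = simulate_garch11(1000, omega=0.1, alpha=0.1, beta=0.8)
r_t, _ = simulate_garch11(1000, omega=0.1, alpha=0.1, beta=0.8, nu=5.0)
print(max(abs(x) for x in r_norm), max(abs(x) for x in r_t))
```

Fitting both noise models to each simulated series is one way to see how much the choice of innovation distribution matters.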

These models can also be finicky to fit with any kind of weak prior; basically all of the literature uses a uniform prior. I know the end goal would be to implement all of the models listed, but testing prior distributions and model specifications seems like a difficult task in and of itself. Will the project focus more on implementing all the models and then going back to test things like priors, or on doing a thorough implementation and write-up for each model?

Lastly, are there any decent papers/texts that I might be missing for Bayesian Time Series? I’ve mostly been referencing this text by Ruey Tsay but it’s focused mostly on ML.


Hi there,

I am quite interested in the “GARCH Modelling in BRMS” project proposed. I have done some work with time series models, interned at a trading firm, and have used BRMS during a research internship where we implemented Bayesian hierarchical models in BRMS (application was on farm animals here in New Zealand, wonder if that is the first agricultural application Paul has heard of?).

I think the Bayesian workflow material lots of the developers/researchers here have been putting out recently is super cool. Although this project may not be “directly” involving workflow, the BRMS package has certainly had a big impact on my workflow when it comes to fitting certain types of models (who knows, maybe we come up with cool default priors for certain instances of GARCH models?).

Anyways, sorry for going off on a tangent there, but I just have some questions about the application process:
– I do not know too much about GSoC, but it seems that students are often expected to come to the participating organisations with a proposal in mind. The three projects offered here seem reasonably thorough and well planned (you have something clear in mind). To what degree do you expect an applicant to build upon the proposed projects in their application?
– How do we apply? Will a portal be made available? Sorry in advance, I understand I could be getting ahead of myself here.

Lastly, I just have to fanboy over you all for one second. All the research papers published by this group, and Stan itself, are the main influence on my understanding and enjoyment of Bayesian inference, and honestly you all rock! Thank you.

Cheers,

Conor


Before I start, I should note that I am not a mentor for any Stan project, but I have been both student and mentor in previous GSoC runs and think I can answer some of the questions.

The mentors will be the ones to actually review the proposals sent by applicants and decide based on them, so I can’t really say anything about how much each student should add to their application. However, I would encourage you to take a look at the GSoC student guide, which also has some example proposals in the appendix.

All applications must be submitted to the GSoC website: https://summerofcode.withgoogle.com/ which is already available and will start accepting applications on March 29th. Stan is participating under the NumFOCUS umbrella, so you should select NumFOCUS as your organization, then Stan as sub-organization. There you will be able to submit your proposal in pdf format.


Awesome @OriolAbril. Thanks for your reply, this helps a lot!

Hello @stevebronder @spinkney

I’d like to signal my interest in adding Lambert-W transforms to Stan. I’ve read the proposal and believe I’m qualified. My motivation is as follows: I’m a Stan user/first-year research masters student in Maths and eventually want to work on Bayesian computation problems, so this is an ideal practice project.

Let me know if you’d like to discuss further or have any questions (I don’t have a public portfolio because I work at a private company).

Neel


Hi everyone,

I’m new here (actually I have been registered for 2 years now, but this is my first post), and I discovered this opportunity through @andrewgelman’s blog post (thank you!).
I already have some experience in developing functions at the interface of C++ and R (I worked on the fdaPDE package last year) and, besides being fun per se, collaborating with a great community like Stan’s is thrilling.

I’m particularly interested in the Lambert W project with @stevebronder and @spinkney, and I would like to ask a couple of questions about it if you don’t mind.
I’ve read the original paper on the Lambert W transform, which introduces MLE and IGMM estimators for it, and I wonder whether a more “Bayesian approach” to the problem also exists, and whether this work on Stan should also consider embedding the transform in a Bayesian framework (I ask because I know Stan mainly for Bayesian analysis).
Second, I would like to ask what “Priority: Low/Medium” means in the project description, since the other ones only have “Priority: Low”.
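For readers who have not seen the transform, here is a small sketch of the heavy-tail version as I understand it from the Lambert W literature: a standardized input u is mapped to z = u * exp(delta / 2 * u^2), and the inverse uses the principal branch of the Lambert W function. The Newton solver and the function names below are my own illustrative choices, not the planned Stan implementation.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert W for x >= 0: solves w * exp(w) = x by Newton's method."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # upper bound on W(x) for x >= 0, so Newton descends monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


def heavy_tail_forward(u, delta):
    """Heavy-tail Lambert W transform: z = u * exp(delta / 2 * u^2), with delta >= 0."""
    return u * math.exp(0.5 * delta * u * u)


def heavy_tail_inverse(z, delta):
    """Inverse transform: u = sign(z) * sqrt(W(delta * z^2) / delta)."""
    if delta == 0.0:
        return z
    return math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)


u = 1.5
z = heavy_tail_forward(u, delta=0.2)
print(z, heavy_tail_inverse(z, delta=0.2))
```

The inverse follows because delta * z^2 = (delta * u^2) * exp(delta * u^2), so W(delta * z^2) = delta * u^2.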

Thank you again for the amazing opportunity,

Alberto

P.S.: I’ve also found some broken links on the Stan homepage and wiki. What is the best way to report them or check whether they’ve already been reported?


I would say the garch models for brms is a project that would probably require full time during the summer. Though we have it broken up so that any of the individual steps are quite a nice success on their own.

Supporting more distributions is always awesome, but getting the first steps done of a normal ol’ garch(p, q) should be a nice amount of work on its own. As we work out the standard garch(p, q), it should become pretty evident which pieces of the code generation would need to change to support different types of conditional heteroskedasticity, distributions, etc. And if we have enough time to do these then that’s super rad!

Yes so we broke it up into milestones so that we would have flexibility in what the GSoC student wanted to do as well. Just doing SBC on a bunch of those models would be worthwhile in itself as a case study! But if the student wants to jump to the BRMS stuff we can just do SBC for the simpler garch models as an exercise and then jump straight to the design process for BRMS.
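As a rough illustration of what SBC (simulation-based calibration) involves, here is a sketch on a toy conjugate model where the posterior is known exactly: mu ~ Normal(0, 1) and y_i ~ Normal(mu, 1). The function name and settings are made up for this example. For a GARCH model, the exact posterior draws would be replaced by (thinned) draws from Stan.

```python
import math
import random


def sbc_ranks(n_sims=1000, n_obs=10, n_draws=99, seed=0):
    """Return SBC rank statistics for the toy model mu ~ N(0, 1), y_i ~ N(mu, 1).

    With a correct posterior sampler the ranks are uniform on {0, ..., n_draws}.
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, 1.0)                        # 1. draw from the prior
        ys = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]  # 2. simulate data
        post_var = 1.0 / (n_obs + 1.0)                       # 3. exact conjugate posterior
        post_mean = sum(ys) * post_var
        post_sd = math.sqrt(post_var)
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < mu_true for d in draws))        # 4. rank of the true value
    return ranks


ranks = sbc_ranks()
# Crude uniformity check: counts in 10 equal-width bins should be roughly equal.
bins = [0] * 10
for r in ranks:
    bins[min(r // 10, 9)] += 1
print(bins)
```

A histogram of the ranks that is clearly U-shaped, hump-shaped, or skewed would point to a problem in the model code or the sampler.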

The varstan vignette is nice for a lot of the models related to the project. I’ll ping @asael_am to see if he knows more resources. For general time series analysis I like Time Series Analysis and Its Applications and Forecasting: Principles and Practice.


Working off the original proposal is good and if you have something else in mind that’s also good! I’m not omnipotent so it’s probable I missed some nice project idea :-P. The projects we listed are things we have talked about doing for a while but never got around to and think a summer student would be reasonably able to tackle. Happy to chat about any project ideas you have if you want a sense of if we’d like it.

That’s exactly how I started as well!
