GSoC 2021 - Q/A thread

Okay, thanks. I will start working on the resources.

Debojyoti chakraborty, Student, https://myexpindark.me

I’m interested in participating, although I’m not sure how much time I’ll have this summer. I was looking at putting in an application for the BRMS topic because that was how I got into Stan. I have been reading some of the literature on GARCH models, have tested a few versions of the basic GARCH(1,1) model on simulated data, and have some questions.

A lot of the literature notes that normally distributed noise is insufficient to fully capture the tails of real data, and suggests a Student-t model or, in newer literature, various mixture-type models. From testing on simulated data and on the S&P 500 data most textbooks use, the fit was generally much better with the t-noise model unless I was generating the data using normal noise.

The GitHub page only mentions Gaussian noise models, but I would assume we could be flexible on this in our implementation?

These models can also be kind of finicky to fit with any kind of weak prior… basically all of the literature uses a uniform prior. I know the end goal would be to implement all of the models listed, but testing prior distributions and model specifications seems like a difficult task in and of itself. Will the project focus on implementing all the models first and then going back to test things like priors, or on doing a thorough implementation and write-up for each model in turn?

Lastly, are there any decent papers/texts that I might be missing for Bayesian Time Series? I’ve mostly been referencing this text by Ruey Tsay but it’s focused mostly on ML.

Hi there,

I am quite interested in the “GARCH Modelling in BRMS” project proposed. I have done some work with time series models, interned at a trading firm, and have used BRMS during a research internship where we implemented Bayesian hierarchical models in BRMS (application was on farm animals here in New Zealand, wonder if that is the first agricultural application Paul has heard of?).

I think the Bayesian workflow material lots of the developers/researchers here have been putting out recently is super cool. Although this project may not be “directly” involving workflow, the BRMS package has certainly had a big impact on my workflow when it comes to fitting certain types of models (who knows, maybe we come up with cool default priors for certain instances of GARCH models?).

Anyway, sorry for going off on a tangent there, but I just have some questions about the application process:
– I do not know too much about GSOC, but it seems that often students are expected to come to the participating organisations with a proposal in mind. It appears that the three offered projects here seem reasonably thorough/planned (you have something clear in mind). I am wondering to what degree do you expect an applicant to build upon the proposed projects in their application?
– How do we apply? Will a portal be made available? Sorry in advance, I understand I could be getting ahead of myself here.

Lastly, I just have to fanboy over you all for one second. The research papers published by this group and Stan itself are the main influences on my understanding and enjoyment of Bayesian inference, and honestly you all rock! Thank you.

Cheers,

Conor

Before I start, I should note that I am not a mentor for any Stan project, but I have been both student and mentor in previous GSoC runs and think I can answer some of the questions.

The mentors will be the ones who actually review the proposals sent by applicants and decide based on them, so I can’t really say how much each student should add to their application. However, I would encourage you to take a look at the GSoC student guide, which also has some example proposals in the appendix.

All applications must be submitted to the GSoC website: https://summerofcode.withgoogle.com/ which is already available and will start accepting applications on March 29th. Stan is participating under the NumFOCUS umbrella, so you should select NumFOCUS as your organization, then Stan as sub-organization. There you will be able to submit your proposal in pdf format.

Awesome @OriolAbril. Thanks for your reply, this helps a lot!

Hello @stevebronder @spinkney

I’d like to signal my interest in adding Lambert-W transforms to Stan. I’ve read the proposal and believe I’m qualified. My motivation is as follows: I’m a Stan user/first-year research masters student in Maths and eventually want to work on Bayesian computation problems, so this is an ideal practice project.

Let me know if you’d like to discuss further or have any questions (I don’t have a public portfolio because I work at a private company).

Neel

Hi everyone,

I’m new here (actually I have been registered for 2 years now, but this is my first post), and I discovered this opportunity through @andrewgelman’s blog post (thank you!).
I already have some experience developing functions at the interface of C++ and R (I worked on the fdaPDE package last year) and, besides being fun per se, the prospect of collaborating with a community as great as Stan’s is thrilling.

I’m particularly interested in the Lambert W project with @stevebronder and @spinkney, and I would like to ask a couple of questions about it if you don’t mind.
First, I’ve read the original paper on the Lambert W transform, which introduces MLE and IGMM estimators for it, and I wonder whether there is also a more “Bayesian approach” to the problem, and whether this work on Stan should also consider embedding the transform in a Bayesian framework (I ask because I know Stan mainly for Bayesian analysis).
Second, I would like to ask what “Priority: Low/Medium” means in the project description, since the other projects only have “Priority: Low”.

Thank you again for the amazing opportunity,

Alberto

P.S.: I’ve also found some non-working links on the Stan homepage and wiki. What is the best way to report them or check whether they’ve already been reported?

I would say the GARCH models for brms project would probably require full-time work during the summer. That said, we have broken it up so that completing any of the individual steps is quite a nice success on its own.

Supporting more distributions is always awesome, but getting the first steps done for a normal ol’ GARCH(p, q) should be a nice amount of work on its own. As we work out the standard GARCH(p, q), it should become pretty evident which pieces of the code generation would need to change to support different types of conditional heteroskedasticity, distributions, etc. And if we have enough time to do these, then that’s super rad!
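To make the "which pieces would change" point concrete, here is a minimal sketch of a GARCH(1,1) model with Student-t noise, adapted from the GARCH(1,1) example in the Stan User's Guide. It is only an illustration, not the design brms would generate: the parameter `nu`, its `gamma(2, 0.1)` prior, and the choice to pass the initial scale `sigma1` as data are assumptions made for this sketch.

```stan
data {
  int<lower=2> T;         // number of observations
  vector[T] r;            // observed series (e.g. returns)
  real<lower=0> sigma1;   // scale at t = 1, supplied as data (assumption)
}
parameters {
  real mu;
  real<lower=0> alpha0;
  real<lower=0, upper=1> alpha1;
  real<lower=0, upper=(1 - alpha1)> beta1;  // keeps alpha1 + beta1 < 1
  real<lower=2> nu;       // degrees of freedom (assumption; > 2 for finite variance)
}
transformed parameters {
  vector<lower=0>[T] sigma;
  sigma[1] = sigma1;
  for (t in 2:T) {
    sigma[t] = sqrt(alpha0
                    + alpha1 * square(r[t - 1] - mu)
                    + beta1 * square(sigma[t - 1]));
  }
}
model {
  nu ~ gamma(2, 0.1);     // illustrative prior, not a recommendation
  // Gaussian version would be: r ~ normal(mu, sigma);
  r ~ student_t(nu, mu, sigma);
}
```

In this sketch only the `nu` declaration, its prior, and the likelihood line differ from the Gaussian model; the variance recursion is untouched. Note that with `student_t`, `sigma` is a scale rather than a standard deviation (the variance is `sigma^2 * nu / (nu - 2)`).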

Yes so we broke it up into milestones so that we would have flexibility in what the GSoC student wanted to do as well. Just doing SBC on a bunch of those models would be worthwhile in itself as a case study! But if the student wants to jump to the BRMS stuff we can just do SBC for the simpler garch models as an exercise and then jump straight to the design process for BRMS.

The varstan vignette is nice for a lot of the models related to the project; I’ll ping @asael_am to see if he knows of more resources. For general time series analysis I like "Time Series Analysis and Its Applications" and "Forecasting: Principles and Practice".

Working off the original proposal is good and if you have something else in mind that’s also good! I’m not omnipotent so it’s probable I missed some nice project idea :-P. The projects we listed are things we have talked about doing for a while but never got around to and think a summer student would be reasonably able to tackle. Happy to chat about any project ideas you have if you want a sense of if we’d like it.

That’s exactly how I started as well!

Hi!

Happy to discuss this further! Send me and @spinkney a PM and we can sort out a time to chat.

Yes I think we can follow a more Bayesian approach here, check out this thread where @spinkney goes over that a bit. I believe what we want at the Stan level is code with something like

data {
  int<lower=0> N;
  vector[N] x;
}

parameters {
  real mu;
  real<lower=0> sigma;
  real gamma;
}

transformed parameters {
  // Skew example for simplicity.
  // lambert_transform_skew is the proposed (not yet existing) function with signature
  // f(distribution, data, distribution_params, lambert_skew_params)
  vector[N] x_gauss = lambert_transform_skew(normal_lpdf, x, mu, sigma, gamma);
}

model {
  // Whatever modeling a user wants to do on the Gaussianized data
  x_gauss ~ std_normal();
}

generated quantities {
  // Make predictions on the Gaussian scale, then de-Gaussianize them.
  // normal_rng returns an array, so convert it to a vector.
  vector[N] x_pred_gauss = to_vector(normal_rng(rep_vector(0, N), 1));
  vector[N] x_pred = lambert_untransform_skew(normal_lpdf, x_pred_gauss, mu, sigma, gamma);
}

Where we use the distribution type to infer the transform for that particular distribution. Sean may have other schemes / ideas.
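As a rough illustration of what the skew back-transform could look like for a location-scale family, here is a sketch of a user-defined Stan function. It assumes the skewed Lambert W form from the original paper, y = mu + sigma * u * exp(gamma * u) with u = (x - mu) / sigma, which inverts to u = W(gamma * z) / gamma with z = (y - mu) / sigma. The function name `lambert_skew_to_latent` is hypothetical; only `lambert_w0` (the principal branch of the Lambert W function) is an existing Stan built-in.

```stan
functions {
  // Hypothetical helper, not part of Stan: maps skewed data y back to the
  // latent scale x, assuming y = mu + sigma * u * exp(gamma * u)
  // with u = (x - mu) / sigma.
  vector lambert_skew_to_latent(vector y, real mu, real sigma, real gamma) {
    int N = rows(y);
    vector[N] x;
    if (gamma == 0) {
      return y;  // no skew: the transform is the identity
    }
    for (n in 1:N) {
      real z = (y[n] - mu) / sigma;
      // Principal branch only; requires gamma * z >= -exp(-1)
      x[n] = mu + sigma * lambert_w0(gamma * z) / gamma;
    }
    return x;
  }
}
```

This sketch ignores the second branch of W and the Jacobian adjustment a full implementation would need when the transform is applied to data inside the model block.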

Hi everyone,

I am interested in the project “Benchmarking Bayesian Models in Stan”.
I am a Ph.D. student at Cornell working at the intersection of sociology and statistics.

Three reasons why I might be qualified for this project:
First, I have taken several courses on Bayesian statistics and have practical experience implementing Bayesian models. Second, in my master’s program, “Methods and Statistics for the Behavioral, Biomedical, and Social Sciences”, I was introduced to a variety of models from different disciplines. Third, I am an experienced programmer: Python (~1 year), R (~7 years), and JAGS (~4 years).

I am interested in this project because I have written an R package (similar to, but far less developed than, BRMS) that estimates a specific type of Bayesian hierarchical model in JAGS from within R in a user-friendly way. This model is useful for political scientists working on coalition government data (and, more generally, for researchers interested in including aggregation functions in regression models). I would like to implement this model in Stan at some point. In the proposed project, I would learn how to implement a variety of models in Stan and how to optimize their performance. I imagine these skills will be very useful when I translate my JAGS model into Stan down the road. More generally, I am interested in getting involved in the Stan community.

I agree with @stevebronder: applying a GARCH structure to other distributions might be tricky. For example, the GARCH model with Student-t innovations is a mixture of a normal and a gamma distribution, such that the marginal likelihood follows a Student-t, and models such as Poisson ARMA models are actually integer GARCH models.
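For reference, one standard way to write the normal-gamma mixture mentioned above is the following. The symbols are notation introduced here for illustration: $\sigma_t$ is the GARCH scale at time $t$, $\lambda_t$ is an auxiliary mixing variable, and the gamma distribution is in its shape-rate parameterization.

```latex
\epsilon_t \mid \lambda_t \sim \mathcal{N}\!\left(0, \frac{\sigma_t^2}{\lambda_t}\right),
\qquad
\lambda_t \sim \mathrm{Gamma}\!\left(\frac{\nu}{2}, \frac{\nu}{2}\right)
\quad\Longrightarrow\quad
\epsilon_t \sim t_\nu(0, \sigma_t)
```

Integrating out $\lambda_t$ gives the Student-t marginal with $\nu$ degrees of freedom and scale $\sigma_t$.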

Some references on GARCH models might be Vrontos, Dellaportas and Politis, or Ardia and Hoogerheide 2008, and my favorite, Fonseca, Cerqueira, Migon and Torres.

@stevebronder

Hi! I won’t be able to PM people (my account is too new). If you PM me, then hopefully we can correspond via email. I don’t have any questions at the moment. I do hope to put in some smaller commits while my application gets processed to get familiar with the code-base.

Hi all,

I also discovered the brms GSoC projects through @andrewgelman’s blog, and I’m excited to hopefully contribute in some way. I am interested in the “GARCH Models in brms” project as well. I have known about brms for quite some time, but have not used it thoroughly. I have recently worked on implementing censored ecological models using this package, and I have been very impressed with its existing versatility (especially in the distributional parameter syntax)! I’d like to contribute to this package to give back to the Stan community while gaining some mentorship on becoming a more active open source developer.

I have one question regarding the final milestone in the project description: is there a prioritization among the additional flavors the team would like to see implemented, or will this be assessed as the project progresses?

Feel free to reach out via PM (I do not believe I can send a PM myself due to account age) to continue the conversation.

Just a quick message to applicants. When you submit the application to NumFOCUS via the GSoC web site, I believe there will be a “proposal tag” field. In this field please indicate “Stan” so we can easily subset to proposals for Stan versus other NumFOCUS projects.

Thanks!

Unfortunately, “Stan” is not an option as a proposal tag:

Thanks for letting me know. Let me see if I can resolve this and get back to you today.

So it seems the upper limit to the number of tags has been hit already, so we cannot add Stan as a tag.

Please do exactly what you did above and append "Stan - " to your title. That should be enough to help us identify the Stan applications.

Thank you!

Just a reminder that the application deadline is tomorrow, April 13. Please have your proposals submitted on the GSoC site, i.e. they should not be in “draft” mode.

Thank you!

Thanks @mans_magnusson and @avehtari for the really helpful pointers. I got a clear picture of the project from working through the suggested material. I submitted my proposal and I am excited.

Also, I checked out the issue page of posteriordb (stan-dev/posteriordb on GitHub). Would you recommend any good first issue to get familiar with the code base? Thank you.