Bayesian Data Analysis of PGA Golf Scores with PyStan

Hi Everyone,

The link below is a survey of some analysis and prediction of golf scores using PyStan. I use three different models to understand and predict scores. It’s not groundbreaking research, but it could be good for a Stan conference presentation. The readme contains an embedded slide presentation with an overview, analysis, and results.



Cool. You might be interested in presenting this at StanCon in January. You can go faster if you replace Stan constructs like

for (n in 1:N) {
  y[n] ~ normal(alpha[t[n]] + tau[p[n]], sigma[p[n]]);
}

with the single line

y ~ normal(alpha[t] + tau[p], sigma[p]);

Since t is an integer array of size N, alpha[t] copies the elements of alpha the appropriate number of times, so that the total size of alpha[t] is also N.
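To make the indexing concrete, here is a minimal sketch of the declarations this relies on. The names N_T, N_P, and the meanings of the variables are assumptions based on the thread, not the original model:

```stan
data {
  int<lower=1> N;                     // number of observed rounds
  int<lower=1> N_T;                   // number of tournaments (assumed)
  int<lower=1> N_P;                   // number of players (assumed)
  int<lower=1, upper=N_T> t[N];       // tournament index for each round
  int<lower=1, upper=N_P> p[N];       // player index for each round
  vector[N] y;                        // round scores
}
parameters {
  vector[N_T] alpha;                  // per-tournament effect
  vector[N_P] tau;                    // per-player effect
  vector<lower=0>[N_P] sigma;         // per-player scale
}
model {
  // alpha[t], tau[p], and sigma[p] are each length-N via multi-indexing,
  // so this one vectorized statement replaces the loop above
  y ~ normal(alpha[t] + tau[p], sigma[p]);
}
```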

I’m totally into people posting sports examples! They usually come with cool plots.

The intervals in this plot: are these the mins and maxes of round scores for each player, or 50% intervals?

And do the orange dots here come from generated quantities, with the green line showing the original data?

I was looking at “Different tournaments/different courses have different coefficients that fit SG to score”. Does that factor into the regression here? I think adding comments to the model there would be good (I wasn’t exactly sure what N_T is… is it the number of tournaments?).

Is it possible to plot the AR coefficients along with the score predictions for a few players?

Fun stuff!

Makes sense… I wasn’t sure I could do that with the mapping array. Thank you!

Thanks for the feedback; I’ll definitely do a second pass at legends and more explanation. Great idea on the AR coefficients.

Doesn’t seem to work, FYI.

No matches for:

real[] + real[]

Oh, if you define alpha and tau to be vectors, then you can add them.

Either that, or you can use to_vector(alpha) + to_vector(tau). Arrays and vector/matrix types are kept distinct in Stan (arrays are std::vectors, and vector/matrix types are Eigen types).
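A sketch of both options, assuming alpha and tau were originally declared as real arrays (declarations here are illustrative, not from the original model):

```stan
// Option 1: declare the parameters as vectors, so elementwise + is defined
// and alpha[t] + tau[p] works directly in the sampling statement:
parameters {
  vector[N_T] alpha;
  vector[N_P] tau;
}

// Option 2: keep the real-array declarations and convert at the point of use:
//   y ~ normal(to_vector(alpha[t]) + to_vector(tau[p]), sigma[p]);
```

Option 1 is usually cleaner, since it avoids a copy per evaluation of the log density.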

Thanks for pointing that out. Yes, indeed… it took about 40% of the time of the non-vectorized version.
