Bayesian Data Analysis of PGA Golf Scores with PyStan


Hi Everyone,

The link below is a survey of some analysis and prediction of golf scores using PyStan. I use three different models to understand and predict scores. It's not groundbreaking research, but it could be good for a Stan conference presentation. The readme contains an embedded slide presentation with an overview, analysis, and results.



Cool. You might be interested in presenting this at StanCon in January. Your models can run faster if you replace Stan loops like

for (n in 1:N) {
  y[n] ~ normal(alpha[t[n]] + tau[p[n]], sigma[p[n]]);
}

with the single line

y ~ normal(alpha[t] + tau[p], sigma[p]);

Because t is an integer array of size N, alpha[t] uses multiple indexing: its n-th element is alpha[t[n]]. In effect, each element of alpha is repeated as many times as its index appears in t, so alpha[t] also has size N.
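To make the multiple-indexing point concrete, here is a minimal NumPy analogue of the two Stan statements above. It is only a sketch: NumPy indexes from 0 while Stan indexes from 1, and the sizes and values are made up for illustration. It checks that summing the normal log density observation by observation gives the same total as evaluating it once on alpha[t] + tau[p].

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    # Elementwise log density of Normal(mu, sigma) evaluated at y
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def loop_loglik(y, t, p, alpha, tau, sigma):
    # Mirrors the Stan loop: one term per observation n
    total = 0.0
    for n in range(len(y)):
        total += normal_logpdf(y[n], alpha[t[n]] + tau[p[n]], sigma[p[n]])
    return total

def vectorized_loglik(y, t, p, alpha, tau, sigma):
    # alpha[t] has length N and alpha[t][n] == alpha[t[n]]; same for tau[p], sigma[p]
    mu = alpha[t] + tau[p]
    return normal_logpdf(y, mu, sigma[p]).sum()

# Hypothetical data: N rounds, each tagged with a tournament and a player (0-based)
rng = np.random.default_rng(0)
N, n_tournaments, n_players = 12, 3, 4
t = rng.integers(0, n_tournaments, size=N)
p = rng.integers(0, n_players, size=N)
alpha = rng.normal(70.0, 2.0, size=n_tournaments)
tau = rng.normal(0.0, 1.0, size=n_players)
sigma = rng.uniform(2.0, 4.0, size=n_players)
y = rng.normal(71.0, 3.0, size=N)

print(alpha[t].shape)  # (12,)
print(np.isclose(loop_loglik(y, t, p, alpha, tau, sigma),
                 vectorized_loglik(y, t, p, alpha, tau, sigma)))  # True
```

Both functions compute the same quantity; the vectorized form just expresses it as one operation over all N rounds.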


I’m totally into people posting sports examples! They usually come with cool plots.

The intervals in this plot: are these the minimum and maximum round scores for each player, or 50% intervals?
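The two readings give quite different widths. As a hypothetical illustration (made-up round scores for a single player; the helper name summarize_rounds is invented here), both summaries could be computed with NumPy as follows. The central 50% interval runs from the 25th to the 75th percentile and is much narrower than the min-to-max range.

```python
import numpy as np

def summarize_rounds(scores):
    # scores: 1-D array of round scores (or posterior draws) for one player
    q25, q75 = np.percentile(scores, [25, 75])  # central 50% interval (linear interpolation)
    return {"min": scores.min(), "max": scores.max(), "q25": q25, "q75": q75}

# Hypothetical round scores for one player
scores = np.array([68, 70, 71, 72, 72, 73, 75, 79])
s = summarize_rounds(scores)
print(s["min"], s["max"])   # 68 79
print(s["q25"], s["q75"])   # 70.75 73.5
```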

And do the orange dots here come from generated quantities, and the green line from the original data?
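If the orange dots are posterior predictive draws, they would typically come from a generated quantities block that simulates replicated scores with normal_rng. As a rough NumPy analogue of that step for the alpha[t] + tau[p], sigma[p] model in this thread (the draw arrays, their sizes, and the names y_rep and posterior_predictive are hypothetical, and indices are 0-based):

```python
import numpy as np

def posterior_predictive(alpha_draws, tau_draws, sigma_draws, t, p, rng):
    # alpha_draws: (S, n_tournaments); tau_draws, sigma_draws: (S, n_players)
    # Returns y_rep with shape (S, N): one simulated score per round per posterior draw
    mu = alpha_draws[:, t] + tau_draws[:, p]       # (S, N) via multiple indexing on columns
    return rng.normal(mu, sigma_draws[:, p])       # broadcasts loc and scale to (S, N)

# Hypothetical posterior draws and index arrays
rng = np.random.default_rng(1)
S, N = 200, 10
t = rng.integers(0, 3, size=N)
p = rng.integers(0, 4, size=N)
alpha_draws = rng.normal(71.0, 0.5, size=(S, 3))
tau_draws = rng.normal(0.0, 0.5, size=(S, 4))
sigma_draws = rng.uniform(2.0, 3.0, size=(S, 4))

y_rep = posterior_predictive(alpha_draws, tau_draws, sigma_draws, t, p, rng)
print(y_rep.shape)  # (200, 10)
```

Plotting rows of y_rep against the observed scores is what would produce dots-versus-line comparisons like the one in the plot.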

I was looking at “Different tournaments/different courses have different coefficients that fit SG to score”. Does that factor into the regression here? I think adding comments to the model would help (I wasn't sure what N_T is. Is it the number of tournaments?)

Is it possible to plot the AR coefficients along with the score predictions for a few players?

Fun stuff!


Makes sense. I wasn't sure I could do that with the mapping array. Thank you!


Thanks for the feedback. I'll definitely do a second pass on the legends and add more explanation. Great idea for the AR coefficients.


That doesn't seem to work, FYI. The compiler reports:

No matches for:

real[] + real[]


Oh, if you declare alpha and tau as vectors, then you can add them.

Either that, or you can use to_vector(alpha[t]) + to_vector(tau[p]). Arrays and vector/matrix types are kept distinct in Stan (arrays are std::vector, while vectors and matrices are Eigen types).


Thanks for pointing that out. Yes, indeed: it took about 40% of the time of the non-vectorized version.