I’d like to submit a case study, Golf Putting. I’ve placed it all at http://www.stat.columbia.edu/~gelman/temp/
Author: Andrew Gelman
Keywords: sports, workflow
R Package Dependencies: rstan
License: BSD (3 clause), CC-BY
I’d like to submit a case study, Golf Putting. I’ve placed it all at http://www.stat.columbia.edu/~gelman/temp/
Author: Andrew Gelman
Keywords: sports, workflow
R Package Dependencies: rstan
License: BSD (3 clause), CC-BY
I’ve updated the case study and put it at the same web-location as above.
I’m not quite sure what to do next. On the Stan Case studies page, it says: “To contribute a case study, please contact us through the Stan Forums…” I didn’t see any place on Stan Forums specifically for contributing case studies, hence these posts on the General page.
I guess it would be good for us to have more specific directions of where to submit case studies, at least if we want to encourage more submissions.
But then we’ll also need someone to manage the case studies page. It’s kind of like being a journal editor. We’ll need a volunteer or group of volunteers to do this!
I do not have the statistical/Stan chops to function as the case studies editor. Any volunteers?
Breck
I’ll take a look. I’m also looking at Imad’s.
Hi, just to let you know, I updated the case study once again; current version is at http://www.stat.columbia.edu/~gelman/temp/
I may be commenting on an older one. I just saw that it was updated when I went to report.
Very cool case study. Definitely think we should add it.
The classical standard deviations here are standard deviations for the maximum likelihood estimator, right?
In the model, can you say y[j] is the number of successes observed in n[j] trials at distance x[j]?
You could mention that Stan vectorizes. I don’t know how much help you want to give the reader.
I don’t like that printing with comment = ‘##’, but that’s presumably the default.
You need something like options(“width” = 120) in order to avoid that wrapping in the figure. When it renders you’ll be able to scroll over. You could also drop the lp__ by including an argument: pars = c(“a”, “b”).
It’s hard to see the difference between light lines and dark lines. I’d plot the draws, then make sure to plot the other one in a different color over it if you can.
After the description, I espected a comparison involving the angle, not just the bare expression 2 * Phi(...) - 1
.
You’re assuming a lot of trig here that might be helpful to work out. But one thing that you should do is discuss that you’re measuring angle from straight in radians, right, so it’ll be 0 +/- error? And it’s the absolute value of that which needs to be below the threshold.
Oops, looks like you’re talking about degrees in the next plot.
The formula for p[j] doesn’t make sense as j never shows up in it. I think it needs x[j] instead of just x.
Is there a way to put the degrees above the line without the sigma = ? They look odd cut in half by the lines.
I don’t like saying Stan sweeps over values consistent with the posterior—can you just say Stan samples from the posterior? Or that it sweeps over values consistent with the prior (improper) and data, i.e., values drawn from the posterior? Or is there a subtle point you’re trying to make here?
Stray a in “σa” after “in preparation to estimating”
You can combine declaration and definition and should wherever possible:
vector[J] p = 2 * Phi(asin((R - r) ./ x) / sigma) - 1;
...
real sigma_degrees = sigma * 180 / pi();
Plus, we recommend spaces between all operators and unfolding as much as possible to remove parens—it really helps with readability in text.
R never ceases to amaze me with its c and list shenanigans.
For efficiency, asin((R - r) ./ x)
should be defined as trasnformed data. You might just make a note of it if you don’t want to clutter the program with a new block and constant.
You don’t need parens in (a / b) / c
—it’s equal to a / b / c
because /
(like *
and +
and -
) is left associative.
Can you come up with a better name than golf.stan and golf1.stan
? How about golf_logit
, and golf_angle
and golf_angle_distance
?
Won’t logistic regression be better at those long puts as it falls off faster?
Starting with this section, it’s getting confusing how you’re mixing computer notation (* for multiplication) and LaTeX (italic variables), as in the definition of u in the second paragraph of “A new model”). Also, some space before that sigma_distance would help—the rendering of LaTeX by MathJax is rough. Fractions might make it more readable aas they’d let you drop a layer of parens.
Same comment about declare and define in the program. And adding spaces between operators. In Stan, we can line up on next line as in most math typesetting.
There’s a dangling “Here’s the new Stan program:” after you included it a paragraph earlier.
Did you coin the folk theorem usage? If so, it’s odd for you to refer to it as “so-called”!
Same comment about width as earlier.
There’s a dangling $ in third bullet point of “new parameter estimates are:”
There’s a dangline paragraph start “Compared to the”
Is that the same new data as before that didn’t match the old data? Did you fit the new model with the old data? If so, it looks like it fits the new data better even being fit to the old data. Not impossible, but if that’s what’s going on, you should say so in order for everyone else not to wonder what’s up.
One of the other problems in your model is no priors on the sigma. Given that they get close to 0, that can be problematic.
Rather than using to_vector
, you could define the data variables as vectors and then just use ./
.
If you are going to apply to_vector
to constants, you should do it once in the transformed data block so as not to have to construct new vectors and do the division every single evaluation.
Rather than “.*”, you can just use backtick and it’ll render as computer code. That’s how you should be handling inline code anyway in markdown.
This general point about having too much data at one point is important in a lot of models, I think. It seems like we should just be able to throw more data at a problem, but if it doesn’t have the right leverage or whatever you call it, it doesn’t help and can cause you to mess up other estimates. I guess it depends on what the goal is here—get the most new predictions right or try to get a sense of accuracy versus distance. Anyway, good grist for the philosophy mill here when it comes to the “we only care about predictions” philosophy.
You need open source licenses if we’re going to publish them and a link to a repository where people can find the source. We can put the latter in example-models if you don’t already have it in your own organization. I recommend these:
Code © 2019, Trustees of Columbia University in New York, licensed under BSD-3.
Text © 2019, Andrew Gelman, licensed under CC-BY-NC 4.0.
We also generally ask people to run from a clean start and report sessionInfo() so that people will be able to see which versions of R packages (like rstan) you used to generate the output. So that’s just a single block with sessionInfo()
called and reported.
No citations? Was that note about distance on a personal correspondence?
P.S. Here’s how I’d write that final model for efficiency, though I probably wouldn’t combine all those priors into a one-liner—I just thought it’d amuse you until we get the
parameters {
real<lower = 0> sigma_, sigma_b, sigma_c;
...
model {
sigma_a, sigma_b, sigma_c ~ foo(theta)` notation into production.
thing into production. Anyway, here it is:
data {
int<lower = 0> J;
vector<lower = 0>[J] n;
vector<lower = 0>[J] x;
vector<lower = 0>[J] y;
real<lower = 0> r;
real<lower = 0> R;
real<lower = 0> overshot;
real<lower = 0> distance_tolerance;
}
transformed data {
vector[J] x_p_overshot = x + overshot;
vector[J] asins = asin((R - r) ./ x);
}
parameters {
real<lower = 0> sigma_theta;
real<lower = 0> sigma_distance;
real<lower = 0> sigma_y;
}
model {
vector[J] p_angle = 2 * Phi(asins / sigma_theta) - 1;
vector[J] denom = x_p_overshot * sigma_distance;
vector[J] p_distance
= Phi((distance_tolerance - overshot) ./ denom)
- Phi(-overshot ./ denom);
vector[J] p = p_angle .* p_distance;
y ./ n ~ normal(p, sqrt(p .* (1 - p) ./ n + sigma_y^2));
{ sigma_theta, sigma_distance, sigma_y } ~ normal(0, 1);
}
generated quantities {
real sigma_degrees = sigma_theta * 180 / pi();
}