Determining accuracy of darts player using score of each throw and multivaratie distribution

Hi all,

This topic is a follow-up from this topic

My goal is to determine the throwing accuracy of a darts player. However, now we don’t know the xy coordinates of each thrown, we only know the observed score.

Let’s assume again that a darter aims for target region, tr, triple 20 / T20. We can divide the darts board into xy coordinates. The xy-coordinates of target µ T20 are

µ = [0, 104]

If we assume that the player is a professional, we can argue that the possible realized scores z are close to T20. Because the 5 and 1 segments of the dartboard are adjacent to the 20 segment, we say

z ∈ {T20, S20, T5, S5, T1, S1}

A visual representation of the data looks as follows:


We transpose the data, so that each row is in the form (tr, z, n)
example.csv (103 Bytes)

h represents the numerical score of the dart outcome and n represents the number of darts.

Taking the center of the dart board to be the origin (0,0), we can map each xy-coordinate to the dart score z, e.g. D16, SB, T20, S7 etc.


Next, we write the following model:

[x, y] ~ N(µ, Σ)

µ is known for each data-point but that only the result z was observed rather than the realized location (x, y) of each dart.
Σ is a covariance matrix andimage

According to the authors of this paper I need to solve the following loglikelihood function (Appendix, formula 16)

where R(z,i) is the region defined by the dart z,i, i.e


and p is the number of possible scores; 6.

Can someone help me write this likelihood function into my Stan code/attempt?

data {
	int<lower=1> n_of_scores;                                // number of possible scores
	matrix[n_of_scores, 2] scores; 							 // actual scores and frequency 
	row_vector[2] mu;							         	 // [0,104]

parameters {
	cov_matrix[2] sigma;

model {
	for (p in in 1:n_of_scores)
		scores[p][1] ~ multi_normal(mu, sigma);

In addition, can someone explain to me how we can go from a score (1-dimensional data) to calculating the covariance matrix which is two dimensional? I am struggling to see how this is achieved…


1 Like

Just to be clear - can you actually safely assume that all the throws aimed at T20? I’ve played darts only a few times, so my undesrtanding of the matter is shallow, but wouldn’t players at some point aim at different targets to hit the target score precisely? And if yes, is the data cleaned from those attempts? Because if not, than this transposes to a fundamentally more complex problem :-)

Also the paper you refer to seems to treat \mu as unknown while you treat it as known (this is a good starting point for model development). But treating \mu as unknown IMHO wouldn’t make the model fundamentally more complex.

The difficult part is how to compute P((x,y) \in R(z_i) | (x,y) \sim N(\mathbf{\mu_i}, \mathbf{\Sigma}), so let us for a moment assume we have this stored as p_scores, then the resulting likelihood is a multinomial. I assume that counts of the scores are in scores[,1]:

transformed parameters {
   simplex[n_of_scores] p_scores; //simplex, because the probabilities need to sum to 1
   //do some magic to compute p_scores

model {
   scores[,1] ~ multinomial(p_scores);

Now for many questions, we could just treat p_scores as a parameter and estimate it directly - if we want to measure accuracy as the probability that a player hits a specific area when aiming for T20, this is the answer.

The authors of the paper you linked to go a step a further and assume additional structure guiding the p_scores based on the geometry of the board. This is super interesting and potentially useful to let you estimate accuracy even for targets where you have less data and/or share information between targets. So just want to lay this out, because, as we will se, computing p_scores given mu and sigma will be challenging.

Indeed, this is the tricky part. Note however that for inference we do not need to go “from score to covariance matrix”, but instead the other way around “from covarianace matrix (and mean) to score”. The model the paper you linked to assum is that a player aims for a specific point and the point they actually hit is drawn from a 2D normal distribution.

So we are interested in the probability that a draw from a 2D normal falls into a region of interest. In mathematical words, for computing p_scores given mu we need to compute an integral of the bivariate normal distribution, i.e.:

P((x,y) \in R(z_i) | (x,y) \sim N(\mathbf{\mu_i}, \mathbf{\Sigma}) = \int_{(x,y) \in R(z_i)}{} f( (x,y), \bf{\mu_i}, \bf{\Sigma})

Where f( (x,y), \bf{\mu_i}, \bf{\Sigma}) is the bivariate normal density function.

Those integrals generally don’t have closed forms and we are dealing with some somewhat annoying geometry of the R_i regions. We most likely cannot reuse the importance sampling method from the paper because a) it is stochastic (which can break sampling) and b) because Stan needs not only the value of the integral, but also the derivative of the value wrt. mu and sigma. In theory maybe writing a C++ function that would compute a lot of samples (to reduce the noise) and autodiffing through all the samples could maybe somewhat work, but I wouldn’t bet on it.

A method for integrating a bivariate normal over general regions that should be implementable - and likely reasonably efficient - in Stan are at but I haven’t tried it myself.

Anyway, this integration will be challenging, so best of luck going forward!

1 Like

hi Martin,

Thank you for replying. You have made some very informative points and the way to go is a lot clearer now. You said that the target regions are somewhat annoying from a geometry perspective. However, I found out that the double and triple regions are so-called “annulus sectors” and the double bulls-eye is ofcourse a circle. We can calculate the area of these regions using the characteristics of the dartboard right?


The formula to calculate the area of an annulus sector is


R is the radius of the outer circle / triple wire.
r is the radius of the inner circle / triple wire.

This gives for the triples


which is equal to 258 mm^2

and for the doubles 417 mm^2 . The area of a singles region is equal to the area of a circle sector(with angle 18 and radius 162) - the area of the double region. Please let me know what you think!

That’s correct, but we don’t need just their area. What we actually need is to integrate the multivariate normal density over those areas. Possibly, the only “non-annoying” geometry in this case would be a rectangle, where you could (most likely, didn’t test this) compute the integral relatively directly only with Stan’s built-in functions. In the paper I linked to, they evaluate this integral as sum of volumes over rays, so you would need to compute intersections of arbitrary lines with the annulus sectors as a part of implementing the procedure. There are also other approaches in the literature (I don’t understand any of them really well), but in all cases, you would need to analytically operate with the geometry. This is definitely possible, but for me, this type of geometric calculations with anything more coplex than lines and maybe full circles is tedious and I make a lot of errors in both math and implementation. :-)

1 Like

If you want you could rasterize the board (i.e. break the board up into rectangular pixels) at some acceptable resolution, out to some acceptably large distance from the board. Then integration over each pixel would be relatively straightforward.

1 Like

Yes, that’s a good idea to simplify the problem a lot. Unfortunately integrating over each pixel exactly is still computationally expensive (you would need a single call to integrate_1d for each pixel, which would likely be prohibitively slow), but once you have those pixels, you could plausibly use the pixels as integration grid and assume the density is constant on each pixel (or use a bit more precise but still simple quadrature rule over a uniform grid, like the 2D trapezoid rule - see e.g. this intro: to get a reasonable approximation to the actual integral.

1 Like