Hi,
My dataset consists of darts thrown at different parts of a dartboard.
For some parts there is only a small amount of data. An example is the “double 1” (D1) as its likely that a skilled darts player is able to check out “earlier” on a higher score.
However, there is a enormous amount of data available for D20 and D18, the segments adjacent to D1.
I assume that if a darts players aims at D1 the following outcomes are possible: {D1, OS1, D20, OS20, D18, OS18, miss} where OS means outer single.
My idea was to borrow information from data-rich regions and create a hierarchical model.
Initially, I planned to create a basic Dirichlet-multinomial with N
observations over K
categories, but I found the following comment:
“It’d be much more efficient to code up the compound Dirichlet-multinomial by marginalizing out > the theta. That’s what you get in the beta-binomial, but we don’t have Dirichlet-multinomial built in.”
Inspired by some topics on this forum I wrote the following code:
functions {
real dirichlet_multinomial_lpmf(int[] y, vector alpha) {}
real sum_alpha = sum(alpha);
return lgamma(sum_alpha) - lgamma(sum(y) + sum_alpha)
// + lgamma(sum(y)+1) - sum(lgamma(to_vector(y)+1) // constant, may omit
+ sum(lgamma(to_vector(y) + alpha)) - sum(lgamma(alpha));
}
}
data {
int<lower=1> N; // number of target regions =3 {D1, D20 and D18)
int<lower=1> K; // possible outcomes per target region =7
array[N, K] int<lower=1> y;
}
parameters {
real<lower=0> mu;
real<lower=0> sigma;
vector<lower=0>[K] alpha;
}
model {
// prior
mu ~ lognormal(0, 5);
sigma ~ normal(0, 2.5);
alpha ~ normal(mu, sigma);
// likelihood
for (n in 1:N) {
for (k in 1:K) {
y[n, k] ~ dirichlet_multinomial(alpha[k]);
}
}
}
I read that when there is insufficient data in a hierarchical model, the variables being inferred end up having correlation effects, thereby making it difficult to sample. One solution is to obtain more data, but when this isn’t possible we should resort to reparameterization by creating a non-centered model from the centered model.
Does it make sense to reparameterize alpha
by adding z \sim std\_normal()
Finally, how would dirichlet_multinomial_rng
? look like?
Thank you