The Dirichlet doesn't have a simple monotonic notion of diffuseness the way something like a normal does. For the normal, the larger the scale parameter sigma, the more diffuse y ~ normal(mu, sigma) is. In the limit as sigma -> infinity, the distribution approaches uniformity over R; in the limit as sigma -> 0, the distribution approaches a delta function.
The Dirichlet is different. It behaves like a multivariate generalization of the beta distribution. Dirichlet(1) is the most diffuse by most measures of diffuseness. For instance, it has the highest entropy, because it spreads the probability mass out evenly over the simplex (the space of vectors with non-negative entries that sum to one). Dirichlet(0.1) concentrates more of the mass in the corners of the simplex, whereas Dirichlet(10) concentrates more of the mass around the uniform simplex.
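To put a number on the entropy claim, here's a quick R check using the standard closed form for the differential entropy of a Dirichlet (the `dirichlet_entropy` helper is just a name I'm making up here):

# Differential entropy of a symmetric Dirichlet(alpha, ..., alpha) in K dimensions:
#   H = log B(a) + (a0 - K) * digamma(a0) - sum_k (a_k - 1) * digamma(a_k),
# where a0 = K * alpha and B() is the multivariate beta function.
dirichlet_entropy <- function(alpha, K) {
  a <- rep(alpha, K)
  a0 <- sum(a)
  log_B <- sum(lgamma(a)) - lgamma(a0)
  log_B + (a0 - K) * digamma(a0) - sum((a - 1) * digamma(a))
}

sapply(c(0.1, 1, 10), dirichlet_entropy, K = 10)
# roughly -66.1, -12.8, -19.8: entropy peaks at alpha = 1
# and falls off in both directions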
So I'd ask again what you're trying to do. A Dirichlet(0.001) is very informative in the sense that it concentrates most of the mass on very sparse realizations (they'll look like one-hot simplexes with a single 1 value and the rest 0 because of floating-point rounding). A Dirichlet(1000) is also very informative in that it concentrates most of the mass very near the uniform simplex. Let's see what that looks like in practice with the following Stan program:
data {
  int<lower = 1> K;        // number of simplex components
  real<lower = 0> alpha;   // shared concentration parameter
}
generated quantities {
  // each iteration draws one simplex from a symmetric Dirichlet
  vector[K] theta = dirichlet_rng(rep_vector(alpha, K));
}
Here are the first 10 draws for alpha = 0.001.
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3e-203 0e+00 2e-298 9e-106 1e+00 0e+00 0e+00 1e-47 0e+00 4e-279
[2,] 1e+00 0e+00 5e-279 2e-14 1e-275 0e+00 3e-285 9e-147 0e+00 0e+00
[3,] 1e-308 0e+00 1e-213 0e+00 0e+00 8e-75 0e+00 1e+00 4e-58 7e-112
[4,] 6e-166 5e-65 3e-68 3e-147 0e+00 1e+00 3e-249 0e+00 0e+00 0e+00
[5,] 2e-91 0e+00 0e+00 0e+00 1e-60 0e+00 4e-312 1e+00 0e+00 0e+00
[6,] 1e-114 0e+00 0e+00 1e-231 1e+00 1e-302 4e-67 0e+00 0e+00 3e-16
[7,] 3e-311 5e-53 3e-249 0e+00 1e+00 5e-309 0e+00 0e+00 0e+00 0e+00
[8,] 9e-267 0e+00 1e+00 0e+00 4e-20 0e+00 5e-143 4e-147 2e-90 0e+00
[9,] 1e+00 0e+00 3e-230 5e-100 0e+00 3e-234 7e-121 6e-76 0e+00 0e+00
[10,] 0e+00 3e-173 2e-96 3e-164 1e+00 0e+00 4e-257 1e-178 0e+00 2e-06
Here are the first 10 draws for alpha = 1.
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.17 0.05 0.07 0.17 0.034 0.133 0.026 0.032 0.271 0.05
[2,] 0.08 0.02 0.12 0.07 0.521 0.008 0.069 0.043 0.008 0.06
[3,] 0.02 0.03 0.22 0.29 0.171 0.096 0.086 0.002 0.051 0.03
[4,] 0.04 0.03 0.21 0.13 0.041 0.009 0.098 0.037 0.224 0.18
[5,] 0.11 0.22 0.02 0.01 0.059 0.183 0.333 0.041 0.010 0.01
[6,] 0.19 0.05 0.22 0.03 0.007 0.093 0.036 0.209 0.025 0.13
[7,] 0.01 0.14 0.18 0.14 0.128 0.051 0.119 0.092 0.077 0.05
[8,] 0.03 0.06 0.04 0.10 0.049 0.060 0.009 0.227 0.203 0.22
[9,] 0.03 0.20 0.01 0.05 0.012 0.237 0.112 0.143 0.038 0.17
[10,] 0.05 0.08 0.06 0.15 0.137 0.106 0.040 0.132 0.070 0.17
And here are the first 10 draws for alpha = 1000.
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[2,] 0.1 0.10 0.1 0.1 0.10 0.10 0.11 0.1 0.10 0.1
[3,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[4,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[5,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[6,] 0.1 0.10 0.1 0.1 0.10 0.10 0.09 0.1 0.11 0.1
[7,] 0.1 0.10 0.1 0.1 0.10 0.09 0.10 0.1 0.10 0.1
[8,] 0.1 0.09 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[9,] 0.1 0.11 0.1 0.1 0.09 0.10 0.10 0.1 0.09 0.1
[10,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
As the parameter alpha increases, the simplexes produced concentrate more and more tightly around the uniform simplex (1/K, ..., 1/K).
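The rate of that concentration has a simple closed form: under a symmetric Dirichlet(alpha) in K dimensions, each component has variance (1/K) * (1 - 1/K) / (K * alpha + 1). Two lines of R (with `dirichlet_sd` as a made-up helper name) show the spread collapsing as alpha grows:

# Component-wise standard deviation of a symmetric Dirichlet(alpha)
# in K dimensions, from Var(theta_k) = (1/K) * (1 - 1/K) / (K * alpha + 1).
dirichlet_sd <- function(alpha, K) sqrt((1 / K) * (1 - 1 / K) / (K * alpha + 1))

sapply(c(0.001, 1, 1000), dirichlet_sd, K = 10)
# roughly 0.30, 0.09, 0.003; the draws pile up around (1/K, ..., 1/K)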
I was running this in RStan to generate the draws above:
> fit <- stan("dir.stan", data = list(K = 10, alpha = 1),
              chains = 1, iter = 10, warmup = 0,
              algorithm = "Fixed_param")
> print(extract(fit)$theta, digits = 1)
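If you just want the draws and don't need Stan for anything else, you can also simulate a Dirichlet directly in R by normalizing independent gamma variates, which is the standard construction. Here's a minimal sketch (the `rdirichlet` helper is my own name for it; it's not in base R, though several packages provide one):

# Draw n simplexes from a symmetric Dirichlet(alpha, ..., alpha) in K dimensions
# by normalizing independent Gamma(alpha, 1) variates row-wise.
rdirichlet <- function(n, alpha, K) {
  g <- matrix(rgamma(n * K, shape = alpha), nrow = n, ncol = K)
  g / rowSums(g)  # each row now sums to 1
}

set.seed(1234)
print(rdirichlet(10, alpha = 1, K = 10), digits = 1)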