The Dirichlet doesn't have a simple monotonic notion of diffuseness the way something like a normal does. For the normal, the larger the scale parameter sigma, the more diffuse `y ~ normal(mu, sigma)` is. In the limit as `sigma -> infinity`, the distribution approaches uniformity over `R`; in the limit as `sigma -> 0`, the distribution approaches a delta function.

The Dirichlet is different. It behaves like a generalized beta distribution. Dirichlet(1) is the most diffuse by most measures of diffuseness. For instance, it has the highest entropy, because it spreads the probability mass most evenly over the space of simplexes (vectors with non-negative entries that sum to one). Dirichlet(0.1) concentrates more of the mass in the corners, whereas Dirichlet(10) concentrates more of the mass around the uniform simplex.
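One way to see this concentration behavior directly is to sample. The sketch below (plain Python rather than Stan, using the standard construction of a symmetric Dirichlet as normalized independent Gamma draws) tracks the average size of the largest simplex component: values near 1 mean spiky, corner-hugging draws, while values near 1/K mean nearly uniform draws.

```python
import random

def dirichlet_draw(alpha, K, rng):
    """Draw a K-simplex from a symmetric Dirichlet(alpha) by
    normalizing K independent Gamma(alpha, 1) variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    s = sum(g)
    return [x / s for x in g]

rng = random.Random(42)
K = 10
for alpha in (0.1, 1.0, 10.0):
    draws = [dirichlet_draw(alpha, K, rng) for _ in range(2000)]
    # Average largest component: near 1 for corner-concentrated draws,
    # near 1/K = 0.1 for nearly uniform draws.
    avg_max = sum(max(d) for d in draws) / len(draws)
    print(f"alpha = {alpha}: average max component = {avg_max:.2f}")
```

The average max component shrinks toward 1/K as alpha grows, matching the corners-versus-center description above.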

So I'd ask again what you're trying to do. A Dirichlet(0.001) is very informative in the sense that it concentrates most mass on very sparse realizations (they'll look like one-hot simplexes with a single value of 1 and the rest 0 because of floating-point rounding). A Dirichlet(1000) is also very informative in that it concentrates most mass very near uniform simplexes. Let's see what that looks like in practice with this Stan program:

```
data {
  int<lower = 1> K;
  real<lower = 0> alpha;
}
generated quantities {
  vector[K] theta = dirichlet_rng(rep_vector(alpha, K));
}
```

Here are the first 10 draws for `alpha = 0.001`.

```
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3e-203 0e+00 2e-298 9e-106 1e+00 0e+00 0e+00 1e-47 0e+00 4e-279
[2,] 1e+00 0e+00 5e-279 2e-14 1e-275 0e+00 3e-285 9e-147 0e+00 0e+00
[3,] 1e-308 0e+00 1e-213 0e+00 0e+00 8e-75 0e+00 1e+00 4e-58 7e-112
[4,] 6e-166 5e-65 3e-68 3e-147 0e+00 1e+00 3e-249 0e+00 0e+00 0e+00
[5,] 2e-91 0e+00 0e+00 0e+00 1e-60 0e+00 4e-312 1e+00 0e+00 0e+00
[6,] 1e-114 0e+00 0e+00 1e-231 1e+00 1e-302 4e-67 0e+00 0e+00 3e-16
[7,] 3e-311 5e-53 3e-249 0e+00 1e+00 5e-309 0e+00 0e+00 0e+00 0e+00
[8,] 9e-267 0e+00 1e+00 0e+00 4e-20 0e+00 5e-143 4e-147 2e-90 0e+00
[9,] 1e+00 0e+00 3e-230 5e-100 0e+00 3e-234 7e-121 6e-76 0e+00 0e+00
[10,] 0e+00 3e-173 2e-96 3e-164 1e+00 0e+00 4e-257 1e-178 0e+00 2e-06
```

Here are the first 10 draws for `alpha = 1`.

```
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.17 0.05 0.07 0.17 0.034 0.133 0.026 0.032 0.271 0.05
[2,] 0.08 0.02 0.12 0.07 0.521 0.008 0.069 0.043 0.008 0.06
[3,] 0.02 0.03 0.22 0.29 0.171 0.096 0.086 0.002 0.051 0.03
[4,] 0.04 0.03 0.21 0.13 0.041 0.009 0.098 0.037 0.224 0.18
[5,] 0.11 0.22 0.02 0.01 0.059 0.183 0.333 0.041 0.010 0.01
[6,] 0.19 0.05 0.22 0.03 0.007 0.093 0.036 0.209 0.025 0.13
[7,] 0.01 0.14 0.18 0.14 0.128 0.051 0.119 0.092 0.077 0.05
[8,] 0.03 0.06 0.04 0.10 0.049 0.060 0.009 0.227 0.203 0.22
[9,] 0.03 0.20 0.01 0.05 0.012 0.237 0.112 0.143 0.038 0.17
[10,] 0.05 0.08 0.06 0.15 0.137 0.106 0.040 0.132 0.070 0.17
```

And here are the first 10 for `alpha = 1000`.

```
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[2,] 0.1 0.10 0.1 0.1 0.10 0.10 0.11 0.1 0.10 0.1
[3,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[4,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[5,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[6,] 0.1 0.10 0.1 0.1 0.10 0.10 0.09 0.1 0.11 0.1
[7,] 0.1 0.10 0.1 0.1 0.10 0.09 0.10 0.1 0.10 0.1
[8,] 0.1 0.09 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
[9,] 0.1 0.11 0.1 0.1 0.09 0.10 0.10 0.1 0.09 0.1
[10,] 0.1 0.10 0.1 0.1 0.10 0.10 0.10 0.1 0.10 0.1
```

As the parameter `alpha` increases, the simplexes produced are increasingly uniform.
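The rate of that tightening can be quantified: for a symmetric Dirichlet(alpha) over K categories, each component has mean 1/K and variance (K - 1) / (K^2 (K alpha + 1)), which vanishes as alpha grows. A quick sanity check of this formula (a Python sketch using normalized Gamma draws, not the Stan program above):

```python
import random

def dirichlet_draw(alpha, K, rng):
    # Symmetric Dirichlet(alpha) via normalized Gamma(alpha, 1) draws.
    g = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    s = sum(g)
    return [x / s for x in g]

K = 10
rng = random.Random(0)
for alpha in (1.0, 10.0, 1000.0):
    # Empirical variance of the first component over many draws.
    first = [dirichlet_draw(alpha, K, rng)[0] for _ in range(5000)]
    mean = sum(first) / len(first)
    var = sum((x - mean) ** 2 for x in first) / len(first)
    # Marginal variance of one component of a symmetric Dirichlet(alpha).
    theory = (K - 1) / (K ** 2 * (K * alpha + 1))
    print(f"alpha = {alpha}: empirical var = {var:.2e}, theory = {theory:.2e}")
```

At `alpha = 1000` the marginal variance is about 9e-6, which is why every entry in the printout above rounds to 0.1.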

I was running this in RStan with:

```
> fit <- stan("dir.stan", data = list(K = 10, alpha = 1),
              chains = 1, iter = 10, warmup = 0,
              algorithm = "Fixed_param")
> print(extract(fit)$theta, digits = 1)
```