I am tweaking a model and realized that I am confused about something fundamental. Reading the original Hoffman & Gelman (2014) NUTS paper, I was under the impression that what Stan calls `accept_stat__` is \alpha / n_\alpha in the paper, which is adapted toward \delta during the dual-averaging warmup phase, and that \delta in turn corresponds to Stan's `delta` setting.

Yet for pretty much everything I run, the average `accept_stat__` is usually much higher than `delta = 0.8` (the default). What am I missing?

E.g. for the simplest of models (using 2.20.0):

```
Inference for Stan model: bernoulli_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took (0.0085) seconds, 0.0085 seconds total
Sampling took (0.016) seconds, 0.016 seconds total

                 Mean     MCSE   StdDev     5%    50%    95%  N_Eff  N_Eff/s    R_hat
lp__             -7.2  2.8e-02  6.3e-01   -8.4   -7.0   -6.7    509    32235  1.0e+00
accept_stat__    0.94  2.7e-03  9.8e-02   0.73   0.98    1.0   1268    80282  1.0e+00
stepsize__       0.88  3.8e-15  3.8e-15   0.88   0.88   0.88    1.0       63  1.0e+00
treedepth__       1.4  1.6e-02  5.1e-01    1.0    1.0    2.0    975    61708  1.0e+00
n_leapfrog__      2.9  5.8e-02  1.6e+00    1.0    3.0    7.0    785    49698  1.0e+00
divergent__      0.00  0.0e+00  0.0e+00   0.00   0.00   0.00    500    31654     -nan
energy__          7.7  4.2e-02  9.2e-01    6.8    7.4    9.6    490    31008  1.0e+00
theta            0.24  4.9e-03  1.1e-01  0.087   0.23   0.44    502    31763  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
```
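For concreteness, my understanding of the adaptation I'm describing is the dual-averaging step-size update from the paper (Algorithm 5). This is a toy sketch, not Stan's actual implementation: the `accept_prob` function is a made-up stand-in for the averaged \alpha / n_\alpha statistic that a real sampler would produce, and I'm using the paper's default tuning constants (\gamma = 0.05, t_0 = 10, \kappa = 0.75, \mu = \log(10 \epsilon_0)):

```python
import math

def dual_averaging(accept_prob, delta=0.8, eps0=1.0, iters=1000,
                   gamma=0.05, t0=10.0, kappa=0.75):
    """Dual-averaging step-size adaptation (Hoffman & Gelman 2014, Alg. 5).

    accept_prob: hypothetical map from step size to an acceptance
    statistic in [0, 1] (stands in for alpha / n_alpha).
    Returns the iterate-averaged step size after `iters` warmup steps.
    """
    mu = math.log(10.0 * eps0)   # shrinkage target for log step size
    log_eps = math.log(eps0)     # current log step size
    log_eps_bar = 0.0            # log of the averaged step size
    h_bar = 0.0                  # running average of (delta - accept)
    for t in range(1, iters + 1):
        a = accept_prob(math.exp(log_eps))
        eta = 1.0 / (t + t0)
        h_bar = (1.0 - eta) * h_bar + eta * (delta - a)
        log_eps = mu - math.sqrt(t) / gamma * h_bar
        w = t ** (-kappa)
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return math.exp(log_eps_bar)

# Toy acceptance curve: acceptance decays as the step size grows,
# so adaptation should settle where accept_prob(eps) is near delta.
eps = dual_averaging(lambda e: math.exp(-e), delta=0.8)
```

With a monotone toy curve like this the averaged step size settles where the acceptance statistic is close to `delta`, which is exactly why I expected the post-warmup average `accept_stat__` to sit near 0.8 rather than well above it.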