A bit of a question of aesthetics/style. Stan seems to use the usual definition of the Student-t distribution, so that in code like:

```
parameters {
...
real<lower=2> nu;
real<lower=0> scale;
...
}
model {
...
Y ~ student_t(nu, loc, scale);
...
}
```

one is actually describing a distribution not with standard deviation equal to `scale` but instead with standard deviation equal to `scale * sqrt(nu / (nu - 2))`, which is strictly greater than `scale`. I understand that the Student-t so defined is meant to be the fat-tailed cousin of a Normal distribution with standard deviation equal to `scale`. But I never really understood why the convention was to formulate things so that as `nu` shrinks (with `scale` held constant) the result is to *both* have the tails get fatter (relative to the core of the distribution) *and* have the actual standard deviation of the distribution increase.
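
To put rough numbers on that inflation, here is the standard deviation formula worked out for a few values, writing $\sigma$ for `scale` and $\nu$ for `nu` (the symbols are just shorthand introduced here, and the formula only applies for $\nu > 2$):

$$
\operatorname{sd}(Y) = \sigma \sqrt{\frac{\nu}{\nu - 2}}
\quad\Longrightarrow\quad
\begin{aligned}
\nu = 3 &: \ \sqrt{3}\,\sigma \approx 1.732\,\sigma \\
\nu = 4 &: \ \sqrt{2}\,\sigma \approx 1.414\,\sigma \\
\nu = 10 &: \ \sqrt{1.25}\,\sigma \approx 1.118\,\sigma \\
\nu = 30 &: \ \sqrt{30/28}\,\sigma \approx 1.035\,\sigma
\end{aligned}
$$

So for small `nu` the gap between `scale` and the actual standard deviation is large, and it only closes slowly as `nu` grows.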

To that end, I'm always tempted to write this instead as:

```
parameters {
...
real<lower=2> nu;
real<lower=0> scale;
...
}
transformed parameters {
// rescale so that the standard deviation of the Student-t equals `scale`
real<lower=0> norm_equiv_scale;
norm_equiv_scale = scale / sqrt(nu / (nu - 2));
}
model {
...
Y ~ student_t(nu, loc, norm_equiv_scale);
...
}
```

In this way, once my parameters are done fitting, I wind up with what feels like a more easily-interpreted meaning for `scale` (namely that it should match up with the standard deviation of the data) and a more easily-interpreted meaning for `nu` (namely that it is just a shape-of-distribution parameter that doesn't have much to do with observed standard deviation).
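
For anyone who wants to try this end to end, a minimal self-contained sketch of the second formulation might look like the following. The `data` block, the declaration of `loc` as a parameter, and all three priors are assumptions made purely for illustration; they are not part of the snippets above and should be replaced with whatever suits the actual model.

```stan
data {
  int<lower=1> N;          // number of observations (assumed layout)
  vector[N] Y;             // observed values
}
parameters {
  real loc;                // location (assumed to be a free parameter here)
  real<lower=2> nu;        // degrees of freedom; nu > 2 keeps the variance finite
  real<lower=0> scale;     // intended to be the standard deviation of Y
}
transformed parameters {
  // Student-t scale chosen so that the implied standard deviation is `scale`
  real<lower=0> norm_equiv_scale = scale / sqrt(nu / (nu - 2));
}
model {
  // placeholder priors, assumptions for illustration only
  loc ~ normal(0, 10);
  scale ~ normal(0, 5);    // half-normal because of the lower bound
  nu ~ gamma(2, 0.1);

  Y ~ student_t(nu, loc, norm_equiv_scale);
}
```

Some prior on `nu` is needed in practice, since the likelihood flattens out as `nu` grows and the Student-t approaches a Normal.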

Put differently, I can imagine that scatter plots of `scale` vs `nu` in the top case could show positive correlation between the fitted parameters (because increasing `nu` shrinks the modeled standard deviation for a fixed `scale`, so `scale` has to grow to keep matching the data). But I would think that same scatter plot in the second formulation should show less correlation between the parameters.

What do you think?