Hi everyone,

The general approach to making models sample faster is, where possible, to parametrize them so that the posterior over the parameters is closer to multivariate normal.

Suppose I have the following toy model:

```
data {
  int n;
  vector[n] y;
}
parameters {
  real mu;
  real<lower=0> sig;
}
model {
  y ~ lognormal(mu, sig);
}
```

Presumably the following model is equivalent but much easier to sample from. Is this right?

```
data {
  int n;
  vector[n] y;
}
transformed data {
  vector[n] log_y;
  for (i in 1:n) log_y[i] = log(y[i]);
}
parameters {
  real mu;
  real<lower=0> sig;
}
model {
  log_y ~ normal(mu, sig);
}
```

Is there any reason not to do what's listed above in general? It seems to me that in almost all circumstances the `lognormal()` sampling statement is convenient but likely to lead to more work for NUTS.

Thanks


Everything gets unconstrained in the background anyway. The key, as far as I understand, is to find ways of specifying the model such that the curvature of the unconstrained parameters, and their relation to other parameters, does not vary too much across different points in the space.


The approach you proposed is unlikely to help much. The important part for Stan's sampler is how the parameters behave. What you have shown gives the same likelihood for the same parameter values, up to a constant (the Jacobian term for the log transform of the data), which does not affect sampling. There might be a small performance gain from precomputing `log_y`, but that is likely to be negligible.
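That equality (up to the constant `-log(y)` Jacobian term) is easy to check numerically. A small sketch using SciPy, with arbitrary illustrative values:

```python
import numpy as np
from scipy import stats

mu, sig = 0.3, 1.2
y = np.array([0.5, 1.0, 2.7])

# lognormal log-density of y (SciPy parametrizes as s=sig, scale=exp(mu))
lp_lognormal = stats.lognorm.logpdf(y, s=sig, scale=np.exp(mu))

# normal log-density of log(y), plus the Jacobian term -log(y)
lp_normal_of_log = stats.norm.logpdf(np.log(y), loc=mu, scale=sig) - np.log(y)

# the two agree exactly; the -log(y) term is constant in (mu, sig)
assert np.allclose(lp_lognormal, lp_normal_of_log)
```

Since `-log(y)` does not depend on `mu` or `sig`, dropping it changes the target density only by a constant, which the sampler never sees.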

It would be a different story if `y` were a parameter. Let's look at three cases:

A)

```
parameters {
  real mu;
  real<lower=0> sig;
  real<lower=0> y;
}
model {
  y ~ lognormal(mu, sig);
  // do something more with y
}
```

B)

```
parameters {
  real mu;
  real<lower=0> sig;
  real log_y;
}
transformed parameters {
  real y = exp(log_y);
}
model {
  log_y ~ normal(mu, sig);
  // do something more with y
}
```

C)

```
parameters {
  real mu;
  real<lower=0> sig;
  real log_y_raw;
}
transformed parameters {
  real y = exp(log_y_raw * sig + mu);
}
model {
  log_y_raw ~ normal(0, 1);
  // do something more with y
}
```

Here A) and B) are essentially equivalent, because for parameters with a lower bound, Stan applies exactly this log transform under the hood (including the corresponding Jacobian adjustment).
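To sketch that under-the-hood equivalence numerically (simplified here to a zero lower bound, with arbitrary illustrative values): on the unconstrained scale `z = log(y)`, the target density for A) picks up a `+z` log-Jacobian term and coincides with the target for B):

```python
import numpy as np
from scipy import stats

mu, sig = 0.3, 1.2

def target_A(z):
    # model A on the unconstrained scale: y = exp(z), lognormal log-density
    # of y, plus the log-Jacobian log|dy/dz| = z that Stan adds automatically
    y = np.exp(z)
    return stats.lognorm.logpdf(y, s=sig, scale=np.exp(mu)) + z

def target_B(z):
    # model B samples log_y directly, already on the unconstrained scale
    return stats.norm.logpdf(z, loc=mu, scale=sig)

z = np.linspace(-3.0, 3.0, 7)
assert np.allclose(target_A(z), target_B(z))
```

So the sampler sees the same unconstrained geometry in A) and B); the difference between them is purely cosmetic.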

However, in many cases C) would be preferable to both A) and B), as `log_y_raw` is less tangled with `mu` and `sig`. This is also called the "non-centered parametrization".
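A quick way to convince yourself that C) implies the same distribution for `y` as A): simulate draws from both parametrizations and compare. A NumPy sketch, with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sig = 0.3, 1.2
n = 200_000

# model C: standard-normal raw draws, shifted and scaled, then exponentiated
z = rng.standard_normal(n)
y_noncentered = np.exp(mu + sig * z)

# model A: direct lognormal(mu, sig) draws
y_centered = rng.lognormal(mean=mu, sigma=sig, size=n)

# the two samples should agree in distribution (compare a few quantiles)
q = [0.1, 0.5, 0.9]
assert np.allclose(np.quantile(y_noncentered, q),
                   np.quantile(y_centered, q), rtol=0.05)
```

The gain from C) shows up when `mu` and `sig` are themselves being estimated: `log_y_raw` stays standard normal regardless of where `mu` and `sig` are, so the sampler faces a more constant geometry.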

Does that make sense?