# How to re-parameterize model with large dynamic range

New to Stan/Bayesian modelling, apologies if this is too naive or not appropriate for these forums.

I have a somewhat novel model (I think). For each object I obtain two measurements, `nmeas` and `zeta`. Across the population, each object has an inherent value `ninh` drawn from a (tight) Gaussian whose mean, `n_mu`, I wish to estimate. I know that `ninh` lies strictly between 2 and 7. The model is that

`nmeas` ~ normal(`ninh`, `zeta`)

I generate some synthetic data in python via

``````
import numpy as np

rng = np.random.default_rng(seed=0)     # for reproducibility
ninh = rng.normal(3.5, 0.1, size=500)   # assume n_mu = 3.5
zeta = rng.lognormal(1, 0.5, size=500)
nmeas = rng.normal(ninh, zeta, size=500)
data_dic = {
    "NN": 500,
    "nmeas": nmeas,
    "zeta": zeta,
}
``````

I can then recover `ninh` (`n_mu`) using the very simple Stan model

``````
data {
  int NN;
  vector[NN] nmeas;
  vector[NN] zeta;
}
parameters {
  real<lower=2, upper=7> n_mu;
  vector<lower=2, upper=7>[NN] ninh;  // one inherent value per object
}
model {
  ninh ~ normal(n_mu, 0.1);
  nmeas ~ normal(ninh, zeta);
}
``````

While this is all well and good, in reality the expected distribution of `zeta` across the population is a lognormal covering 12 orders of magnitude. That is, I need to make the above work with

``````
zeta = rng.lognormal(7, 7, size=500)
``````

At this point Stan rightfully has some nasty convergence complaints for me, and fails to recover the true value of `n_mu` much of the time.

I believe the answer will lie in a smart re-parameterization of the model, but my initial attempts haven’t come up with much success. Any help would be appreciated!

It’s not just Stan. Once you get into lognormal(7, 7), you have serious dynamic-range problems: 1 million draws produce values ranging from around 10^{-14} up to 10^{17}.

``````
print(summary(rlnorm(1e6, 7, 7)), digits=10)
           Min.         1st Qu.          Median            Mean         3rd Qu.            Max.
0.000000000e+00 1.000000000e+01 1.106000000e+03 1.153044506e+12 1.234160000e+05 3.537262218e+17
``````

The problem is that you can’t easily rescale and hope to deal with the really small and really large values.
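To make that concrete: the number of orders of magnitude spanned is invariant under rescaling, so dividing everything by, say, the median moves the values around without shrinking the spread at all. A quick numpy check (the median is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
zeta = rng.lognormal(7, 7, size=1_000_000)

# Orders of magnitude spanned before and after rescaling by the median
span_before = np.log10(zeta.max()) - np.log10(zeta.min())
scaled = zeta / np.median(zeta)
span_after = np.log10(scaled.max()) - np.log10(scaled.min())

print(span_before, span_after)  # identical: dividing by a constant shifts, never shrinks
```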

You can also replace the compound normal with a single normal, which might help:

``````
model {
  // square() applies elementwise to the vector zeta
  nmeas ~ normal(n_mu, sqrt(0.1^2 + square(zeta)));
}
``````

You can keep the constraint on `n_mu`, but this assumes `ninh` is unconstrained.
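The collapse works because normals compose: if `ninh ~ normal(n_mu, 0.1)` and `nmeas ~ normal(ninh, zeta)`, then marginally `nmeas ~ normal(n_mu, sqrt(0.1^2 + zeta^2))`. A quick simulation check (the particular values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_mu, tau, zeta = 3.5, 0.1, 2.0  # tau = inherent spread; illustrative values

# Hierarchical draw: ninh first, then nmeas given ninh
ninh = rng.normal(n_mu, tau, size=1_000_000)
hier = rng.normal(ninh, zeta)

# Marginalized draw from the single normal
flat = rng.normal(n_mu, np.sqrt(tau**2 + zeta**2), size=1_000_000)

# Both samples share the same mean and standard deviation
print(hier.mean(), flat.mean())  # both ~ 3.5
print(hier.std(), flat.std())    # both ~ sqrt(0.1^2 + 2^2)
```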

You shouldn’t need the constraints if your values are distributed in those ranges. If the constraints have some bite, you can run into problems with probability mass bunching up against the boundaries and causing numerical (and statistical) problems.

Given that `zeta` is data, you can just compute the whole scale term as transformed data, combined with the reparameterization in terms of a single normal. This should be much easier to fit than the hierarchical model.
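Putting those two suggestions together, a sketch of what that might look like (assuming the inherent spread stays hard-coded at 0.1, as in the original model):

``````
data {
  int NN;
  vector[NN] nmeas;
  vector[NN] zeta;
}
transformed data {
  // zeta is data, so the whole scale term is computed once, up front
  vector[NN] sigma = sqrt(0.1^2 + square(zeta));
}
parameters {
  real n_mu;
}
model {
  nmeas ~ normal(n_mu, sigma);
}
``````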

Interesting, thanks! You’re correct that the single normal is fit easily.

As a follow-up, I can now also estimate the previously hard-coded spread of the inherent distribution (the `0.1` above), and can even handle some observational uncertainty on `zeta` (via the standard “treat zeta as missing data” approach).
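For concreteness, estimating that spread amounts to promoting it to a parameter in the single-normal model, something like this sketch (the name `tau` and its prior are placeholders, not tuned choices):

``````
parameters {
  real n_mu;
  real<lower=0> tau;  // the inherent spread, previously fixed at 0.1
}
model {
  tau ~ normal(0, 1);  // half-normal prior via the lower=0 constraint
  nmeas ~ normal(n_mu, sqrt(tau^2 + square(zeta)));
}
``````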

Now to see whether it survives contact with the enemy (real-world data)!