Thanks for the calculations, although I’m not sure I understand where you’re going with this. I don’t think the scaling of the variance with sigma is what causes the problems in these models, since the latent observation error is on the scale of the linear predictor, where the variance is well behaved.
Here’s a real-world example from one of the datasets I’m modelling this way. It shows that real-world data really does have such high sigma values: the example model has posterior values of sigma^2 above 1, which is in the range where the negative-binomial approximation breaks down. Here’s the posterior of sigma for one of them:

x-axis: logarithmic
blue: prior (normal(0,3), preliminary prior for testing purposes)
red: posterior
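For reference, here is a rough sketch of what sigma^2 above 1 means for a moment-matched negative binomial. It is only an illustration in Python/NumPy, not the model from this thread. It assumes a Poisson likelihood with a lognormal rate and the negative-binomial variance convention mu + mu^2/phi; the function names are made up for the example.

```python
import numpy as np


def matched_negbin_phi(sigma):
    """Dispersion phi of a negative binomial (variance = mu + mu^2 / phi)
    that matches the mean and variance of a Poisson-lognormal whose
    log-rate has standard deviation sigma."""
    # Poisson-lognormal variance is mu + mu^2 * (exp(sigma^2) - 1).
    return 1.0 / np.expm1(sigma ** 2)


def simulate_counts(mean, sigma, n, rng):
    """Draw n counts from a Poisson-lognormal and from the moment-matched
    negative binomial (hypothetical helper, for illustration only)."""
    # Shift the log-rate by -sigma^2 / 2 so that E[rate] equals `mean`.
    log_rate = np.log(mean) - 0.5 * sigma ** 2 + sigma * rng.standard_normal(n)
    pln = rng.poisson(np.exp(log_rate))
    # Negative binomial written as a gamma-Poisson mixture.
    phi = matched_negbin_phi(sigma)
    nb = rng.poisson(rng.gamma(shape=phi, scale=mean / phi, size=n))
    return pln, nb


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for sigma in (0.3, 1.0, 1.5):
        pln, nb = simulate_counts(50.0, sigma, 100_000, rng)
        probs = [0.05, 0.5, 0.95, 0.999]
        print(f"sigma={sigma}: phi={matched_negbin_phi(sigma):.3f}")
        print("  Poisson-lognormal quantiles:", np.quantile(pln, probs))
        print("  negative binomial quantiles:", np.quantile(nb, probs))
```

Since both moments are matched, any remaining difference between the two sets of quantiles comes from the shape of the mixing distribution (lognormal versus gamma).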
Here’s the data and model predictions:

It’s a local-linear-slope model, with year on the x-axis and the number of counted individuals on the (logarithmic) y-axis.
That model works more or less OK, but on some of the datasets I’m using it for it doesn’t yet sample well enough for publication quality. The problems of this particular model are only partially related to this thread.
While writing this answer I realized that I made some mistakes in the previous posts:
(Note: It seems I can’t edit my posts above anymore to correct them.)
Contrary to what I wrote there, and as the example model shows, the problem (at least with the models I’m currently working on) is not that sigma varies over orders of magnitude. It doesn’t, although I think I remember having that problem in the past with other, similar models. Rather, the problem in my current models is (at least that’s what I think) the correlations between the latent variables that represent the observation-level errors and other latent variables in the model, e.g. the Markov chain representing population size in structural time-series models.
So what I wrote about the non-centered parameterisation I tried is also wrong. What I non-centered was not the observation-level error, but the latent Markov chain in the structural time-series models. This works for “curvy” datasets. But when the data points lie more or less on a straight line, the posterior of the step size of population changes between time steps can get close to zero. Then I get the problem described previously: I need a non-centered parameterisation for the latent Markov chain, which (I think) causes problems with the latent Gaussian observation-level errors, because it makes the correlations between them and the non-centered Markov chain non-linear, so the sampler can no longer adapt to them.
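To make the non-centering of the latent Markov chain concrete, here is a minimal sketch in Python/NumPy. It assumes a plain Gaussian random walk rather than the full local-linear-slope model, and the names `x0`, `step_sd` and `z` are placeholders chosen for the example, not taken from the actual model code.

```python
import numpy as np


def random_walk_noncentered(x0, step_sd, z):
    """Build the latent states from standard-normal innovations z:
    x_t = x0 + step_sd * (z_1 + ... + z_t).
    The sampler would work on z and step_sd instead of on x directly."""
    z = np.asarray(z, dtype=float)
    return x0 + step_sd * np.cumsum(z)


def implied_innovations(x0, step_sd, x):
    """Inverse map: recover z from the latent states x (centered view)."""
    x = np.asarray(x, dtype=float)
    return np.diff(x, prepend=x0) / step_sd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(30)
    for step_sd in (0.5, 0.05, 0.0):
        x = random_walk_noncentered(2.0, step_sd, z)
        # As step_sd approaches zero the trajectory flattens to a constant,
        # while z itself stays on the unit scale.
        print(step_sd, x.min(), x.max())
```

In this form the latent states enter the likelihood through the product `step_sd * cumsum(z)`, so anything added to them on the log scale (such as an observation-level error) is tied to a product of parameters rather than to a single one. That may be where the non-linear correlations come from, but this is only my guess.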
So, setting aside the particular problems with my current models, the actual motivation for this thread was a general pattern in my work: I gravitate towards the type of model explained in the opening post, which can cause problems for various reasons. The idea of this thread was to get rid of the latent observation-level error in order to reduce the number of possible variable interactions, reduce dimensionality, etc., which would bring different benefits for different models.