Can somebody suggest how to increase the "informativeness" of a prior? In the PK world the measurement error is pretty well defined; however, in order to explore, I have used a vague prior for the additive sd: sigma ~ student_t(3, 0, 1). Often the one-compartment model converges to a reasonable value of sigma and gives very reasonable predictions, but sometimes (most probably depending on how many patients are not well described by a one-compartment model) one loose, skinny chain, very distinct from the other three, pops up with unreasonable sigma values. How can I systematically increase the informativeness of the prior on sigma so that such loose chains don't show up? Is the Student-t a good prior? A reasonable value for sigma in this application is about 3 ug/mL, so a value of 10 ug/mL indicates that HMC got distracted. The loose chain doesn't converge, so I assume that increasing warmup would solve the problem, but it may take a while.

Could you please advise how to simulate from the sigma prior to determine that the data is covered well? In my case I have quite a few parameters besides sigma.

Well, in any Bayesian inference we condition our prior on the data via the likelihood. So if you run your Stan program but don't give it any data, the program will simulate from the prior. For hierarchical models, as you are running, you also have to simulate new patients. Think of it like a VPC (visual predictive check) with no data at all.
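The same idea can be sketched outside Stan. Below is a minimal prior predictive simulation for a hypothetical one-compartment IV-bolus model; the priors on CL and V are made-up placeholders, and only the half-Student-t prior on sigma matches the one quoted in the question:

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_predictive(n_sims=1000, dose=100.0,
                     times=np.array([1., 2., 4., 8., 24.])):
    """Simulate concentrations from the prior only (no conditioning on data).
    One-compartment IV bolus: C(t) = dose/V * exp(-CL/V * t).
    The CL and V priors are illustrative, not the original model's."""
    sims = []
    for _ in range(n_sims):
        CL = np.exp(rng.normal(np.log(5.0), 0.3))   # clearance, L/h (placeholder)
        V = np.exp(rng.normal(np.log(30.0), 0.3))   # volume, L (placeholder)
        # sigma ~ student_t(3, 0, 1) with a lower bound of 0, as in the question
        sigma = abs(rng.standard_t(3))
        conc = dose / V * np.exp(-CL / V * times)
        sims.append(rng.normal(conc, sigma))        # additive residual error
    return np.array(sims)

draws = prior_predictive()
# Do the simulated concentrations cover the range of plausible observations?
lo, hi = np.percentile(draws, [5, 95], axis=0)
```

If the 5th-95th percentile band is wildly wider than anything you could ever measure, the priors (on sigma or elsewhere) are less informative than you intended.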

If you are only interested in sigma_y and you seem to have good knowledge of the measurement precision, then it may suffice to just look at a histogram of draws from the student_t prior you quote.
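A quick way to do that, and to make "informativeness" concrete, is to check how much prior mass sits at implausible sigma values. The sketch below compares the quoted half-Student-t prior to a tighter alternative centered at the known measurement precision (the normal+(3, 1) choice is illustrative, not a prescription):

```python
import numpy as np

rng = np.random.default_rng(0)

# sigma ~ student_t(3, 0, 1) with a lower bound of 0 is a half-Student-t
draws = np.abs(rng.standard_t(3, size=100_000))

# Reasonable sigma here is ~3 ug/mL, so measure prior mass above 10 ug/mL,
# the region where the loose chain was wandering
p_above_10 = (draws > 10).mean()

# Tighter alternative: sigma ~ normal+(3, 1), centered at the known precision
tight = np.abs(rng.normal(3.0, 1.0, size=100_000))
p_tight_above_10 = (tight > 10).mean()
```

Plotting a histogram of `draws` (e.g. with matplotlib) shows the heavy tail directly; shrinking that tail mass is exactly what "making the prior more informative" means here.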

Another shortcut is to remove the outlying observations, model everything with normal priors on sigma, and, once you finally have a good model, worry about the outliers again. Leaving data out during the model-building step is highly arguable and people may disagree, but it is a rather "practical" way of making progress.

However, the non-linear models you are fitting are sensitive to many things! Non-linear models are only locally well behaved (and even then, PK models can suffer from flip-flop). What counts as local "enough" depends on many things. In any case, you need to make sure that the initial values are within reasonable ranges of where the parameters should land. Having said that, you should also make sure the initials are not too close to the posterior, as otherwise the Rhat diagnostic becomes invalid. If things come out nasty, that calls for reparametrization and/or better priors.
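To see why flip-flop bites, here is a quick numeric illustration for a one-compartment oral-absorption model (parameter values are arbitrary): swapping ka and ke while rescaling V by ke/ka reproduces the concentration profile exactly, so two distant regions of parameter space fit the data equally well and chains can land in either one.

```python
import numpy as np

t = np.linspace(0.5, 24, 50)  # sampling times, h
dose = 100.0

def conc(ka, ke, V):
    # One-compartment oral absorption, first-order in and out
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

c1 = conc(ka=1.2, ke=0.2, V=30.0)
# Swap ka and ke and rescale V by ke/ka: identical concentration curve
c2 = conc(ka=0.2, ke=1.2, V=30.0 * 0.2 / 1.2)
```

This is the kind of multimodality that good initials, a constraint such as ka > ke, or an informative prior is meant to rule out.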