I have been thinking about it for some days
It’s hard stuff! Welcome to the club!
OLS … So values beyond these limits are not allowed.
Because of the non-identifiability that I faced using this approach
Eh, let’s just repeat the exercise and verify this with Stan. Stan’s diagnostics are usually easier to interpret than stuff you get out of OLS, though it takes some time to get used to them.
I did not properly get the idea of dividing the parameters that are extremely large by 100.
Parameters are the little degrees of freedom with which the sampler explores your space. Transformed parameters are not – they’re just transformations of those. Say k3 is a rate you expect to be on a scale of about 0.02; then k3big (which is 100x larger than k3) will be around 2, so you sample k3big and recover k3 by dividing by 100 in the transformed parameters. You can do a similar transform for your very large parameters to bring them from a scale of like 1000 down to a scale of like 1-10.
The sampler moves around in the parameter space, not the transformed space. The idea is you want to use scaling to make your parameters look as much like iid normals as possible, and you mostly don’t care what the transformed parameters look like. That sorta space is really easy for the sampler to explore because steps in all directions are basically the same size. Compare that to a narrow valley, where an efficient sampler would want to take large steps along the valley floor (of the negative log likelihood) but would need to be careful to move only so far up and down the walls to avoid instabilities (which conceptually isn’t that bad, but numerically is hard).
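In Stan code, that rescaling pattern looks something like this – a minimal sketch using the k3 naming above (the factor of 100 and the prior are placeholders; pick whatever brings the sampled parameter near unit scale):

```stan
parameters {
  real<lower=0> k3big;      // sampled on a ~unit scale, around 2
}
transformed parameters {
  real k3 = k3big / 100;    // physical rate, around 0.02, fed to the ODE
}
model {
  k3big ~ normal(2, 1);     // prior stated directly on the sampled scale
}
```

The sampler only ever sees k3big, which is well scaled; the ODE solver only ever sees k3.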
Stan tries to figure out this transformation for you, but it’s not a bad idea to help it, especially if you’re working with a hard model (ODEs are hard).
I read in the link that you sent me that when one has some idea that a parameter lies within certain bounds, it can be better to drop the hard constraints and instead use a prior that puts its mass around that range.
However, when I tried such an approach, I ran into several problems. First, I am no longer able to sample more than one chain.
Keep going down this route. First off, if there’s any way to shrink your model, start with that.
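For the bounds-vs-priors idea, a sketch of what that swap looks like (the parameter name and numbers are hypothetical placeholders for your plausible range):

```stan
parameters {
  // instead of real<lower=0, upper=1> theta; with hard bounds...
  real theta;
}
model {
  // ...use a prior that puts most of its mass in the plausible range
  theta ~ normal(0.5, 0.25);
}
```

One caveat: if a parameter must be positive for the ODE to make sense at all, keep the lower=0 constraint and only drop the merely-informational bounds.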
Also, with these sorts of dynamics models, start making posterior predictive plots and compare them with your data (this is the generated quantities stuff – look in the Stan manual – looking at the generated signals with and without noise is usually really informative). Make sure what you think is happening is actually happening. You gotta make sure your model is fitting your data in some reasonable way (it might not be!).
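A sketch of that generated quantities idea (names are hypothetical: y_hat would be your noiseless ODE solution from transformed parameters, and y_rep adds measurement noise on top):

```stan
generated quantities {
  array[N] real y_rep;   // replicated data: ODE solution plus noise
  for (n in 1:N) {
    y_rep[n] = normal_rng(y_hat[n], sigma);
  }
}
```

Plot y_hat against your data to check the mean dynamics, and y_rep to check whether the noise model makes your actual observations look plausible.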
One chain can progress really quickly while the others make no progress at all, and depending on the seed used in the stan command (my guess), sometimes no chain runs at all.
When Stan goes slow, it’s time to investigate the model and figure out the exact problem. To do this, I usually start by running single chains for small numbers of samples. You can’t believe the results of an inference until you run a bunch of chains for a long time, but if you’re just searching for problems, start with the easiest thing. Once small numbers of samples seem to work, go to large numbers. Once single chains with lots of samples work, go to multiple chains (checking and fixing stuff along the way).
Your friends here are the Stan pairplots and ShinyStan. ShinyStan is really easy (it basically boils down to launch_shinystan(fit)) and it plots tons of diagnostics that might give you clues as to what is going on. Look for divergences and non-identifiabilities. If those aren’t obvious, look at the traceplots and make sure the sampler is exploring rapidly, and that the sampler isn’t hitting its max treedepth (this is associated with how many leapfrog steps the sampler takes in HMC while looking for a new sample – search around for more details; higher is worse, 10 is the default max, and if you’re stuck there it’s bad).
Sorry if some of that seems confusing. May have typed too much haha. Hope it helps!