Physically-based model: How to parameterize program such that one variable must be larger than another

Thank you! This solution is accurate and wicked fast!

I’m still trying to unpack what you’ve done. Apologies if this is getting cumbersome. I’m just very thirsty to learn and this thread has become a fountain of knowledge.

Logit transformation

The logit transform is kind of blowing my mind, but it seems to be the most important element in removing the divergences I was observing, so I’m just putting down some thoughts.

I’ve not tried working in transformed response space before (I’ve only ever worked with transformed independent variables), so this is a novel realm for me and I don’t entirely understand it yet. Looking at the graphical comparison of theta versus logit(theta) below, the relationship appears to be essentially linear over the domain of theta:

[Figure: theta versus logit(theta) (LogitTransformOfTheta)]
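One way I can make sense of the near-linearity (my own back-of-the-envelope reasoning, not something from the thread): logit(theta) = log(theta / (1 - theta)) has slope 1 / (theta * (1 - theta)), and that slope changes slowly over a narrow range of theta, so across the range of my data the transform behaves almost like a straight-line rescaling (the slope is roughly 4 near theta = 0.5).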

In an effort to understand the utility of this transformation, I replaced the logit transform with a simple linear transform that multiplies theta by 10, effectively expanding the response space.

The transformed data block became:

transformed data {
  vector[N] theta_logit = 10 * theta;
}

and the sampling statement in the model block became:

model {
  …
  theta_logit ~ normal(10 * theta_pred, sigma);
}
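For reference, the logit version these replaced was along these lines (reproducing it from the earlier post as best I can, so the exact lines may differ):

transformed data {
  vector[N] theta_logit = logit(theta);
}

and

model {
  …
  theta_logit ~ normal(logit(theta_pred), sigma);
}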

Using iter = 10000, warmup = 5000, thin = 2, and chains = 4, I got the following results:

10x transform:

extract(vgBayes, pars = c("s", "r", "a", "n", "sigma", "RMSE")) %>%
purrr::map_df(mean)
      s     r     a     n sigma   RMSE
  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 0.802 0.200 0.210  1.54 0.175 0.0174

Logit transform:

      s     r     a     n sigma   RMSE
  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 0.802 0.201 0.203  1.56 0.100 0.0176

It appears that the logit transform is more accurate than the 10x transform, maybe because it varies across zero?

Does anyone have any thoughts on why this is? Or perhaps a favorite book, website, YouTube series, or article you’d recommend that is still comprehensible for someone who has only taken up to linear algebra and integral calculus?

Parameters and desperate voodoo

Uninformative priors are okay?

That’s really interesting that you can get away with not putting explicit priors on the r and s parameters. From the manual, this implies an implicit uniform prior between 0 and 2 (from the declared bounds), which I guess makes sense, since that is in the ballpark of the expected result. I’m guessing that’s not a problem because, as you say, the function is relatively well behaved (now).

I played around with removing the priors on a and n from the model block, and I got slightly worse results, but not by much, suggesting a moderately informative prior is still useful – probably because a and n are highly correlated, so adding priors helps decorrelate them a bit.
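To spell out what I mean by implicit versus explicit priors, here is a stripped-down sketch (the bounds and prior values are purely illustrative, not the ones from the actual model):

parameters {
  real<lower=0, upper=2> s;  // no sampling statements below, so s and r
  real<lower=0, upper=2> r;  // get implicit uniform(0, 2) priors from their bounds
  real<lower=0> a;
  real<lower=1> n;
}
model {
  // moderately informative priors on the highly correlated pair
  a ~ normal(0.2, 0.1);
  n ~ normal(1.5, 0.5);
}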

No need to truncate?

I also find it interesting that you can get away with no truncation. I’m guessing this is because the truncation would only come into play if the posterior crept up against the truncation bounds?
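For anyone else following along, the kind of truncation I mean looks like this in Stan (an illustrative sketch with placeholder numbers, not the actual model):

parameters {
  real<lower=0, upper=2> a;
}
model {
  // truncated prior: the normal density is renormalized to [0, 2]
  a ~ normal(0.2, 0.5) T[0, 2];
  // untruncated alternative:
  // a ~ normal(0.2, 0.5);
  // the truncation term only differs noticeably when the distribution
  // puts non-negligible mass outside [0, 2]
}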

In any case, I really appreciate the effort you put into this @andre.pfeuffer!