Advice on a non-linear regression with divergences

I am developing a hierarchical non-linear regression to fit data on rates of photosynthesis. The hierarchical part is that I have data to estimate the relationship between incident light and photosynthetic rate for several individuals from multiple species. My intention is to estimate the parameters (either 3 or 4, see below) at the individual level. All of the parameters will have the same hierarchical structure, but first I have to get the basic non-linear regression working, and that's where I need some advice.

The most widely used approach for fitting light curves is a 4-parameter non-rectangular hyperbola, implemented in this model:

data {
  int<lower=0> n_obs;   // number of observations
  vector[n_obs] A;      // observed photosynthetic rate
  vector[n_obs] Q;      // incident light
  real scale_A_sat;     // scales of the half-normal priors below
  real scale_Phi_A;
  real scale_R_d;
  real scale_Theta;
  real scale_sigma;
}

parameters {
  real<lower=0> A_sat;   // light-saturated photosynthetic rate (asymptote)
  real<lower=0> Phi_A;   // initial slope of the light response (apparent quantum yield)
  real<lower=0> R_d;     // dark respiration rate
  real<lower=0> Theta;   // curvature of the response
  real<lower=0> sigma;   // residual standard deviation
}

transformed parameters {
  real LCP;
  vector[n_obs] mu_A;

  // expected photosynthetic rate: non-rectangular hyperbola
  for (i in 1:n_obs) {
    mu_A[i] = (Phi_A*Q[i] + A_sat -
               sqrt((Phi_A*Q[i] + A_sat)^2 - 4*Theta*Phi_A*Q[i]*A_sat)) /
              (2*Theta) - R_d;
  }
  // light compensation point (value of Q at which mu_A = 0)
  LCP = R_d*(R_d*Theta - A_sat)/(Phi_A*(R_d - A_sat));
}

model {
  // likelihood
  A ~ normal(mu_A, sigma);

  // half-normal priors (all parameters are declared with lower = 0)
  A_sat ~ normal(0.0, scale_A_sat);
  Phi_A ~ normal(0.0, scale_Phi_A);
  R_d ~ normal(0.0, scale_R_d);
  Theta ~ normal(0.0, scale_Theta);
  sigma ~ normal(0.0, scale_sigma);
}

generated quantities {
  vector[n_obs] A_rep;

  for (i in 1:n_obs) {
    A_rep[i] = normal_rng(mu_A[i], sigma);
  }
}

An alternative that doesn't seem to be as widely used is a 3-parameter function reflecting Michaelis-Menten enzyme kinetics (a rectangular hyperbola):

data {
  int<lower=0> n_obs;
  vector[n_obs] A;
  vector[n_obs] Q;
  real scale_A_sat;
  real scale_Phi_A;
  real scale_R_d;
  real scale_sigma;
}

parameters {
  real<lower=0> A_sat;
  real<lower=0> Phi_A;
  real<lower=0> R_d;
  real<lower=0> sigma;
}

transformed parameters {
  real LCP;
  vector[n_obs] mu_A;

  // expected photosynthetic rate: Michaelis-Menten (rectangular hyperbola)
  for (i in 1:n_obs) {
    mu_A[i] = (Phi_A*Q[i]*A_sat)/(Phi_A*Q[i] + A_sat) - R_d;
  }
  // light compensation point (value of Q at which mu_A = 0)
  LCP = R_d*A_sat/(Phi_A*(A_sat - R_d));
}

model {
  // likelihood
  A ~ normal(mu_A, sigma);

  // half-normal priors (all parameters are declared with lower = 0)
  A_sat ~ normal(0.0, scale_A_sat);
  Phi_A ~ normal(0.0, scale_Phi_A);
  R_d ~ normal(0.0, scale_R_d);
  sigma ~ normal(0.0, scale_sigma);
}

generated quantities {
  vector[n_obs] A_rep;

  for (i in 1:n_obs) {
    A_rep[i] = normal_rng(mu_A[i], sigma);
  }
}

I've fit both models to the data I have, and they give very similar estimates of the 3 parameters they have in common. Unfortunately for me, the more widely used 4-parameter version produces a small number of divergences (8-12 across 4 chains, each run with 1250 warmup and 1250 sampling iterations). max_treedepth is set to 20, and the largest treedepth observed is 14-15; adapt_delta is set to 0.99. The divergences are spread across the sample space rather than concentrated in any region, in every diagnostic plot I examined in shinystan.
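For reference, the fitting call looks roughly like this (this sketch assumes rstan; the file name and data list are placeholders):

library(rstan)

## sketch only: fit the 4-parameter model with the sampler settings described above
fit_4p <- stan(
  file = "light_curve_4p.stan",   # placeholder file name
  data = stan_data,               # list matching the data block above
  chains = 4,
  warmup = 1250,
  iter = 2500,                    # 1250 warmup + 1250 sampling per chain
  control = list(adapt_delta = 0.99, max_treedepth = 20)
)

## count divergent transitions after warmup and inspect them interactively
sum(sapply(get_sampler_params(fit_4p, inc_warmup = FALSE),
           function(x) sum(x[, "divergent__"])))
shinystan::launch_shinystan(fit_4p)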

I am hesitant to use the full hierarchical version of the 4-parameter model, because it produces a very large number of divergences (several hundred), even though the posterior estimates seem reasonable (in the sense that they are comparable to estimates from separate fits to each individual). At the same time, given that reviewers are already going to be skeptical of a Bayesian hierarchical model, I'd like to avoid describing the relationship with the less standard 3-parameter function.
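To be concrete about "comparable", the kind of check I have in mind is plotting individual-level posterior means from the hierarchical fit against the corresponding separate fits, e.g. (object and parameter names are placeholders):

library(rstan)

## sketch only: compare individual-level estimates of one parameter (A_sat shown)
## from the hierarchical fit (fit_hier) to separate per-individual fits (fits_sep);
## all object names are placeholders
hier_means <- summary(fit_hier, pars = "A_sat")$summary[, "mean"]
sep_means  <- sapply(fits_sep, function(f) summary(f, pars = "A_sat")$summary[, "mean"])
plot(sep_means, hier_means,
     xlab = "separate individual fits", ylab = "hierarchical fit")
abline(0, 1)   # points near the 1:1 line mean the estimates agree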

Can anyone see a way to reparameterize the 4-parameter version to avoid divergences? If not, would it be unreasonable to use and report results from the 4-parameter version if I verified that they were very similar to the 3-parameter version?

If it would help to work with the data and the R wrapper I'm using for the comparison, let me know. The data aren't publicly available yet, but they aren't confidential.

Kent