Non-linear modelling - Michaelis-Menten

Hi all,

I’m quite new to Bayesian modelling, and have spent the past couple of months working through Statistical Rethinking. I’ve also read Bayesian Models by Hobbs and Hooten, which has really inspired me to go beyond GLMMs and think more carefully about the processes that underlie data generation. I thought it might be fun to apply some of what I’ve learned so far to one of my own datasets, hopefully cementing some of what I’ve picked up, but I’m running into all sorts of fun new problems.

The dataset in question contains data on the mycorrhizal fungal species richness of pine seedlings. Seedlings from each of 12 genotypes (“families”) were grown in the field at each of 18 locations (“grids”), with seedlings from each family replicated 3-5 times at each location. Seedlings were collected after three months and the species richness of fungi was counted. Each fungal species occupies a single root tip, so we also counted the total number of root tips on each seedling (this varies quite a lot, from 0 to ~500). The question we would like to answer is whether pine genotypes differ in their fungal species richness.

When I worked on these data during my PhD, I modelled them using a GLMM, which gave very lacklustre results. Thinking about the data-generating process a bit more, however, species richness is likely to saturate as the number of root tips increases, since a seedling samples fungal species from the pool available at any given location. I thought it might be fun to try to model this with a Michaelis-Menten equation, treating V_{max} (the saturation point of the curve) as a function of genotype and location. However, I’m running into some issues fitting the model and I’m not quite sure where I’ve gone wrong.

The model I’m trying to implement is:

\begin{align}
R_{ifg} &\sim Poisson(\lambda_{fg}) \\
\lambda_{fg} &= \frac{V_{max} \times tips}{k + tips} \\
V_{max} &= Family_f \times Grid_g \\
k &\sim Lognormal(1, 1) \\
Family_f &\sim Lognormal(\mu_f, \sigma_f) \\
Grid_g &\sim Lognormal(\mu_g, \sigma_g) \\
\mu_f &\sim N(1, 1) \\
\mu_g &\sim N(1, 1) \\
\sigma_f &\sim Exponential(1) \\
\sigma_g &\sim Exponential(1)
\end{align}

I modelled V_{max} as the product of the Family and Grid parameters, the idea being that the Grid parameter captures the maximum species richness at a location, while different families may capture different proportions of that richness (if there are any differences). My understanding is that as long as both V_{max} and k are positive, \lambda is positive, and so I don’t need a log-link. To ensure that, I specified Lognormal distributions for Family, Grid and k, and set lower bounds for these values at 0.0001.
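
Before fitting anything, a quick prior predictive simulation (just base R draws from the priors above, with root tips scaled to a 0-1 range as in the data list further down) shows the kind of saturation curves these priors imply:

set.seed(1)
n_sim <- 50
mu_f  <- rnorm(n_sim, 1, 1);  sig_f <- rexp(n_sim, 1)
mu_g  <- rnorm(n_sim, 1, 1);  sig_g <- rexp(n_sim, 1)
fam   <- rlnorm(n_sim, mu_f, sig_f)   # Family_f ~ Lognormal(mu_f, sigma_f)
gr    <- rlnorm(n_sim, mu_g, sig_g)   # Grid_g   ~ Lognormal(mu_g, sigma_g)
k     <- rlnorm(n_sim, 1, 1)          # k ~ Lognormal(1, 1)

tips <- seq(0, 1, length.out = 100)   # root tips scaled so the maximum is 1
plot(NULL, xlim = c(0, 1), ylim = c(0, 50),
     xlab = "scaled root tips", ylab = "expected richness (lambda)")
for (i in 1:n_sim) {
  vmax <- fam[i] * gr[i]              # V_max = Family_f * Grid_g
  lines(tips, vmax * tips / (k[i] + tips), col = adjustcolor("black", 0.3))
}
# a fair few draws shoot well above this window - the lognormal priors have heavy tails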

I have implemented this model as follows, using ulam in the rethinking package:

SeedMod <- ulam(
  alist(
    richness ~ dpois(lambda),
    lambda <- (vmax*tips) / (k + tips),
    vmax <- fam[family]*gr[grid],
    k ~ dlnorm(1, 1),
    fam[family] ~ dlnorm(fam_a, fam_sig),
    gr[grid] ~ dlnorm(grid_a, grid_sig),
    fam_a ~ dnorm(1, 1),
    fam_sig ~ dexp(1),
    grid_a ~ dnorm(1, 1),
    grid_sig ~ dexp(1)
  ), data = with(fungi_data, list(richness = Richness,
                                  family = as.integer(as.factor(Family)),
                                  grid = as.integer(as.factor(Grid)),
                                  tips = Tot_tips / max(Tot_tips))),
  cores = 4, chains = 4, log_lik = TRUE, cmdstan = TRUE,
  constraints = list(fam = "lower=0.0001", gr = "lower=0.0001", k = "lower=0.0001")
)

This produces the following Stan code:

data{
    int richness[673];
    vector[673] tips;
    int grid[673];
    int family[673];
}
parameters{
    real<lower=0.0001> k;
    vector<lower=0.0001>[12] fam;
    vector<lower=0.0001>[18] gr;
    real<lower=0> fam_a;
    real<lower=0> fam_sig;
    real<lower=0> grid_a;
    real<lower=0> grid_sig;
}
model{
    vector[673] lambda;
    vector[673] vmax;
    grid_sig ~ exponential( 1 );
    grid_a ~ lognormal( 1 , 1 );
    fam_sig ~ exponential( 1 );
    fam_a ~ lognormal( 1 , 1 );
    gr ~ lognormal( grid_a , grid_sig );
    fam ~ lognormal( fam_a , fam_sig );
    k ~ lognormal( 1 , 1 );
    for ( i in 1:673 ) {
        vmax[i] = fam[family[i]] * gr[grid[i]];
    }
    for ( i in 1:673 ) {
        lambda[i] = (vmax[i] * tips[i])/(k + tips[i]);
    }
    richness ~ poisson( lambda );
}
generated quantities{
    vector[673] log_lik;
    vector[673] lambda;
    vector[673] vmax;
    for ( i in 1:673 ) {
        vmax[i] = fam[family[i]] * gr[grid[i]];
    }
    for ( i in 1:673 ) {
        lambda[i] = (vmax[i] * tips[i])/(k + tips[i]);
    }
    for ( i in 1:673 ) log_lik[i] = poisson_lpmf( richness[i] | lambda[i] );
}

However, the model fails: it compiles fine, but it can’t find valid starting values for any of the chains and just dies:

Chain 4 Rejecting initial value:
Chain 4   Gradient evaluated at the initial value is not finite.
Chain 4   Stan can't start sampling from this initial value.
Chain 4 Initialization between (-2, 2) failed after 100 attempts. 
Chain 4  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 4 Initialization failed.
Warning: Chain 4 finished unexpectedly!

Where have I gone wrong here? The model samples OK if I include a log-link (with about 1% of transitions divergent), but dies completely without one, which suggests to me that it’s trying out impossible values for \lambda. From my understanding, though, the bounds I specified should rule that out. Any insight anyone might have would be really appreciated!

Can multiple fungus species occupy a single root tip, or is it the case that each tip will have either zero or one species?

Technically, yes - you can get multiple fungi on a single root tip, but for the purposes of the model (and with the limitations of our identification process!) there is only one species per root tip. Root tips also don’t necessarily have any fungi on them at all.

Are there any data points with 0 tips but nonzero richness?

None!

Try (if you haven’t already) removing points with zero tips and zero richness (they aren’t informative anyway, and I think they might be producing infinite gradients).
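
Something like this, assuming the data frame and column names from your ulam call:

# drop seedlings with no root tips: richness is necessarily zero for them,
# and tips = 0 forces lambda = 0 inside the Poisson
fungi_data <- subset(fungi_data, Tot_tips > 0)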

That worked a treat - I get samples now! Still getting quite a few divergent transitions (may need to think about the parameterisation here), but at least it runs.
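
One thing I’ll probably try for the divergences is a non-centred version of the varying effects. Roughly this (a sketch only, untested - it should be the same model, just with vmax built on the log scale from standard-normal z-scores):

library(rethinking)

dat <- with(subset(fungi_data, Tot_tips > 0),
            list(richness = Richness,
                 family   = as.integer(as.factor(Family)),
                 grid     = as.integer(as.factor(Grid)),
                 tips     = Tot_tips / max(Tot_tips)))

SeedMod_nc <- ulam(
  alist(
    richness ~ dpois(lambda),
    lambda <- (vmax * tips) / (k + tips),
    # exp(mu + sigma * z) is the same lognormal as before, but non-centred
    vmax <- exp(fam_a + z_fam[family] * fam_sig + grid_a + z_gr[grid] * grid_sig),
    z_fam[family] ~ dnorm(0, 1),
    z_gr[grid] ~ dnorm(0, 1),
    fam_a ~ dnorm(1, 1),
    grid_a ~ dnorm(1, 1),
    fam_sig ~ dexp(1),
    grid_sig ~ dexp(1),
    k ~ dlnorm(1, 1)
  ),
  data = dat, chains = 4, cores = 4, log_lik = TRUE, cmdstan = TRUE,
  constraints = list(k = "lower=0.0001")
)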

Can I ask how these data points would produce infinite gradients?

I’m not entirely sure, but my guess is that some of the functions used in calculating the Poisson log probability mass function (lpmf) do weird things if lambda is zero. Note that in most applications, lambda will be declared with a lower-bound constraint of zero, and so it will never actually reach a literal zero during sampling.

Stan doesn’t “understand” that lambda[i] is a constant when tips[i] is zero; it still takes a gradient of the lpmf with respect to lambda. My guess is that even though the gradient of the Poisson lpmf for an observed zero with respect to lambda is finite at lambda = 0, the implementation that Stan uses isn’t designed to handle the case where lambda literally reaches zero.

I don’t really know C++ and so I can’t take the time now to really understand the code, but my best guess is that the multiply_log function here goes haywire or is discontinuous when given zero as an argument.
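
For what it’s worth, the term in question works out like this (writing y for the observed count):

\begin{align}
\log p(y \mid \lambda) &= y \log \lambda - \lambda - \log y! \\
\frac{\partial}{\partial \lambda} \log p(y \mid \lambda) &= \frac{y}{\lambda} - 1
\end{align}

The y \log \lambda piece is what multiply_log(y, \lambda) computes, and it is defined to be 0 when y = 0, so the value of the lpmf is finite even at \lambda = 0. The gradient term y/\lambda, however, evaluates to 0/0 there, which would explain the “gradient evaluated at the initial value is not finite” message.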

Edit:
Here’s the C++ for multiply_log, which suggests that the above explanation is probably the correct one:

Thanks, that makes sense!

Thanks for all your help - with a bit more finagling I have a model that works… okay-ish! Certainly well enough for me to move on from this toy example. It confirms the expectation that there is no evidence for a genotype effect, so I’m happy.

Strange, because multiply_log() exists specifically to avoid problems with 0 \times \log 0. You’re right though - the gradient divides by zero when it shouldn’t.

Does this look like a bug, or just an unfortunate limitation of the implementation?

It’s a bug. The gradient is uniformly zero when a is constant and zero.
We should just special-case a = 0.0 like we do with a = 1.0.

I’ll open an issue

Edit:

@prototaxites really appreciate your curiosity here! It led somewhere useful.

3 Likes