Troubleshooting ODE model

I have an ODE model that runs without errors, recovers chosen parameters of synthetic data, and has a decent fit to real data. In this code, I solved and wrote out the solutions in closed form.
But I now want to add more terms to my system of ODEs. As a first step, I rewrote my Stan code to solve the original ODEs numerically with Stan’s integrate_ode_* functions. Alas, my model fails with the message of “Rejecting initial value: Gradient evaluated at the initial value is not finite”. Perhaps you can spot the bug?
FYI, although my first set of code appears to run correctly, the presence of a calculus error would not surprise me.

Rstan: 2.19.1
R: 3.4.4 Platform x86_64-pc-linux-gnu (64-bit)
Operating system: Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-78-generic x86_64)

Original code:

functions {
    real exp_dgamma(real x, real xc, real[] theta, real[] x_r, int[] x_i){
        real m = theta[1];
        real a = theta[2];
        real b = theta[3];
        return exp((m-b)*x + (a-1)*log(x) + a*log(b) - lgamma(a));
    }   
}
data {
    int nData; //Number of measurements
    int nFiltd; //Number of measurements for which filtrate was measured. Less/equal to nData.
    int nStrains; //Number of distinct strains, currently 2.
    int nMolec; //Number of chem species measured, currently 2.
    int nExperi; //Number of experiments (one per batch culture)

    //The predictors
    int Strain[nData];
    int ReverseTx[nData];
    int BatchCulture[nData];
    real<lower=0> Inoc[nData]; //Inoculum, expected initial # of live cells
    real<lower=0> Time[nData]; //Duration, of exposure
    real<lower=0> Drug[nData]; //Drug concentration

    //The regression outputs
    real<lower=0> AllX_obs[nData]; 
    real<lower=0> Dead_obs[nFiltd];
}
transformed data{
    real machineeps = 1e-15; //2.22044604925e-16

    real data_r[2] = {0.0, 0.0}; //dummy variable to satisfy integrate_1d()
    int data_i[2] = {0, 0}; //dummy variable to satisfy integrate_1d()
}
parameters {
    real<lower=0> mu; 
    real<lower=1> alpha;
    real<lower=0> Bmax;
    real<lower=0> hill;
    real<lower=0> EC50[nStrains];
    real<lower=0> InocErr[nExperi]; //The error in Inoc due/shared by a batch.

    real<lower=0> amp[nMolec]; //Amplification efficiency
    real<lower=0> noise_var[nMolec]; //constant variance, fixed amount of noise.
    real<lower=0> noise_cv; //constant coefficient of variance.
}
transformed parameters {
    real<lower=0> beta[nData];
    real<lower=0> live[nData];
    real<lower=0> dead[nData];
    real<lower=0> allX[nData];

    real<lower=0> noise_sd[2];

    noise_sd[1] = sqrt(noise_var[1]);
    noise_sd[2] = sqrt(noise_var[2]);

    for (i in 1:nData){
        beta[i] = Bmax * pow(Drug[i], hill) / (pow(Drug[i], hill) + pow(EC50[Strain[i]], hill));

        live[i] = exp(Time[i]*mu) * (1 - gamma_p(alpha, (beta[i]*Time[i]) ));

        if (Time[i] < machineeps){
            dead[i] = 0;
        } else
        if (beta[i] < machineeps){
            dead[i] = 0;
        } else
        if (beta[i] > (mu + machineeps)){ //Separating this case from the case below is optional.
            dead[i] = ( exp(alpha * ( log(beta[i]) - log(beta[i] - mu) ))
                    * gamma_p(alpha, ((beta[i] - mu) * Time[i])) );
        } else {
            dead[i] = ( integrate_1d(exp_dgamma, 0, Time[i],
                        { mu, alpha, beta[i] },
                        data_r, data_i, 1e-6) );
        }
        allX[i] = live[i] + dead[i];
    }
}
model {
    real L0[nData]; //<lower=0>
    real Dead_amp[nFiltd];
    real AllX_amp[nData];

    alpha ~ gamma( (3.0/6)^2, 3.0/(6^2));
    Bmax ~ gamma( (2.0/2)^2, 2.0/(2^2) );
    hill ~ gamma( (2.0/1)^2, 2.0/(1^2) );

    EC50[1] ~ gamma( (0.03125/0.1)^2, 0.03125/(0.1^2) );
    EC50[2] ~ gamma( (4.0/8)^2, 4.0/(8^2) ); //Ec38

    amp[1] ~ gamma( (1.0/0.5)^2, 1.0/(0.5^2) );
    amp[2] ~ gamma( (1/2.0)^2, 3e4/(3e4*2)^2 );
    noise_var[1] ~ inv_gamma( 2+(5.0/20)^2, (1+(5.0/20)^2)*5.0 );
    noise_var[2] ~ inv_gamma( 2+(20.0/40)^2, (1+(20.0/40)^2)*20.0 );
    noise_cv ~ gamma( (0.05/0.1)^2, 0.05/(0.1^2) );

    InocErr ~ gamma( 3, 3 );

    for (i in 1:nData){
        L0[i] = Inoc[i]*InocErr[BatchCulture[i]];
    }
    for (i in 1:nData){
        AllX_amp[i] = allX[i]*amp[ReverseTx[i]]*L0[i];
        AllX_obs[i] ~ normal(AllX_amp[i], (noise_cv*AllX_amp[i] + noise_sd[ReverseTx[i]]));
    }
    for (i in 1:nFiltd){
        Dead_amp[i] = dead[i]*amp[ReverseTx[i]]*L0[i];
        Dead_obs[i] ~ normal(Dead_amp[i], (noise_cv*Dead_amp[i] + noise_sd[ReverseTx[i]]));
    }
}

New code:

functions {
    real[] ODEstep(real time, real[] state, real[] theta, real[] data_r, int[] data_i){
        real mu = theta[1];
        real alpha = theta[2];
        real beta = theta[3];

        real Live = state[1];
        real Dead = state[2];

        real hazard;
        real dYdt[2];

        hazard = exp( (alpha-1)*log(time) - lgamma(alpha) - beta*time
                + alpha*log(beta) - log1m(gamma_p(alpha, beta*time)) );

        dYdt[1] = (mu - hazard) * Live;
        dYdt[2] = (hazard) * Live;

        return dYdt;
    }   
    real[] ODE_wrapper(real[] init_system, real t0, real[] times, real[] theta, real[] data_r, int[] data_i){
        real results[1, 2]; //Because integrate_ode only returns real[,], I need a [1,2] array, not a [2] array.
        results = integrate_ode_adams(ODEstep, init_system, t0, times, theta, data_r, data_i);
        //print(results);
        return {results[1, 1], results[1, 2]};
    }   
}
data {
    int nData; //Number of measurements
    int nFiltd; //Number of measurements for which filtrate was measured. Less/equal to nData.
    int nStrains; //Number of distinct strains, currently just 1:2.
    int nMolec; //Number of chem species measured, currently 1:2.
    int nExperi; //Number of experiments (one per batch), about 18 batches.

    //The predictors
    int Strain[nData];
    int ReverseTx[nData];
    int BatchCulture[nData];
    real<lower=0> Inoc[nData]; //Inoculum, expected initial # of live cells
    real<lower=0> Time[nData]; //Duration, of exposure
    real<lower=0> Drug[nData]; //Drug concentration

    //The regression outputs
    real<lower=0> AllX_obs[nData]; 
    real<lower=0> Dead_obs[nFiltd];
}
transformed data{
    real machineeps = 1e-15; //2.22044604925e-16

    real data_r[2] = {0.0, 0.0}; //dummy variable to satisfy integrate_1d() signature
    int data_i[2] = {0, 0}; //dummy variable to satisfy integrate_1d()
}
parameters {
    real<lower=0> mu;
    real<lower=0> alpha;
    real<lower=0> Bmax;
    real<lower=0> hill;
    real<lower=0> EC50[nStrains]; //Effective concentration 50
    real<lower=0> InocErr[nExperi]; //The error in Inoc due to different batches.

    real<lower=0> amp[nMolec]; //Amplification efficiency
    real<lower=0> noise_var[nMolec]; //constant variance, fixed amount of noise.
    real<lower=0> noise_cv; //constant coefficient of variance.
}
transformed parameters {
    real<lower=0> noise_sd[2] = sqrt(noise_var);

    real<lower=0> beta[nData];
    real<lower=0> L0[nData]; //Initial amount of live cells. Can later generalize as init_system[nData, 2];
    real system[nData, 2]; //<lower=0> 1=Live 2=Dead

    for (i in 1:nData){
        beta[i] = Bmax * pow(Drug[i], hill) / (pow(Drug[i], hill) + pow(EC50[Strain[i]], hill));
        L0[i] = Inoc[i]*InocErr[BatchCulture[i]];

        if (Time[i] < machineeps){
            system[i, 1] = L0[i];
            system[i, 2] = 0;
        } else if (beta[i] < machineeps){ //Trying to avoid log(0) in the ODE. Even if exp(-Inf) becomes 0, gradient may be Inf?
            system[i, 1] = L0[i] * exp(Time[i]*(mu));
            system[i, 2] = 0;
        } else {
            system[i, :] = ODE_wrapper({L0[i], 0}, //initial state, L0 live cells, 0 dead cells.
            0, //init_time
            {Time[i]}, //times
            {mu, alpha, beta[i]}, //theta
            data_r, data_i); //no data is used because I calculate beta[i] outside of the ODE.
        }
    }
}
model {
    vector[nData] AllX;
    vector[nFiltd] Dead;
    vector[nData] AllX_amp;
    vector[nFiltd] Dead_amp;

    mu ~ gamma( (0.024/0.003)^2, 0.024/(0.003^2) );

    alpha ~ gamma( (3.0/6)^2, 3.0/(6^2) );
    Bmax ~ gamma( (2.0/2)^2, 2.0/(2^2) );
    hill ~ gamma( (2.0/1)^2, 2.0/(1^2) );

    EC50[1] ~ gamma( (0.03125/0.1)^2, 0.03125/(0.1^2) ); //K12
    EC50[2] ~ gamma( (4.0/8)^2, 4.0/(8^2) ); //Ec38

    amp[1] ~ gamma( (1.0/0.5)^2, 1.0/(0.5^2) );
    amp[2] ~ gamma( (1/2.0)^2, 3e4/(3e4*2)^2 );

    noise_var[1] ~ inv_gamma( 2+(5.0/20)^2, (1+(5.0/20)^2)*5.0 );
    noise_var[2] ~ inv_gamma( 2+(20.0/40)^2, (1+(20.0/40)^2)*20.0 );
    noise_cv ~ gamma( (0.05/0.1)^2, 0.05/(0.1^2) );

    InocErr ~ gamma( 3, 3 );

    for (i in 1:nData){
        AllX[i] = system[i, 1] + system[i, 2];
        AllX_amp[i] = AllX[i]*amp[ReverseTx[i]]; //*L0[i];
        AllX_obs[i] ~ normal(AllX_amp[i], (noise_cv*AllX_amp[i] + noise_sd[ReverseTx[i]]));
    }
    //{
    //    real logP = normal_lpdf( AllX_obs[i] | AllX_amp[i], (noise_cv*AllX_amp[i] + noise_sd[ReverseTx[i]]) );
    //    if (i == 125){
    //    //if (logP < log(0.0001)){
    //        print("low prob AllX ", i);
    //        print(system[i, :], " ", AllX_obs[i]);
    //        print(L0[i], " ", InocErr[BatchCulture[i]], " ", Inoc[i], " Time ", Time[i], " ", Drug[i],
    //                " beta ", beta[i], " ", mu, " ", kill, " ", alpha, " ", Bmax, " ", hill, " ", EC50[Strain[i]],
    //                " amp ", amp[ReverseTx[i]], " ", noise_var[ReverseTx[i]], " ", noise_cv);
    //    }
    //}
    //print("Total_amp lpdf"); //The total log posterior is around -1e16 during initialization for all data points, is not Inf or -Inf.
    //print( normal_lpdf(AllX_obs | AllX_amp, (noise_cv*AllX_amp + noise_sd_expand)) );

    for (i in 1:nFiltd){
        Dead[i] = system[i, 2];
        Dead_amp[i] = Dead[i]*amp[ReverseTx[i]];
        Dead_obs[i] ~ normal(Dead_amp[i], (noise_cv*Dead_amp[i] + noise_sd[ReverseTx[i]]));
    }
}

Example data set (with Rstan)

nMolec = length(amp)
strip = 8 
nRepl = 3 
nExperi = 11*2*nRepl
nPCRs = nExperi*nMolec
nData = strip*nPCRs

ReverseTx = rep(c(1, 2), each=nData/nMolec)
Inoc = rep(c(
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e4, 3e3, 1e3, 3e2, 1e2, 30, 10, 0), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl),
          rep(c(1e3), length.out=strip*nRepl)
          ), times=nMolec)
BatchCulture = rep(rep(1:nExperi, each=strip), times=nMolec)
theInocErrs = rep(c(1000, 100, 10, 0), length.out=nExperi)
InocErr = rep(rep(theInocErrs, each=strip), times=nMolec)
Time = rep(c(
         rep(c(60), length.out=strip*nRepl),
         rep(c(60), length.out=strip*nRepl),
         rep(c(45), length.out=strip*nRepl),
         rep(c(45), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(60), length.out=strip*nRepl),
         rep(c(60), length.out=strip*nRepl),
         rep(c(45), length.out=strip*nRepl),
         rep(c(45), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl),
         rep(c(0, 10, 20, 30, 45, 60, 90, 120), length.out=strip*nRepl)
         ), times=nMolec)
Drug = rep(c(
            rep(c(1), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(1), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(1), length.out=strip*nRepl),
            rep(c(0.5), length.out=strip*nRepl),
            rep(c(0.25), length.out=strip*nRepl),
            rep(c(0.0625), length.out=strip*nRepl),
            rep(c(0.03125), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(8), length.out=strip*nRepl),

            rep(c(1), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(1), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(1), length.out=strip*nRepl),
            rep(c(0.5), length.out=strip*nRepl),
            rep(c(0.25), length.out=strip*nRepl),
            rep(c(0.0625), length.out=strip*nRepl),
            rep(c(0.03125), length.out=strip*nRepl),
            rep(c(0), length.out=strip*nRepl),
            rep(c(8), length.out=strip*nRepl)
            ), times=nMolec)
Strain = rep(c(
               rep(1, length.out=strip*nExperi/2),
               rep(2, length.out=strip*nExperi/2)
               ), times=nMolec)
nStrains = length(unique(Strain))
MIC = EC50[Strain]

exp_dgamma = function(t, m, a, b){ 
    logY = (m-b)*t + (a-1)*log(t) + a*log(b) - lgamma(a)
    return(exp(logY))
}
beta = Bmax * (Drug^hill) / ((Drug^hill) + (MIC^hill))
live = exp(mu * Time) * pgamma(alpha, (beta * Time))
dead = rep(NA, nData)
for (i in 1:nData){
    dead[i] = integrate(exp_dgamma, 0, Time[i], mu, alpha, beta[i])$value
}

allX = live + dead

L0 = Inoc * InocErr

AllX = allX*amp[ReverseTx]*rpois(nData, L0) 
AllX_obs = abs( rnorm( nData, AllX, (AllX*noise_cv + sqrt(noise_var[ReverseTx])) ) ) 

Dead = dead*amp[ReverseTx]*rpois(nData, L0) 
Dead_obs = abs( rnorm( nData, Dead, (Dead*noise_cv + sqrt(noise_var[ReverseTx])) ) ) 

datalist = list(nData=nData,
                nFiltd=nData,
                AllX_obs=AllX_obs,
                Dead_obs=Dead_obs,
                nStrains=nStrains,
                nExperi=nExperi,
                Strain=Strain,
                BatchCulture=BatchCulture,
                Inoc=Inoc,
                Time=Time,
                Drug=Drug
                )

The ODE system:
\frac{dL}{dt} = \mu L - h L
\frac{dD}{dt} = h L
initial conditions: L\left(0\right) = L_0 ≥ 0 and D\left(0\right) = 0.
where h = \frac{\beta^\alpha t^\left(\alpha - 1 \right) e^{-\beta t}}{\Gamma [ \alpha, \beta t ] } is the hazard rate function of a gamma distributed random variable.
For clarity, \Gamma [ \alpha, \beta t ] = \int_{\beta t}^{\infty} \tau^\left(\alpha - 1 \right) e^{-\tau}d\tau is the upper incomplete gamma function evaluated at \alpha and \beta t.

My closed form solutions, as written in the original code, can be found by noticing that both equations are separable ODEs. WolframAlpha also agrees with me:
L(t) = L_0 e^{\mu t} Q[\alpha, \beta t ]
D(t) = L_0 \int_{0}^{t} e^{\mu \tau} \left( \frac{\beta^\alpha \tau^\left(\alpha - 1 \right) e^{-\beta \tau}}{\Gamma [ \alpha ] } \right) d\tau

When \beta > \mu, then D = \left( \frac{\beta}{\beta - \mu} \right)^{\alpha} L_0 P[\alpha, \left(\beta - \mu \right) t ]

For clarity of notation, Q[\alpha, z ] = \frac{\Gamma [ \alpha, z ] }{\Gamma [ \alpha ] } = \frac{\int_{z}^{\infty} \tau^\left(\alpha - 1 \right) e^{-\tau}d\tau}{\int_{0}^{\infty} \tau^\left(\alpha - 1 \right) e^{-\tau}d\tau} is the regularized upper incomplete gamma function (and the survival function of a gamma distributed random variable), while P[\alpha, z ] = 1 - Q[\alpha, z ] = \frac{\int_{0}^{z} \tau^\left(\alpha - 1 \right) e^{-\tau}d\tau}{\int_{0}^{\infty} \tau^\left(\alpha - 1 \right) e^{-\tau}d\tau} is the regularized lower incomplete gamma function and the cumulative density function of the gamma distribution.

Thanks in advance.

1 Like

The Adams solver is broken from my experience for Stan< 2.24. In 2.24 we changed how error control is handled and then it starts working. So for Stan 2.19, please try RK45 and if that does not work go with the bdf solver. Does that help?

1 Like

I’ve tried the rk45, adams, and bdf solver, but the situation appears the same. [With the BDF solver only, the ODE solution sometimes went below 0 (e.g. -1e-15), triggering a warning because I had declared real<lower=0> system[nData, 2]. In the code above, I took away the <lower=0> and again encountered endless “Rejecting initial value”.]
I used expose_stan_functions() to test the behavior of ODE_wrapper() and it gave me numeric results that qualitatively matched my expectations.
I noticed that 2.24’s ode solver interface changed significantly. Do you recommend I embark on a Stan upgrade/re-installation?

Ah… if that’s the problem, then: The ODE solvers only control positivity up to the absolute error which you specify (or is set as default). What works in practice is to set the ODE solver to a sensible small absolute error which is below what you consider as “0” in your problem. Then you always just add some small number to the output of the ODE solver which is above that, but still basically “0”. I don’t know your problem, but presume you can measure with a precision of 1E-3 your outcome (numbers below that would mean in practice “0”) and to get stable solves of the ODE you may need 1E-6 as absolute error. In that case you run the solver with 1E-6 as absolute tolerance and simply add 1E-5 (or even 1E-4) to all the outputs from the ODE solver. This way things are still “0” in the sense of your problem, but you never go below 0.

The alternative is to represent the ODE in the form of the log of your state (you have dx/dt which you put to 1/x dx/dt => d(log(x))/dt)… but that is not necessarily a better solution…

1 Like

Unfortunately, adding absolute tolerance didn’t seem to solve the problem. If the gradient is not finite, how should I go about debugging? Is there a way to print out the gradient or intermediate calculations?

I also passed initial parameter values to stan(), without a change in output.

Though I more suspect an indexing error or wrong math, is there an online tutorial on how the autodiff part of Stan works?

An update:

If I replace the formula for the ‘hazard’ variable in the new code with hazard = beta;, which behaves as a special case of my ODE model, then Stan begins sampling. So, the problem occurs when calculating the gradient of that expression:

exp( (alpha-1)*log(time) - lgamma(alpha) - beta*time + alpha*log(beta) - log1m(gamma_p(alpha, beta*time)) );

Any ideas?

Can u post the current Stan model with the absolute tolerances thing?

functions {
    real[] ODEstep(real time, real[] state, real[] theta, real[] data_r, int[] data_i){
        real mu = theta[1];
        real alpha = theta[2];
        real beta = theta[3];

        real Live = state[1];
        real Dead = state[2];

        real hazard;
        real dYdt[2];

        //hazard = exp( (alpha-1)*log(time) - lgamma(alpha) - beta*time
        //        + alpha*log(beta) - log1m(gamma_p(alpha, beta*time)) );
        hazard = beta; //Here beta is taking on a different meaning than in the above hazard rate, but the ODEs should still be coherent. In the synthetic data, one should set alpha=1 to match the model.

        dYdt[1] = (mu - hazard) * Live;
        dYdt[2] = (hazard) * Live;

        return dYdt;
    }   
    real[] ODE_wrapper(real[] init_system, real t0, real[] times, real[] theta, real[] data_r, int[] data_i){
        real results[1, 2]; //Because integrate_ode only returns real[,], I need a [1,2] array, not a [2] array.
        results = integrate_ode_bdf(ODEstep, init_system, t0, times, theta, data_r, data_i, 1e-8, 1e-9, 1e6);
        //print(results);
        results[1, 1] = results[1, 1] + 1e-8;
        results[1, 2] = results[1, 2] + 1e-8;
        return {results[1, 1], results[1, 2]};
    }   
}
data {
    int nData; //Number of measurements
    int nFiltd; //Number of measurements for which filtrate was measured. Less/equal to nData.
    int nStrains; //Number of distinct strains, currently just 1:2.
    int nMolec; //Number of chem species measured, currently 1:2.
    int nExperi; //Number of experiments (one per batch), about 18 batches.

    //The predictors
    int Strain[nData];
    int ReverseTx[nData];
    int BatchCulture[nData];
    real<lower=0> Inoc[nData]; //Inoculum, expected initial # of live cells
    real<lower=0> Time[nData]; //Duration, of exposure
    real<lower=0> Drug[nData]; //Drug concentration

    //The regression outputs
    real<lower=0> AllX_obs[nData]; 
    real<lower=0> Dead_obs[nFiltd];
}
transformed data{
    real machineeps = 1e-15; //2.22044604925e-16

    real data_r[2] = {0.0, 0.0}; //dummy variable to satisfy integrate_ode() signature
    int data_i[2] = {0, 0}; //dummy variable to satisfy integrate_ode()
}
parameters {
    real<lower=0> mu;
    real<lower=0> alpha;
    real<lower=0> Bmax;
    real<lower=0> hill;
    real<lower=0> EC50[nStrains]; //Effective concentration 50
    real<lower=0> InocErr[nExperi]; //The error in Inoc due to different batches.

    real<lower=0> amp[nMolec]; //Amplification efficiency
    real<lower=0> noise_var[nMolec]; //constant variance, fixed amount of noise.
    real<lower=0> noise_cv; //constant coefficient of variance.
}
transformed parameters {
    real<lower=0> noise_sd[2] = sqrt(noise_var);

    real<lower=0> beta[nData];
    real<lower=0> L0[nData]; //Initial amount of live cells. Can later generalize as init_system[nData, 2];
    real system[nData, 2]; //<lower=0> 1=Live 2=Dead

    for (i in 1:nData){
        beta[i] = Bmax * pow(Drug[i], hill) / (pow(Drug[i], hill) + pow(EC50[Strain[i]], hill));
        L0[i] = Inoc[i]*InocErr[BatchCulture[i]];

        if (Time[i] < machineeps){
            system[i, 1] = L0[i];
            system[i, 2] = 0;
        } else if (beta[i] < machineeps){ //Trying to avoid log(0) in the ODE. Even if exp(-Inf) becomes 0, gradient may be Inf?
            system[i, 1] = L0[i] * exp(Time[i]*(mu));
            system[i, 2] = 0;
        } else {
            system[i, :] = ODE_wrapper({L0[i], 0}, //initial state, L0 live cells, 0 dead cells.
            0, //init_time
            {Time[i]}, //times
            {mu, alpha, beta[i]}, //theta
            data_r, data_i); //no data is used because I calculate beta[i] outside of the ODE.
        }
    }
}
model {
    vector[nData] AllX;
    vector[nFiltd] Dead;
    vector[nData] AllX_amp;
    vector[nFiltd] Dead_amp;

    mu ~ gamma( (0.024/0.003)^2, 0.024/(0.003^2) );

    alpha ~ gamma( (3.0/6)^2, 3.0/(6^2) );
    Bmax ~ gamma( (2.0/2)^2, 2.0/(2^2) );
    hill ~ gamma( (2.0/1)^2, 2.0/(1^2) );

    EC50[1] ~ gamma( (0.03125/0.1)^2, 0.03125/(0.1^2) ); //K12
    EC50[2] ~ gamma( (4.0/8)^2, 4.0/(8^2) ); //Ec38

    amp[1] ~ gamma( (1.0/0.5)^2, 1.0/(0.5^2) );
    amp[2] ~ gamma( (1/2.0)^2, 3e4/(3e4*2)^2 );

    noise_var[1] ~ inv_gamma( 2+(5.0/20)^2, (1+(5.0/20)^2)*5.0 );
    noise_var[2] ~ inv_gamma( 2+(20.0/40)^2, (1+(20.0/40)^2)*20.0 );
    noise_cv ~ gamma( (0.05/0.1)^2, 0.05/(0.1^2) );

    InocErr ~ gamma( 3, 3 );

    for (i in 1:nData){
        AllX[i] = system[i, 1] + system[i, 2];
        AllX_amp[i] = AllX[i]*amp[ReverseTx[i]];
        AllX_obs[i] ~ normal(AllX_amp[i], (noise_cv*AllX_amp[i] + noise_sd[ReverseTx[i]]));
    }

    for (i in 1:nFiltd){
        Dead[i] = system[i, 2];
        Dead_amp[i] = Dead[i]*amp[ReverseTx[i]];
        Dead_obs[i] ~ normal(Dead_amp[i], (noise_cv*Dead_amp[i] + noise_sd[ReverseTx[i]]));
    }
}

1-gamma_p is gamma_q as I recall - maybe use that directly?

Other than that:

  • try to avoid the machineeps if branching if you can. Maybe just add 1E-8 to the initial to ensure sufficient positivity… but this is probably not the problem you have right now, so solve that later.
  • try to find out a bit more for what values this goes crazy (print statements, solving this thing in R also gives better debugging)
  • it sounds as if you have issues with the numerics in some way I don’t see immediately. Maybe some special case for “hazard” leads to these issues you have and you can catch it with an if statement to return the right limit case.

If all of the above does not help… then a synthetic data set to run this could help (but I am tight with time right now, so no promise from my end).

Thanks for the advice so far! Alas, still no solution.

I have found that four sets of inputs made in my synthetic data code (see above R code in post #1), with certain initial parameter values (but not all), will cause the value of hazard to become Inf because of numeric issues with gamma_p or gamma_q.

Case 1: Data points 272, 280, 288
Inputs at i = 272 (and thus also 280 and 288) are:

  • AllX_obs[272] == 16.77046
  • Dead_obs[272] == 6.382884
  • Inoc[272] == 0
  • Time[272] = 60
  • Drug[272] == 1
  • Strain[272] == 2
  • BatchCulture[272] == 34
  • ReverseTx[272] == 1

Initial parameter values were:

  • Live == 0 (in second argument to the ODE step function)
  • Dead == 0 (in second argument to the ODE step function)
  • mu == 0.0248348
  • alpha == 2.82477
  • beta == 0.269412
  • time == 226.531 (the first argument for the ODE step function)

Case 2: Data points 320, 328, 336
Inputs at i = 320 are:

  • AllX_obs[320] == 5.138776
  • Dead_obs[320] == 11.60072
  • Inoc[320] == 0
  • Time[320] == 45
  • Drug[320] == 1
  • Strain[320] == 2
  • BatchCulture[320] == 40
  • ReverseTx[320] == 1

Initial parameter values were:

  • (same as Case 1) Live == 0
  • (same as Case 1) Dead == 0
  • (same as Case 1) mu == 0.0248348
  • (same as Case 1) alpha == 2.82477
  • (same as Case 1) beta == 0.269412
  • time == 169.898

Case 3: Data points 800, 808, 816
Inputs at i = 800 are:

  • AllX_obs[800] == 88.12069
  • Dead_obs[800] == 52.93195
  • (same as Case 1) Inoc[800] == 0
  • (same as Case 1) Time[800] = 60
  • (same as Case 1) Drug[800] == 1
  • (same as Case 1) Strain[800] == 2
  • (same as Case 1) BatchCulture[800] == 34
  • ReverseTx[800] == 2

Initial parameter values were:

  • (Same as Case 1) including time == 226.531

Case 4: Data points 848, 856, 864
Inputs at i = 848 are:

  • AllX_obs[848] == 94.45259
  • Dead_obs[848] == 117.5531
  • (Same as Case 2) except ReverseTx[848] == 2

Initial parameter values were:

  • (Same as Case 2), including time == 169.898

The exact values for time of 169.898 and 226.531 show up on different runs with different initial parameter values guesses. Not sure what is special about these values. How is it that the ODe solver hits underflow issues at the same time step?
I was using the bdf solver. With rk45, I haven’t seen this situation in the last 5 attempts (but I still get ‘gradient not finite’ warnings). I don’t recall if I’ve ever seen this situation with rk45.

In R, a call to the built-in function pgamma(shape=2.47, rate=0.28, q=169.898) returns 1, rather than 0.9999…something, and the expression for hazard evaluates to Inf. Same result if one calls pgamma(shape=2.47, rate=0.28*169.898). I take this to be a sign of underflow.

Main question for this post: Why is the time parameter passed by integrate_ode_*() to my ODE function taking on a value of 169.898 (or 226.531) when I am asking it to solve from t=0 to t=60 (or to t=45)?

To catch when gamma_p is small, I tried making an if/else statement to catch underflow and instead use

            hazard = (beta + (1-alpha)/mytime + (alpha-1)/(beta*(mytime^2))
                    + (alpha-1)*(alpha-3)/((beta^2)*(mytime^3))
                    + ((alpha^3)-9*(alpha^2)+21*alpha-13)/((beta^3)*(mytime^4)));

This expression is the Laurent series expansion of the hazard expression at t = infinity. Is this a gradient-compatible way to avoid under/overflow?

I tried placing the above expression inside of several different if/else statements, one at a time (see code below). But in all cases, I still get “Rejecting initial value” warnings.
Here’s the changed code block in its entirety:

        //if (gamma_q(alpha, beta*mytime) < 1e-7){
        //if (time > 145){
        if (gamma_p(alpha, beta*mytime) > (1 - 1e-7)){
            print("here ", {Live, Dead, mu, alpha, beta, mytime});
            print(gamma_p(alpha, beta*mytime));
            print(exp( (alpha-1)*log(mytime) - lgamma(alpha) - beta*mytime
                        + alpha*log(beta) - log1m(gamma_p(alpha, beta*mytime)) ));

            hazard = (beta + (1-alpha)/mytime + (alpha-1)/(beta*(mytime^2))
                    + (alpha-1)*(alpha-3)/((beta^2)*(mytime^3))
                    + ((alpha^3)-9*(alpha^2)+21*alpha-13)/((beta^3)*(mytime^4)));

            print(hazard);
        } else {
            hazard = exp( (alpha-1)*log(mytime) - lgamma(alpha) - beta*mytime
                    + alpha*log(beta) - log1m(gamma_p(alpha, beta*mytime)) ); //log(gamma_q(alpha, beta*mytime)) );
        }

        if (is_nan(hazard)){
            print("hazard is NAN");
            print({Live, Dead, mu, alpha, beta, mytime});
            //hazard = beta; //The limit of the hazard expression as t goes to infinity. This expression works when it replaces the gamma-containing hazard expressions.

            //Below is the first few terms of the Laurent series of hazard at t = infinity.
            //Expression was Calculated with Mathematica
            //Values appears correct to the 6th decimal when I compare it with R's built-in
            //pgamma(shape=alpha, rate=beta, q=100) function at high values of q.
            hazard = (beta + (1-alpha)/mytime + (alpha-1)/(beta*(mytime^2))
                    + (alpha-1)*(alpha-3)/((beta^2)*(mytime^3))
                    + ((alpha^3)-9*(alpha^2)+21*alpha-13)/((beta^3)*(mytime^4)));
        }
        if (is_inf(hazard)){
            print("hazard is INF");
            print({Live, Dead, mu, alpha, beta, mytime});
            hazard = (beta + (1-alpha)/mytime + (alpha-1)/(beta*(mytime^2))
                    + (alpha-1)*(alpha-3)/((beta^2)*(mytime^3))
                    + ((alpha^3)-9*(alpha^2)+21*alpha-13)/((beta^3)*(mytime^4)));
        }

Sometimes, hazard was neither Inf nor Nan for any of the data points (since nothing was printed), yet the initialization was still rejected. So the Inf value of hazard cannot be the only reason my Stan code cannot start. Is this reasoning sound?

Sometimes, within a set of 100 initialization attempts, there are no printouts saying hazard is Inf. Other times, the situation happens for >75% of initializations. Sometimes, only 2 of the 4 cases above appear. This is when I rerun the model, with our without compiling. I would think then that the synthetic data random number generation is having an effect, as that is the only thing different between runs.

Replacing 1-gamma_p with gamma_q does not resolve the error, but it does give a different error message:

Exception: grad_reg_inc_gamma: is not converging

This error message occurs for regardless of whether I remove/include any of the if/else blocks. (First if/else block is the “machineeps” if/else statement in the model block. The second if/else block is the new code I wrote in the function ODEstep(), copied above.) Same error message appears for either rk45 or bdf.

The problem was simpler than I thought.

I my expression for hazard, I have a log(time) term. My initial time for the ODE was set to 0, so NaNs were generated for that first step (only) of the ODE solver. Changing the init_time function argument to 1e-8 instead of 0 allows my Stan code to run without issue.

Interestingly, when I was testing the ODE solver in R after calling expose_stan_functions, NaNs only showed up in the ODE solutions when certain parameter values were chosen. So it took me a while to realize the Stan code was always returning [NaN, NaN] as my ODE solution.