Model running extremely slow

Hi all,

After running a very simple model (an exponential decay ODE with two random effects, for the initial value of AB and for the decay rate), the model runs extremely slowly (gradient evaluation took 0.063 seconds).
I am specifying the initial values as:

rAB0 = rep(0, participants)
rmuab = rep(0, participants)
r = cbind(rAB0, rmuab)
initebl = function(){
list(muab = -0.02,
logAB0 = 5, logsigmaAB0 = 0.5, logsigmamuab = 0.5,
rho_rhobit = 0.5, sigma = 1, r = r)
}

Any idea why that could be?

functions {
  real[] edecay(real t, real[] y, real[] theta,  real[] x_r, int[] x_i) {
    real dydt[1];
    dydt[1] = -theta[1]*y[1];
    
    return dydt;
  }
}
data {
  int<lower=1> N;
  int<lower=1> nsubjects;
  int<lower=1> subj_ids[N];
  real <lower=0> y[N];
  real ts[N];
  real t0;
}
transformed data {
    real x_r[0];
    int x_i[0]; 
}
parameters {
  real muab;
  
  real logAB0;
  real <lower=0> logsigmaAB0;
  real <lower=0> logsigmamuab;
  real <lower= 0> sigma;
  real rho_rhobit;
  vector[2] r[nsubjects];
}
model {
  real new_mu[N,1];
  real mu[1,1];
  real eval_time[1];
  real theta[1];
  vector[nsubjects] rAB0;
  vector[nsubjects] rmuab;
  
  real inits[1];
  
  real AB0 = exp(logAB0);
  real sigmaAB0 = exp(logsigmaAB0);
  real sigmamuab = exp(logsigmamuab);
  
  vector[2] mean_vec;
  matrix[2,2] Sigma; 
  real rho = (exp(rho_rhobit) - 1)/(exp(rho_rhobit) + 1);
  
  //prior distributions
  muab ~ normal(0, 10);
  logAB0 ~ normal(5, 10);
  sigma ~ normal(1, 10);
  logsigmaAB0 ~ cauchy(0, 1);
  logsigmamuab ~ cauchy(0, 1);
  rho_rhobit ~ uniform(-10,10);
  
  mean_vec[1] = -pow(sigmaAB0, 2.0)/2;
  mean_vec[2] = -pow(sigmamuab, 2.0)/2;
  
  Sigma[1,1] = pow(sigmaAB0, 2.0);
  Sigma[2,2] = pow(sigmamuab, 2.0);
  Sigma[1,2] = rho*sigmaAB0*sigmamuab;
  Sigma[2,1] = rho*sigmaAB0*sigmamuab;
  
  
  for (subj_id in 1:nsubjects) {
        r[subj_id] ~ multi_normal(mean_vec, Sigma);
        rAB0[subj_id] = r[subj_id,1];
        rmuab[subj_id] = r[subj_id,2];
  }
  
  for (obs_id in 1:N){
    inits[1] = AB0*exp(rAB0[subj_ids[obs_id]]);
    
    theta[1] = muab*exp(rmuab[subj_ids[obs_id]]); 
     
    eval_time[1] = ts[obs_id];
    if (eval_time[1] == 0){
      new_mu[obs_id,1] = log(inits[1]) - pow(sigma, 2.0)/2;
      }
    else{
      mu = integrate_ode_rk45(edecay, inits, t0, eval_time, theta, x_r, x_i);
      new_mu[obs_id,1] = log(mu[1,1]) - pow(sigma, 2.0)/2;
    }  
    //likelihood function for AB levels
    y[obs_id] ~ lognormal(new_mu[obs_id,1], sigma); 
  }
}
generated quantities {
  real z_pred[N];
  real sigma2 = sigma;
  for (t in 1:N){
   z_pred[t] = lognormal_rng(log(y[t] + 1e-5), sigma2); 
  }
}

Often the problem here is that random initializations in the tails cause stiffness in the differential equation, which manifests in many iterations and slow log density and gradient evaluation. Does the problem persist if you let it sample long enough to get near an answer?

If not, you might want to try using our stiff solver, which in general runs more slowly for non-stiff problems, but much faster for stiff problems. See this section of our function reference doc:

If you have an exponential decay ODE, you can probably write down an analytic solution rather than relying on a general ODE solver.
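For reference, the decay ODE dy/dt = -theta*y with initial value y(t0) = AB0 has the closed-form solution y(t) = AB0 * exp(-theta * (t - t0)), so the solver call can be replaced by a single exp. A quick sanity check in plain Python (a fixed-step RK4 integrator standing in for integrate_ode_rk45; the parameter values are made up for illustration):

```python
import math

def rk4(f, y0, t0, t1, n=1000):
    """Classic fixed-step RK4, standing in for the Stan ODE solver."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

theta, AB0, t0, t = 0.02, math.exp(5.0), 0.0, 30.0
numeric = rk4(lambda t, y: -theta * y, AB0, t0, t)
analytic = AB0 * math.exp(-theta * (t - t0))
print(abs(numeric - analytic))  # agrees to high precision
```

In the model above that means new_mu could be computed as log(inits[1]) - theta[1] * (ts[obs_id] - t0) - square(sigma) / 2, with no solver and no special case for t == 0.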

P.S. You can simplify your ODE system function body to:

return {-theta[1] * y[1]};

Also, expressions like this using exp directly can be bad news due to overflow and underflow, depending on the scale of rho_rhobit.

  real rho = (exp(rho_rhobit) - 1)/(exp(rho_rhobit) + 1);

Also, you are trying to put a distribution on rho_rhobit, which is a transform. For this to make sense, you need to include the Jacobian adjustment for the transform, which is:

target += log absolute derivative of rho w.r.t. rho_rhobit
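Concretely (my sketch, not the original code): with rho = tanh(rho_rhobit / 2), the derivative is d rho / d rho_rhobit = (1 - rho^2) / 2, so the adjustment would be target += log1m(square(rho)) - log(2) in Stan. A finite-difference check of that derivative in plain Python:

```python
import math

def rho(x):
    # the transform from the model, equivalent to tanh(x / 2)
    return (math.exp(x) - 1) / (math.exp(x) + 1)

def log_abs_jacobian(x):
    # analytic: d rho / dx = (1 - rho^2) / 2
    r = rho(x)
    return math.log1p(-r * r) - math.log(2)

x, eps = 0.7, 1e-6
fd = (rho(x + eps) - rho(x - eps)) / (2 * eps)  # central finite difference
print(abs(math.log(abs(fd)) - log_abs_jacobian(x)))  # ~0
```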

Also, rather than implementing rho as you do, I’d suggest implementing as

real rho = tanh(0.5 * rho_rhobit);

or just doubling the scale of rho_rhobit and dropping the 0.5.
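The identity behind this: (exp(x) - 1) / (exp(x) + 1) = tanh(x / 2) for all x, but tanh saturates gracefully where exp overflows. A quick check in plain Python:

```python
import math

for x in (-3.0, 0.0, 0.5, 3.0):
    direct = (math.exp(x) - 1) / (math.exp(x) + 1)
    assert abs(direct - math.tanh(x / 2)) < 1e-12

# the exp form overflows for large |x|; tanh does not
print(math.tanh(500.0))  # 1.0, no overflow
# math.exp(1000.0) would raise OverflowError here
```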


Oh, and there’s a lot more you can do in the model for efficiency. Check out the User’s Guide chapter on efficiency.

Ack, the index is broken in our docs (I just opened an issue) but you can start here:

The biggest issue is to not repeat computations but to re-use already computed values. That saves a ton on autodiff and is what our vectorized functions do internally. Also, use our built-ins, like square(x) instead of pow(x, 2) (you don't need the .0). And you can vectorize your generated quantities like this:

generated quantities {
  array[N] real z_pred = lognormal_rng(log(y + 1e-5), sigma);
}
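To illustrate the reuse point outside of Stan: in the model above, rho * sigmaAB0 * sigmamuab is computed twice and pow(sigma, 2.0)/2 is recomputed inside the observation loop even though it never changes. Hoisting such constants out of the loop is exactly the saving meant here. A plain-Python sketch of the refactor (the names and values are mine, not from the model):

```python
import math

sigma = 1.3
theta = 0.02
inits = 148.4
ts = [0.0, 1.0, 2.5, 4.0]

# before: the constant correction is recomputed for every observation
mu_repeated = [math.log(inits * math.exp(-theta * t)) - sigma ** 2 / 2
               for t in ts]

# after: compute the constant once and reuse it
half_sigma_sq = sigma ** 2 / 2
mu_hoisted = [math.log(inits * math.exp(-theta * t)) - half_sigma_sq
              for t in ts]

assert mu_repeated == mu_hoisted  # same values, less repeated work
```

In Python this only saves CPU time; in a Stan model block it also shrinks the autodiff expression graph, which is where most of the cost goes.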

No need to copy sigma into sigma2. But I have no idea what this is doing. It's not predicting new y; it's just adding lognormal noise around a jittered value of log(y[t]). To generate simulated data, you'd want to bring in new predictors, or to do posterior predictive tests, you'd just want to simulate y_sim as

generated quantities {
  array[N] real y_sim = lognormal_rng(new_mu[ , 1], sigma);
}

Hi,

Thanks for the reply. I will try the stiff solver. So far, this runs (without specifying a correlation or a generated quantities block). I get a warning like:

Lognormal_lpdf: location parameter is inf, but must be finite! line 109

When I run the code with the stan function, I see Rhat going well above 1. I have changed the times (sorted from min to max, or per subject, e.g., 3 times per subject). Unfortunately, this still does not solve it. Also, the same occurs when I write the equations analytically in another program.

functions {
  real[] sldecay(real t, real[] y, real[] theta,  real[] x_r, int[] x_i) {
    real dydt[3];
    dydt[1] = -theta[1]*y[1];
    dydt[2] = -theta[2]*y[2];
    dydt[3] = theta[1]*y[1] + theta[2]*y[2] - theta[3]*y[3];
    
    return dydt;
  }
}
data {
  int<lower=1> N;
  int<lower=1> nsubjects;
  int<lower=1> subj_ids[N];
  real <lower=0> y[N];
  int <lower=1> ntimes;
  real ts[ntimes];
  real t0;
}
transformed data {
    real x_r[0];
    int x_i[0]; 
}
parameters {
  real  logps;
  real  logpl;
  real  logmuab;
  real  sigmalogmuab;
  
  real <lower= 0> logAB0;
  real  sigmalogAB0;
  real  logps0;
  real  logpl0;
  real   sigma;
  
  vector[2] r[nsubjects];
  //real rho_rhobit;
}
model{
  real mu[1, 3];
  real inits[3];
  real theta[3];
  real new_mu[ntimes, 3];
  vector[nsubjects] rAB0;
  vector[nsubjects] rmuab;
  real eval_time[1];
  // transformed parameters 
  real ps = exp(logps);
  real pl = exp(logpl);
  real muab = exp(logmuab);
  real sigma_muab = exp(sigmalogmuab);
  real AB0 = exp(logAB0);
  real sigma_AB0 = exp(sigmalogAB0);
  real ps0 = exp(logps0);
  real pl0 = exp(logpl0);
  real sigmap = exp(sigma);
  // defining the correlation 
  vector[2] mean_vec;
  matrix[2,2] Sigma;
  //real rho = (exp(rho_rhobit) - 1)/(exp(rho_rhobit) + 1);
 
  //prior distributions
  logps ~ normal(0, 1);
  logpl ~ normal(0, 1);
  logmuab ~ normal(0, 10);
  sigmalogmuab ~ normal(0, 1);
  logAB0 ~ normal(3, .5);
  sigmalogAB0 ~ normal(0, 1);
  logps0 ~ normal(0, 1);
  logpl0 ~ normal(0, 1);
  sigma ~ normal(0, 1);
  //rho_rhobit ~ uniform(-10, 10);
  
  //cov matrix
  mean_vec[1] = -pow(sigma_AB0, 2.0)/2;
  mean_vec[2] = -pow(sigma_muab, 2.0)/2;
  
  Sigma[1,1] = pow(sigma_AB0, 2.0);
  Sigma[2,2] = pow(sigma_muab, 2.0);
  Sigma[1,2] = 0;
  Sigma[2,1] = 0;
  
  
  for (j in 1:nsubjects) {
    r[j] ~ multi_normal(mean_vec, Sigma);  // Sigma is a covariance matrix, not a Cholesky factor
    rAB0[j] = r[j,1];
    rmuab[j] = r[j,2];
  }
  for (i in 1:N){
    //defining the initial conditions
    inits[3] = AB0*exp(rAB0[subj_ids[i]]);
    inits[1] = inits[3]/ps0;
    inits[2] = inits[3]/pl0;
    //defining the parameters
    theta[1] = ps;
    theta[2] = pl;
    theta[3] = muab*exp(rmuab[subj_ids[i]]);
    // ode solver
    eval_time[1] = ts[i];
    if (eval_time[1] == 0){
      new_mu[i,1] = inits[1];
      new_mu[i,2] = inits[2];
      new_mu[i,3] = log(inits[3]) - pow(sigmap, 2.0)/2;}
    else{
      mu = integrate_ode_rk45(sldecay, inits, t0, eval_time, theta, x_r, x_i);
      new_mu[i,1] = mu[1,1];
      new_mu[i,2] = mu[1,2];
      new_mu[i,3] = log(mu[1,3]) - pow(sigmap, 2.0)/2;}
      y[i] ~ lognormal(new_mu[i, 3], sigmap);
    }
}