Slow sampling fitting a PDE model

I have a heterogeneity PDE model and I’m trying to estimate two parameters. The model is about a bacterial system that consumes formaldehyde through converting that to methanol. Cells in this system are heterogeneous and each subpopulation has a different tolerance level to toxicity of formaldehyde. The model consists of one IDE for change in methanol concentration, one IDE for formaldehyde change and one PDE representing subpopulations of cells. As for now I’m generating some growth data by varying initial condition of formaldehyde in the model. I can get estimates but it’s not optimal. I do 100 iterations with adapt_delta=0.8 I get “divergent transitions after warmup” and “BMFI low”. I increased adapt_delta to 0.99 and number of iterations to 500, it didn’t help. Increasing iterations to more than 500 makes the sampling very slow. Any suggestion on how to increase the efficiency of sampling?
pde_stan_170822.R (7.9 KB)

Some things:

  • I see you have the pairplots. Do any of the parameters look correlated? This can be a good way to find issues with the model.

  • Are any of your parameters going off to crazy crazy values?

  • Especially with complicated models (a PDE definitely qualifies) it’s good to do generated quantities stuff to make sure you’re anywhere near your solution. (edit: even if you think everything looks normal – it’s just really hard to tell with complex models until you compare your generated stuff to your data)

  • Copy your model out to a separate file. It’ll be easier to work with :D.

Here’s a version that’s auto-indented. I don’t know why so many user take the embed-in-R-with-no-indentation approach. I’m afraid I don’t really know enough about things like PDEs to help any further!

functions {
  real[] pde(real t, real[] y, real[] eta,real[] x_r,int[] x_i) {

    //intialize parameters
    int NClasses = 10; // sub populations
    real zmin;
    real zmax;
    row_vector[NClasses] x;  // number of states in PDE == NClasses
    row_vector[NClasses] n;  // array of subpop to make nn
    row_vector[NClasses+1] nn;   // an array keeping first differences
    row_vector[NClasses+2] nnn;  // an array keeping second differences
    row_vector[NClasses] hx;
    row_vector[NClasses] FluxP1;  // advection flux
    row_vector[NClasses] FluxP2;  // diffusion flux
    real V;  // advection coefficient
    real D;  // diffusion coefficient
    real totPop; // total number of cells
    real m;      // methanol concentration
    real f;      // formaldeyde concentration

    real Dm;     // DE for change in methanol
    real Df;     // DE for change in formaldehyde
    row_vector[NClasses] Dn;  // PDE for change in cells
    zmin = 1.1;  //minimum phenotype
    zmax = 9.9;  //maximum phenotype

    x[1] = zmin;
    for (i in 2:NClasses) x[i] = x[i-1]+(zmax-zmin)/(NClasses-1); // defining x: grid in PDE
    V = eta[1]; // advection parameter to be estimated
    D = eta[2]; // diffusion parameter to be estimated
    m = y[1];    // methanol is the first DE in model
    f = y[2];     // formaldehyde is second DE in model
    n = to_row_vector(y[3:(NClasses+2)]);
    nn = append_col(n,0);
    nnn = append_col(0,append_col(n,0));

    nn[1] = 0;
    hx = f - x;
    for(i in 1:NClasses){
      hx[i] = int_step(hx[i]);
      if(hx[i] == 0)
	FluxP1[i] = 10*V*(nn[i+1] - nn[i]);
      else
	FluxP1[i] = -10*V*(nn[i+1] - nn[i]);
    }

    for(i in 1:NClasses){
      FluxP2[i] = 10*D*(nnn[i+2] - 2*nnn[i+1] + nnn[i]);
    }

    totPop = sum(n);
    //calculate the ode values, the equations are same as model in the r code
    Dm = -(4e-8)*m/(m+0.02)*totPop;
    Df = (4e-8)*(m/(m+0.02)-1.1*f/(f+0.2))*totPop;
    Dn = n*0.23*f/(f+0.2) - n .* (f-x) .* hx * 0.28;
    // Rate of change = Flux gradient + Biology
    Dn = Dn  + FluxP2 + FluxP1;
    return to_array_1d(append_col(Dm,append_col(Df,Dn)));
  } //end pde
} //end functions

data {
  int<lower=1> T; // time
  int<lower=1> C; //number of f initial conditions
  int<lower=1> NClasses; //number of phenotypic classes for the PDE
  real f0[C]; // pass a vector of length C initial fex values
  real y0[1+NClasses];     // initial values of state variables; f_0 in f0 now
  real t0;        // initial time
  real ts[T];     // real array that has n time steps
  int y[C,T,NClasses];       // getting the data, here I pass cell count
  real rel_tol;
  real abs_tol;
  int max_steps;
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
  real<lower=0,upper=1> V;
  real<lower=0,upper=1> D;
}

model {
  real y_hat[T,NClasses+2];
  real y_hat_n[T,NClasses];
  row_vector[NClasses] x;
  real eta[2];
  real s0[2+NClasses]; //this will now have the initial conditions

  // priors, I want to fit V and D
  V ~ normal(0,1);
  D ~ normal(0,1);

  eta[1] = V;
  eta[2] = D;

  //kludging the x foo, 1.1 is the min z in PDE, 9.9 is max z in PDE
  x[1] = 1.1;
  for (i in 2:NClasses) x[i] = x[i-1]+(9.9-1.1)/(NClasses-1);

  //initial conditions for m, f, and n are not changing
  //so we define them outside of the loop
  s0[1] = y0[1]; //methanol
  s0[3:] = y0[2:];

  for (c in 1:C){
    //at the beginning of each c we create the correct f initial condition
    s0[2] = f0[c];

    y_hat = integrate_ode_bdf(pde, s0, t0, ts, eta, x_r, x_i, rel_tol, abs_tol, max_steps); // we pass s0 now
    y_hat_n = y_hat[,3:(NClasses+2)];

    for(j in 1:NClasses){
      y[c,,j] ~ poisson(y_hat_n[,j]); //note that I vectorized sampling and made sure to get column c of data
    }

  }
}

Thanks Ben,

  • I attached the pair plots, they don’t look like a fit so I don’t think I’m able to say anything about correlation of parameters right now, I think I need to do more sampling to be able to have an idea of what’s going on, but as I said I cannot go very far as the sampling is very slow.

  • As of now, the values of parameters don’t go crazy, I estimate two parameters V which should be 0.1 and D which should be 0.3 and I get 0.05 for V and 0.1 for D. So not very far real values but still is not what it should.

  • For generated quantities, I’m not very familiar with that. I do have this part in the model block:

     y_hat = integrate_ode_bdf(pde, s0, t0, ts, eta, x_r, x_i, rel_tol, abs_tol, max_steps); 
     y_hat_n = y_hat[,3:(NClasses+2)];
     for(j in 1:NClasses){
       y[c,,j] ~ poisson(y_hat_n[,j]);
     }
    

What kind of quantities you suggest to add in generated quantities block?

Thanks Bob for sending the model code!

Those pairs plots are from multiple chains? They pretty obviously have multi-modes but I’m not sure if that’s just because the different chains get totally stuck at different initial values and never move into the proper typical set. Given the lp__ value of positive 1e^11 I’m guessing your density has some kind of infinity in it somewhere and Stan is finding it rapidly and getting stuck. Probably this is due to an error in your code or your model. Possibly the infinity is non-normalizable. If you ask Stan to sample from a non-normalizable density with an infinity somewhere it will be attracted to that spot like a black hole. Once it gets into the vicinity it will diverge as even tiny timesteps will step into regions where your potential energy decreases unboundedly and hence your velocity increases unboundedly.

Look for bugs first in your model or code, then if you don’t find any come back and tell us what you found in your careful combing through and why lp__ really should be positive 10^11

For reference lp__ is the logarithm of the density. The normal(0,s) random variable has log-density equal to 1e11 at x = 0 when s is on the order of e^(-1e11) which is a number that might as well be something like 1/(1 followed by 10^11 zeros)

So, definitely a bug I think.

Not sure exactly what is going on in your model, but are the y values counts of cells? This is bacteria, so I assume these counts are dramatically large, like millions or billions. So when you do

y[c,j] ~ poisson(y_hat_n[,j])

you’re talking about y in the billions and the poisson distribution is delta-function like.

If this is the case, I strongly suggest you first define a typical scale for y, for example ytyp = y at end of experiment in some control conditions. Then, stop treating y as a count, and treat it as a continuous dimensionless ratio y/ytyp. Now all the values are O(1) and one cell plus or minus is irrelevant so we treat it as continuous, and we use something like normal(y_hat_n,sd_y) and decouple the mean and standard deviation this way.

On the other hand, if y really is counts, like O(100) or less then you can ignore this suggestion.

What @dlakelan said about the Poisson stuff is probly right. Since you’re already using a PDE, maybe just keep things continuous for now. It looks like the R code that is generating your data is spitting out counts in the millions, so probly fair to compute with concentrations here.

For the generated quantities, move the computation of y_hat to a transformed parameters block, and then add something like (assuming y is now a real and your likelihood is normal):

generated quantities {
  real y_hat_with_noise[C,T,NClasses];
  for(c in 1:C) {
    for(t in 1:T) {
      for(n in 1:NClasses) {
        y_hat_with_noise[c, t, n] = normal_rng(y_hat[c, t, n], sigma); // or however you parameterize noise
      }
    }
  }
}

I still think it’s a good place to start just so you can see that the PDE is working kinda well.

The parameters that you’re estimating – aren’t there ratios of these and such that’ll give you shocks and instabilities and stuff? Might have to watch out for these. I’m just wary at first of anything with advection in it – I just know those terms are harder than diffusion. If you end up needing an easier PDE, maybe just start with only diffusion and estimate one parameter?

Also, make sure and add some noise to the data you’re generating in R to test this. Sometimes the sampler can act a little funky if there’s a perfect answer to be had (if you’re fitting your exact model with no noise).

1 Like