Divergences in a prior predictive check, but only when a transformed parameters block is included?

OS: Windows 10 Home Edition, Ver. 1803
rstan: 2.18.2
stan: 2.18.0
R: 3.5.1

I’m trying to do a lazy prior predictive check on a very nonlinear model for a chemical reactor. I think the more correct way to sample from the priors is to draw independently from the marginal priors in R, then pass those draws to Stan as initial values with algorithm = "fixed_param". I wanted to get around that by just using NUTS to take a large sample from the priors and then keeping a small subsample (~9 iterations).

This works fine when I comment out my transformed parameters block, but strangely, when I uncomment the block, recompile, and sample, I get ~70%-90% divergences and large bias. This goes against how I thought Stan works, since the log posterior should only depend on what happens in the model block. I tried to build a reproducible example in a separate file, but then the problem stopped happening. When I went back to processing the results from the transformed parameters block, the problem reappeared, even with the same seed. I can always work around this by writing extra R code, either for sampling the priors or for processing the raw parameters; the latter won’t be feasible once I get to fitting my real data with this model, if the problem persists.
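For reference, this is roughly what I mean by the fixed_param route (a minimal sketch, not code I actually ran; it assumes the compiled_model, n_trials, and T objects defined further down, and it ignores the T[0,] truncations since they cut off a negligible amount of prior mass for these priors):

library(rstan)

n_draws <- 9

# One list of initial values per chain, each drawn from the marginal priors in R
init_list <- lapply(seq_len(n_draws), function(i) {
  list(
    T_10p     = rnorm(1, 383.0, 10.0),
    del_log_A = rnorm(1,   1.0,  1.0),
    E_a1      = rnorm(1,   5.0,  1.0),
    del_E     = rnorm(1,   3.0,  1.0)
  )
})

# With algorithm = "fixed_param" the parameters stay at their initial values,
# so running one chain per draw evaluates transformed parameters at each prior draw.
prior_pred <-
  sampling(
    compiled_model,
    data = list(n_trials = n_trials, T = T),
    algorithm = "fixed_param",
    chains = n_draws,
    iter = 1,
    warmup = 0,
    init = init_list,
    seed = 11222018
  )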


Stan model:

data {
  int    n_trials ; // Number of trials
	real T[n_trials]; // Absolute temperature (K)
}

transformed data {}

parameters {
	// Reaction parameters
	real     T_10p; // Temperature required for 10% conversion at 100 ml/min
	real del_log_A; // Difference in log pre-exponential factors
	
	real E_a1 ; // Activation energy for rxn 1 (10^4 J/mol)
	real del_E; // Difference in activation energies (10^4 J/mol)
}

transformed parameters {
  //// Declarations

	// Constants
	real z_a0 ;
	real r_gas;
	real w_cat;
	real v_flo;
	real omega[2];

	// Processed reaction parameters
	real E_act[2];
	real log_A[2];

	// Reaction
	matrix[n_trials, 2] k_rxn; // 1st order rate constants (mol/g/s/Pa)
	matrix[n_trials, 3] z_hat; // Thermal response peak means

	//// Assignments

	// Constants
	z_a0  = 80.0; // Bypass thermal response peak = initial condition
	r_gas = 8.3144598; // J/mol/K
	w_cat = 1.5; // g catalyst
	v_flo = 100.0/60.0e6; // 100 ml/min -> m^3/s
	omega = {85.0/86.0, 85.0/64.5};

	// Processed reaction parameters
	E_act[1] = E_a1 * 1e4;
	E_act[2] = E_act[1] + del_E * 1e4;
	log_A[1] = E_act[1]/r_gas/T_10p + log(v_flo/r_gas/T_10p/w_cat * -log1m(0.10));
	log_A[2] = log_A[1] + del_log_A;


	for (i in 1:n_trials) {

		//// Reactor

		// Arrhenius equation for reaction rate constants
		k_rxn[i, 1] = exp(log_A[1] - E_act[1] / r_gas / T[i]);
		k_rxn[i, 2] = exp(log_A[2] - E_act[2] / r_gas / T[i]);

		// Analytical solution
		z_hat[i, 1] = z_a0 * exp( -r_gas*T[i]/v_flo * sum(k_rxn[i]) * w_cat);        // Reactant A
		z_hat[i, 2] = k_rxn[i, 1] / sum(k_rxn[i]) * (z_a0 - z_hat[i, 1]) / omega[1]; // Product D
		z_hat[i, 3] = k_rxn[i, 2] / sum(k_rxn[i]) * (z_a0 - z_hat[i, 1]) / omega[2]; // Product U
	}
	
}

model {
	//// Priors
	
	// Reaction
	T_10p     ~ normal(383.0, 10.0) T[0.0,];
	del_log_A ~ normal(  1.0,  1.0)        ;
	E_a1      ~ normal(  5.0,  1.0) T[0.0,];
	del_E     ~ normal(  3.0,  1.0)        ;
	
}

generated quantities{}


R code used in both cases, within an R Markdown document:

library(rstan)

# Model compilation
compiled_model <- stan_model("stan/model.stan")

# Data
n_trials <- 100
T <- seq(5, 780, length.out = n_trials)

# Sampling
raw_pars  <- c("T_10p","del_log_A","E_a1","del_E")

prior_draws <- 
  sampling(
    compiled_model,
    data = list(n_trials = n_trials, T = T),
    pars = raw_pars,
    chains = 1,
    iter = 2000,
    warmup = 1000,
    seed = 11222018
  )
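And then, to get the handful of prior draws I mentioned, something along these lines (again just a sketch of the idea, not verified output):

# Keep ~9 of the post-warmup draws to plot the implied z_hat curves against
prior_mat <- as.matrix(prior_draws, pars = raw_pars)
few_draws <- prior_mat[sample(nrow(prior_mat), 9), ]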

Results with transformed parameters block commented out:


Results with transformed parameters block:


Notice the large sampling bias in parameter E_a1:


For context, this is the result of the transformed parameters block that I’m trying to use to tune my priors; I want the model to explore pairs of competing reactions that activate in the range 100 °C to 400 °C.

Looks like there are some divide-by-zeros, and that’s causing some sort of exception to be thrown that kills those iterations. They’re just getting binned as divergences, but you’re right, it’s weird.

If you add a tiny positive number to your k_rxn values, the errors will go away:

z_hat[i, 2] = k_rxn[i, 1] / sum(k_rxn[i] + 1e-12) * (z_a0 - z_hat[i, 1]) / omega[1]; // Product D
z_hat[i, 3] = k_rxn[i, 2] / sum(k_rxn[i] + 1e-12) * (z_a0 - z_hat[i, 1]) / omega[2]; // Product U

Not a great fix. Is it possible they’re scaled wrong?

You can search for stuff like this by commenting out sections of code and throwing in print statements. Not glamorous but it works. Stuff like:

if(is_nan(z_hat[i, 2])) {
  print("i", i);
  print("z_hat", z_hat[i]);
  print("k_rxn", k_rxn[i]);
  print("omega", omega);
}

I actually just figured out the problem this morning, and I think you’re right. My input temperature data T included values that were too small. So changing

T <- seq(5, 780, length.out = n_trials)
to
T <- seq(100, 780, length.out = n_trials)

fixed everything. I think that at these low values of T, the parameter E_a1 couldn’t go above ~1.3 without introducing numerical issues. Still, it surprises me that numerical problems unconnected to the sampling statements can actually cause divergences. I also think that when I tried to make a reproducible example I must have been using a different input for T without realizing it.

I also tested how low I could make T, and once the lower end is around 30 K the divergences come back.

Now it seems obvious that near zero kelvin my reaction rate constants go to zero, and anywhere they appear in a denominator causes problems. Ironically, when I started doing a prior predictive check I found that my priors were unrealistically producing massive rate constants near 0 K. Once I fixed my priors, I created these division-by-zero issues without realizing it.
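To spell out the mechanism, here are my own back-of-the-envelope numbers in R, using the prior means rather than anything actually sampled:

# At the prior means (T_10p = 383, E_a1 = 5 -> E_act = 5e4 J/mol) the
# Arrhenius exponent at T = 5 K is about -1200, far below the ~-745 limit
# of double precision, so k_rxn underflows to exactly 0 and z_hat[, 2:3]
# become 0/0 = NaN. At T = 100 K the rate constant is tiny but still finite.
r_gas <- 8.3144598; w_cat <- 1.5; v_flo <- 100 / 60e6
T_10p <- 383; E_act <- 5e4
log_A <- E_act / r_gas / T_10p + log(v_flo / r_gas / T_10p / w_cat * -log1p(-0.10))
exp(log_A - E_act / r_gas / 5)    # 0      (underflow at T = 5 K)
exp(log_A - E_act / r_gas / 100)  # ~2e-30 (small but representable at T = 100 K)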

Yeah, it’s not really a divergence here. I think a std::domain_error at least can get caught (the math is all C++ under the hood), and the sampler will just reject that proposal and move on to the next one. It must just get counted as a divergence for bookkeeping’s sake.

Glad to hear the modeling is making some sense!
