Causes for numerical discrepancies, or, help me figure out how these two model CPP files differ

Hey all,

We’re at the point on the stanc3 compiler where we have just one remaining model failure[0] on the new compiler and it’s just about ready for beta testing. But that last model failure is a very annoying slight numerical precision difference only present on this one linux machine (but not my linux machine at home running the same version of Ubuntu) and it involves gamma. I’m hoping to enlist the numerical experts here to ask what might be different between these two C++ model translations. The source Stan model is here; stan 2 output is here and stanc3 output is here.

It’s only a single parameter that fails the gold tests on our gelman-group-linux Jenkins machine: you can see if you click on the failing test: param beta.25 got mean -0.96411134, gold has mean -0.96411136 and stdev 0.813919082658 so we’re off by 2e-8. There is definitely something different about these two C++ files as the stanc3 one is 20% faster.

Tagging @Bob_Carpenter, @wds15, and @syclik as a few folks who have been helpful with these issues in the past.

[0] out of all the models we know how to automatically test, a set which does not have complete coverage of all language features.

1 Like

If they’re off by 2e-8, the question arises as to the old one or the new one is correct, or both and they just vary by order of evaluation somewhere.

Nothing jumps out at me as much as the use of assgn everywhere. How is compile time?

Also, we only need to deep copy containers, not scalars. The earlier tests on that were too strict, so we did too much copying in most cases.

1 Like

Compile times haven’t jumped out at me at all, though right now when I measure I’m also including the time to compile the compiler (as a sort of accidental byproduct) and the new compiler is much faster to compile, so that could be erasing a difference there.

I can fix those deepcopies, I think it’s fixed in a languishing PR but I can move it out. I’ll report back.

I’m inclined to just say that these are close enough, especially since this discrepancy doesn’t show up on my home Linux machine running the same OS and compiler.


I read through all the code that’s generated and I can’t see any way it’d be off other than floating point.

There is a slight difference in initialization in the log_prob function.


int oldn(0);
(void) oldn;  // dummy to suppress unused var warning
stan::math::fill(oldn, std::numeric_limits<int>::min());


  int oldn; 
  real m[N1];

This will cause differing behavior for unitialized variables. We need to make clear what the value is in the spec—I’d rather tighten the Stan spec rather than leave bits of it undefined like C++.

P.S. I need to get busy rewriting the sample models. That model’s terrible! We want to bring these up to Stan standards of priors and code (e.g., no diffuse normal or inverse gamma priors, non-centered parameterizations, priors on scales not variances, compound declare-define instead of define and later <-, etc. etc. That model, for example, should look something like this (the priors should be set to be weakly informative for the problem).

data {
  int<lower = 0> N1;
  int<lower = 0> N;
  real Yvec1[N1];
  real tvec1[N1];
  int<lower = 0> idxn1[N1];
  real y0[N]; 
transformed data {
  real y0_centered = y0 - mean(y0); 
parameters {
  real<lower = 0> sigma_y;
  real<lower = 0> sigma_alpha;
  real<lower = 0> sigma_beta; 
  real alpha0; 
  real beta0; 
  real<offset = alpha0, multiplier = sigma_alpha> alpha[N]; 
  real<offset = beta0, multiplier = sigma_beta> beta[N]; 
  real gamma; 
model {
  vector[N1] mu = alpha[idxn1]
      + beta[idxn1] * (tvec1 - 6.5)
      + gamma * y0_centered[idxn1];
  Yvec1 ~ normal(mu, sigma_y);
  alpha ~ normal(alpha0, sigma_alpha); 
  beta ~ normal(beta0, sigma_beta); 
  sigma_y ~ normal(0, 10);
  sigma_alpha ~ normal(0, 10);
  sigma_beta ~ normal(0, 10);
  alpha0 ~ normal(0, 5);
  beta0 ~ normal(0, 5);
  gamma ~ normal(0, 5);

It requires the 2.19 feature of offset/multiplier to get a non-centered parameterization without auxiliary variables.