Same code, different machines –> very different results. Why?

Hi,
I’m a newcomer to PyStan, and I am investigating using it to fit some XPS data. At the moment I am synthesising some data and making sure that my Stan model handles that well.

# Generate some synthetic data from the model.
N = 500
center=284
amp =1000
FWHM=2.
m=0.5
a=2.
b=1

f_true=0.02104 # a scale factor for adding the noise to the y data
x = np.sort(5*np.random.rand(N)+282)
yerr = 0.1+0.5*np.random.rand(N)
y = pV(m,x, center, FWHM,a,amp)
y += np.abs(f_true*y) *np.random.randn(N)
y += yerr * np.random.randn(N)
# convert the np arrays into list then to pandas dataframe
xpsdata = pd.DataFrame({'BindingEnergy': x, 'Counts': y})
model = """
functions {
  
  real wx(real x, real w0,real a ){
  // sigmoid function. x is E-E0, w0 is related to width of sigmoid (can be different to E0)
      return ((2.*w0)/(1.+exp(-a*(x))));
  }
  real pV(real m, real E, real E0,real w0, real a,real amp){
  // pseudo Voigt function
  // m is mixing ratio of lorentzian to gaussian
  // E0 is centre of peak
  // E is energy of evaluation
  // w0 is the centre of the sigmoid function
  // a is asymmetry
  real x=E-E0;
  real w=wx(x,w0,a);
  real w2=pow(w,2);
  return (amp*((1.-m)*sqrt((4.*log(2))/(pi()*w2))*exp(x*x*(-4.*log(2))/w2) + (4.*m*w)/(2.*pi()*(w2 + x*x))));
  }

}
data {
    int<lower=1> N; // number of data points
    vector[N] x;    // x data
    vector[N] y;    // y data
}
parameters {
    real<lower=280> center;
    real<lower=100> amp;
//    real<lower=0> sigma;
    real<lower=0> sdev;   
    real<lower=0> a;
    real<lower=0> w0;
    real<lower=0> m;
}

transformed parameters {
    vector[N] pVmodel; 
    for (i in 1:N){
      pVmodel[i]= pV(m,x[i],center,w0,a,amp);  
    }
    
    }

model {    
    y ~ normal(pVmodel , sdev);
    amp ~ normal(1000, 100);
    center ~ normal(284,2);
    w0 ~ normal(2,.1);
    sdev ~ normal(0,1000);
// note can put priors on the model parameters as above. The widths are important
    a ~ normal(2,0.1);
    m ~ normal(0.5,0.1);
}
"""

The model is called via:

fit = sm.sampling(data=data, iter=4000, chains=4, warmup=500, thin=1, seed=101)

When I run this on my MacBook Pro I get a decent fit to the data:
Stanik-fit-MBP

However, when I run the same code on a MacPro I get horrible fits:
Stanik-fit-MP
(the solid line is the mean, the fainter lines are samples)

Both machines are running in new Conda environments with PyStan 2.18.0.0, and both have the latest version Mojave as the OS. The main difference is the hardware, the MacPro is rather old (mid 2010 model), the MacBook is 2017 vintage.

I’m sticking with the MacBook for now, but I’d like to understand why the same code gives such different results.

Thanks

Simon

Judging from the code that you posted, it’s not clear if you used the exact same simulated data on both machines. I don’t see a call to np.random.seed(), and it isn’t clear

  • if you generated the simulated data once, saved it to a file, and then used the file on both machines, or
  • if you ran the same Python script to generate the random data on each machine, possibly generating different data on each machine.

ETA: It would also be helpful to see any diagnostics on your results, such as divergences, tree depths, Rhat values far from 1, etc. Printing a summary of the fit may help as well.

If I’m reading this right blue dots are data and the line is y=pV(…) using the fitted parameters. Parameter center is the x value where y peaks and amp is the maximum y value at the peak. The solid mean line in the second graph is nonsense because the samples come from two distinct populations (corresponding to different MCMC chains). There are some samples which match the shape and peak position of the data but somehow have a too high amp. There are other samples that have peak position that is way too low.

I’m guessing that if a chain starts sampling with an initial value of center that is outside the range of the data it gets stuck and cannot move the peak. Since you’ve declared lower=280 for center Stan’s default initialization is around 281 although values up to 287 are possible. It might also be a problem that amp initializes to a value that is way too low. Stan works better if you scale your data so that priors are around 1.

I don’t know how different behavior could be caused by different hardware, I’d expect some runs to look just fine and some to fail at random.

Also: Stan has builtin sigmoid function, it’s called inv_logit(), and I don’t know anything about Voigt functions but should m have upper bound 1?

Hi sorry about the lack of clarity. The same code was run on each machine.

I’ve been running variations on this today and it turns out its easy to get bad fits on both machines with no substantial changes to the code. I suspect I may have had a lucky seeding on the MacBook in my initial attempt.

I will dig deeper into the model.

Hi
yes the second fit is nonsense, I’m trying to understand why. I think I must have got lucky with the MacBook result.

I will have a think about your suggestions about the initial values.

Cheers

I meant taking the average of two chains that didn’t converge to the same value is nonsense. Some of the individual chains seem to fit just fine. My suggestion is to not mess with the initial values but to scale your data by 1000 so that the expected amp is around 1 (and change the priors accordingly) and set constraints

parameters {
  real<lower=min(x),upper=max(x)> center;
  real<lower=0> amp;
  ...
}

OK, I see, I’ll have a play with that.

Thanks again

Hi, thanks for the help with this–the above tips have really helped.

Better setting up of the parameters and scaling the data have helped enormously – coming from a traditional chi-squared minimisation background I guess I have a lot to learn about thinking about the implications of the set-up in each case.

Anyway the fits are more more reliable now and significantly quicker.

Cheers