Pystan converting reals to ints?

I have a variable taking on values greater than 5B. I want to transform it in my Stan model, to estimate some parameters (versus transforming it before including in model_data). But when I fit the model, I get the error:

OverflowError: value too large to convert to int

I understand that the maximum value for a 32-bit signed integer is 2^31 - 1 ≈ 2.1B. But my variable is declared as a real, so why is Stan converting it to int?
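Quick sanity check on that limit, just in plain Python (nothing PyStan-specific here):

```python
# 32-bit signed ints top out at 2**31 - 1
INT32_MAX = 2**31 - 1
print(INT32_MAX)                   # 2147483647, about 2.1 billion
print(5_000_000_000 > INT32_MAX)   # True: 5B does not fit in 32 bits
```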

Here’s some code to reproduce the problem:

Stan model:

data {
  int N;
  real y[N];
}

parameters {
  real theta;
  real<lower=0> sigma;
}

model {
  y ~ normal(theta, sigma);
}

Fit the model:

sm = pystan.StanModel(model_code=model_code, verbose=False)
model_data = {
    'N': N,
    'y': y,
}
sm_fit = sm.sampling(data=model_data, iter=1000, chains=4)

Here’s a histogram of my data (image not shown):

If I generate data on a similar scale, I don’t get the same error:

N = 100
y = np.random.exponential(scale=1e11, size=N)

With this data, running sm.sampling works fine.

So it seems the problem is the variance in the data, as opposed to the scale.

To check this, I was indeed able to recreate the error using:

model_data = {
    'N': 10,
    'y': [500, 1000, 50000000000, 10000, 1, 2, 3, 4, 5, 6],
}

So does Pystan convert reals to ints when the data has very large variance?

I’m using PyStan in a Python notebook on Databricks.

In the toy example you provide, does the same error occur if you use 50000000000.0 instead of 50000000000?

Note also that PyStan 2 is no longer being maintained.


Ah, that does solve the error.

And returning to my original data, converting to float64 also solves the problem.
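For anyone hitting the same thing, the workaround looks roughly like this (a sketch using the toy data above; `y_raw` stands in for the real data):

```python
import numpy as np

# Toy data from above: one value exceeds the 32-bit signed int range
y_raw = [500, 1000, 50000000000, 10000, 1, 2, 3, 4, 5, 6]

# Casting to float64 up front means PyStan never sees Python ints
y = np.asarray(y_raw, dtype=np.float64)
model_data = {'N': len(y), 'y': y}
```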

My guess is that PyStan 2 assumes a Python int can be converted to a fixed-width C integer under the hood (a 32-bit int, judging by the 2^31 - 1 limit), but that isn't guaranteed: Python ints are arbitrary precision by default (what some other languages call "BigInt"s), so they can grow past any fixed-width integer type.
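A minimal illustration of the mismatch, using the standard struct module rather than PyStan itself:

```python
import struct

big = 50_000_000_000  # fine as a Python int (arbitrary precision)
assert big.bit_length() == 36  # needs 36 bits; a 32-bit C int has 31 + sign

# Packing into a C 32-bit int overflows, much like PyStan 2's conversion:
try:
    struct.pack('<i', big)
except struct.error as e:
    print('overflow:', e)

# As a C double it is representable exactly (well under 2**53):
struct.pack('<d', float(big))
```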