I have a variable taking on values greater than 5 billion. I want to transform it inside my Stan model to estimate some parameters (rather than transforming it before including it in model_data). But when I fit the model, I get the error:
OverflowError: value too large to convert to int
I understand that the maximum value for a 32-bit signed integer is 2^31 - 1 ≈ 2.1 billion. But my variable is declared as a real, so why is Stan converting it to an int?
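Just to sanity-check the arithmetic (plain Python, nothing PyStan-specific; the 50 billion below is the same order as my largest values):

INT32_MAX = 2**31 - 1  # 2,147,483,647
value = 50_000_000_000  # ~5e10, same order as my data
print(value > INT32_MAX)  # True: far too large for a 32-bit signed int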
Here’s some code to reproduce the problem:
Stan model:
data {
  int N;
  real y[N];
}
parameters {
  real theta;
  real<lower=0> sigma;
}
model {
  y ~ normal(theta, sigma);
}
Fit the model:
import pystan

sm = pystan.StanModel(model_code=model_code, verbose=False)
model_data = {
    'N': N,
    'y': y,
}
sm_fit = sm.sampling(data=model_data, iter=1000, chains=4)
Here’s a histogram of my data (image not shown here).
If I generate data on a similar scale, I don’t get the same error:
import numpy as np

N = 100
np.random.seed(123)
y = np.random.exponential(scale=1e11, size=N)
With this data, running sm.sampling works fine.
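Checking the generated array directly (my own diagnostic, not part of the original error):

print(y.dtype)               # float64: exponential draws are floating point
print(y.max() > 2**31 - 1)   # True: values exceed the int32 max, yet sampling succeeds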
So it seems the problem is the variance in the data rather than the scale. To check this, I was able to recreate the error using:
model_data = {
    'N': 10,
    'y': [500, 1000, 50000000000, 10000, 1, 2, 3, 4, 5, 6],
}
So does PyStan convert reals to ints when the data has very large variance?
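One difference I do notice between the two cases (an observation about NumPy dtype inference on my part; I haven't verified how PyStan handles this internally): the exponential draws above come out as float64, whereas my repro list contains only whole numbers, so NumPy infers an integer dtype for it. Casting to float, as in this hypothetical workaround, avoids the integer representation:

import numpy as np

y_list = [500, 1000, 50000000000, 10000, 1, 2, 3, 4, 5, 6]
print(np.asarray(y_list).dtype)               # integer dtype (int64 on my platform)
print(np.asarray(y_list, dtype=float).dtype)  # float64

# hypothetical workaround: pass y explicitly as floats
model_data = {
    'N': 10,
    'y': np.asarray(y_list, dtype=float),
}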
I’m using PyStan 2.19.1.1 in a Python notebook on Databricks.